One correction Sauraen brought up after watching this video: "You said 320x240x4x10 = about 3 MB with 4 bytes per pixel (2 for color and 2 for Z). But that's not true, it has to both read and write the Z buffer (so the real value is 320x240x6x10 = about 4.5 MB""
Even more impressive is the credits sequence, where the entire levels are rendered all at once without culling. And MK64 is an early N64 game that's only somewhat more optimized than SM64.
@@seronymusThe N64 had a separate microcode dedicated to 2D, but the limited texture cache meant that you're hitting memory more and using more polygons to render more detailed textures. It was doable, but was a delicate balancing act.
They require really really few polygons actually. Mario kart for the N64 was mostly sub 1000 polys loaded at any point in time. It has a lot less geometry than normal SuperMario64.
It's funny how one of the keys to better performance is not having large objects in the center of your level when so many of SM64s levels are exactly that. It's usually all centered around some sort of mountain or large object.
Not large objects, large stacked textures. So not really comparable, those mountains are made of lots of smaller textures and what's behind _should_ be culled(don't quote me on that)
@@goeland4585 textures are not the main cause of lag here, its each pixel checking the zbuffer causing the lag. Each pixel having to read from a texture will at most only double the lag
I'm glad nothing has stopped you from persuing your passion like this. I've been watching your content for years, and it's always a treat to see what you've been cooking up.
Oh yeah, much of this was discovered over the course of the N64's life. Mario 64 has a ton of places where they did something exactly wrong for the N64 because that would be the right way to do it on the NES and SNES.
@@benamisai-kham5892 They certainly were. Quite a few games were developed with the DD in mind and then turned into a somewhat janky mess when they had to be downported to base-hardware.
Just when I thought we've almost reached the limit enter people like you who simply rewrite the code of the N64 itself. Love your dedication to pushing the boundaries, keep on keeping on!
So long story short, the N64 doesn't like shuffling around memory that much, but is more than happy to bang out uber complicated formulas. It is really showing off its SGI heritage with that one!
Well, if I remember correctly, it was nintento's decision to cheap out on RAM (both size and speed). Sad, with the cartridge read speed, if the RAM bus could go more vroom vroom, you would have a really good console for it's time. Imagine streaming big texture directly from the cartridge...
@@huaweiphone4357 there is at least one developer that did exactly this, and the results are as stunning as you'd expect. As for cheaping out, it's not like the Nintendo 64 was built to be a cheap kid's toy or anything...😜
@@johnrickard8512 the high res texture tech demo? Nope, he didn't. He use clever mip mapping, and (a lot) of texture swaps. He loaded small chunks of textures as fast as possible to simulate big textures. The working memory is extremely limited for the gpu, and accessing it is time consuming. As far as I know, you can't ask the gpu to go fetch a texture (big or small) directly from the cartridge, you need the cpu to fetch it, copy to ram, and then ask the gpu to use it. And you are limited by the (byte) size of the texture (so lower color depth mathematically gives you bigger textures). I can be wrong, but I don't think so.
@@gamephreak5 yes, and the PlayStation also had super slow memory and an even slower CPU, and the Sega Saturn....well if you weren't someone like Traveler's Tales you would find it to be impossible to use to its fullest.
Yep. The fastest way to render something is to skip it, no matter whether it's triangles or memory that's killing performance! Then they went and flagged the sub as dynamic. Oops.
@@danolantern6030Yep! Use the camera lock feature and swim through the connecting tunnel and you'll see the first area unload and the second area load in
@@danolantern6030 yeah, honestly my smooth brain thought, "if a polygon is covered up by other polygons, it doesnt affect performance" lol but no they have to unload the polygons
These are the insane benefits you get when one person has an indepth understanding of both game design and code. Modern video games have become so unoptimized because of how complex both aspects have become. I don't think any individual developer holistically understands a modern game such as Baldur's Gate 3 or Spiderman in the same way you understand SM64. I wonder if that's even possible.
From what I've read large game studios don't make effective use of generalists due to the management style. There are probably quite a few people who know a bit about the entire stack but if a game has 300 people working on it most of them will work on one narrow part of the project. On top of that companies like Bethesda outsource parts of the work (like with Starfield), which makes the communication problem even worse. People like that probably shine more in indie projects. For example there was a PS4 exclusive indie game which used real-time cone tracing (faster version of ray tracing) for the graphics, built from scratch and years before hardware was designed to accelerate it.
@@FlamespeedyAMVthat's not it the devs they hire are usually overworked and put to work only in specific areas of the game while not knowing everything else about it and it's not their fault as it's mandated by their manager
@@SaHaRaSquadyup, this. A lot of studios separate their workers with lacks of communication between teams. If the teams could all communicate freely and work creatively together we'd probably have so many better end products... Or they'll split teams to work on different projects at random.
10k is about the polycount of an environment on the Gamecube, for reference. Main characters in that gaming gen would usually be a couple of thousand. Nowadays, for games with the Triple A Aesthetic TM, a single important character would easily be over 10k. 10k would still be considered lowpoly by modern rendering standards in film and animation. But film and animation have always been able to utilise higher polycounts than gaming, as their 3D doesn't have to render in real time.
On a game where you can create your own content like VRChat, 60-70k polys is generally the standard, though often custom content goes way above and beyond that.
If only modern games made processing efficiency more of a priority Imagine the leaps in capabilities new hardware could have when optimized to the level you're getting the N64 to... absolute legend Edit: "Try a bit more" doesn't mean "it must be the only thing devs do before anything else happens with a project" but I really shouldn't be surprised that people are trying to twist my words for the sake of forcing discussion
My university prof always tells us "Don't worry about efficient coding too much, computers are fast enough these days" For me this hurts to hear, while it's true for many applications it leads to lazy programming... using inefficient algorithms and data structures because they are easier to implement... And then you have shitty software as a result :(
id software's idtech 6 and 7 engines are a good modern example of making the most of modern computers doom eternal's vulkan renderer is pure art with how hyperoptimised and efficient it is while pushing beautiful visuals
I wonder if we'll see people doing the same thing to the Switch in 25 years. I'm sure that system is capable of a lot more than the sword and shield graphics let on... lol
The part about not having a big thing in the center of the level to improve performance is so interesting because that's how most levels in early 3D platformers like Super Mario 64 and Banjo-Kazooie were designed, having a big thing in the center gets the player's attention and gives them a sense of direction which helps since you can't just go to the right like in 2D.
I noticed the crate your standing on at 6:52 is having its vertices warp and wobble in a similar way the PS1 used to do. and even the side of the crate's edges seem to tear at 6:56 Is this a result of the Z-Buffer optimizations or something to do with the Sine Cosine optimizations. or was it older footage? I am a bit curious but i am also not very much a programmer at all.
It might be due to the crate being a dynamic object or similar. In many games (software Quake by example), models aren't rendered with perspective-correct texturing with a few exceptions (ammo pickups).
@@Calinouuh, check the bounding box size and turn on correction. It is like two instructions in RISCV and uh x86 code is a bit more inefficient, but not much.
Could be a glitch. The Z buffer makes fuck all difference in that sense. Anyways, apparently there's a bug in the SGI silicon that causes texture glitches in some situations.
You are single handedly moving Super Mario 64 and it's modding forward. Without you who knows when a person would even think to take up this work. Literally making history. I don't care how long I have to wait for all of your work and documentation to be done. All of your optimizations are well worth all the wait. An actual revolution of content will be made from your efforts. Nintendo should literally be paying you at this point. Thanks for your work Kaze, I hope you're as proud as we all are.
It most likely is moving N64 retrogaming forward as a whole because some of these optimization-approaches should be universal in at least a broad sense.
@@mirabilis decomp; decompilation. They are reverse engineering code. This is a code rewrite to give it a better more efficient engine. There are plenty of modders who've done great work, and more great work will be done, especially when this is finished. I don't see why you feel it's necessary to deminish Kazes work. Even in this video he talked about getting more outside work. But Kaze is the forefront for this project. That doesn't mean that others don't deserve recognition, but the "meh" is blatantly trying to deminish the work he's done for no reason, as if Kaze doesn't deserve recognition. If you want to do that, maybe don't play any of the mods that will be built off the back of this project. I understand there are others doing great work. I just want to show my appreciation for Kaze. Maybe you think "single handedly" is too strong of a statement, and that's fine, fair even, but I don't think I need to remind you of the potential this project will bring to other modders. If you feel I was trying to deminish others works, I apologize, I wasn't.
People ignore everybody was learning 3d at the time, artists included. Making all kinda shapes on 3d software is easy now, but can you imagine what kind of primitive tools they had to work back them? Even if you had the mastery of the software, they had to translate 2d pixel art into 3d art without the luxury of just doing a realistic model. Once you have a working 3d model is easy to criticize but those guys were the pioneers, doing stuff without a reference.
quick question actually; are some objects being deloaded before they actually leave the camera? I saw a few moving platforms get deloaded on screen instead of off screen.
Crazy I believe the N64’s official hardware figures were 150,000 polygons per second, so at 30fps, 5,000 would be the targeted polygon count-maybe a bit more depending on amount of shading and textures Mario 64 played things safe-the simpler maps being only around 1,000 tris, and the larger maps being segmented via loading tunnels, I doubt the game often exceeded 3,000 and definitely not 4,000 tris Ocarina of Time optimised things more, though the game was capped at 20fps the maps had a lot with surprisingly low poly count-Kokiri Forest was only about 2,000 tris and Hyrule Field 1,600 tris Then there was Rare’s stuff-they were very talented but didn’t quite focus on stable performance Click Clock Wood in Banjo-Kazooie pushes over 4,000 tris, and Mayahem Temple in Banjo-Tooie pushes over 6,000 And that’s all before objects and characters
Yeah though keep in mind these higher poly levels use a lot of culling and rooms so i find it unfair to count that. If wed count like that then only up 64 would be a 50 thousand triangle level which it clearly is not haha
@@KazeN64 Those numbers are only for the main “foyer” area, so to speak, and Banjo has flight pads allowing the whole terrain to be visible at once-at a wonky framerate though of course
N64's potential getting unlocked countless times I'll never get tired of it lol Also quick question. For Mario's model to switch different hand forms or without his hat and what not, is that simple model swapping or a use of shape keying?
This stuff is generally understood now. Back then they were pioneering this kind of stuff and the limitations were less understood. Suffice to say if they were to make Mario 64 with the knowledge they possess today, even on the N64, you'd see them pursue similar techniques and ideas that Kaze is doing.
Yep modern compilers are so much more efficient than older ones to the point where assembly or other low level code is only useful in very certain niche scenarios
@@daskampffredchen No reason to go to GameCube as the progress on this N64 stuff is far more interesting. The N64 was a complex beast with alot of untapped potential... Kaze is doing things on N64 at framerates that defy all other games in its lifespan. GameCube was already so well optimised there's nothing else to unearth performance wise.
That's not exactly the issue here. Drawing the background after opaque objects requires reading and writing the Z-buffer on every pixel, and as explained in this video, reading and writing the Z-buffer is really slow.
N64 has a pipeline. It loads the whole texture no matter the z tests. Stupid Jaguar would not even tell me if a portal passes any z-tests. I mean it would, but so slow that it does not help me.
@@vurpo7080 Drawing the background over an opaque object does not require writing to the Z-buffer. It only requires reading, and then it fails the check, so it never actually writes.
I can remember when Skelux showed the Star Road 2 preview a long time ago and that felt like the peak of sm64 hacking at6 the time. In a comment about one of Kaze's early hacks he said something along the lines of "nobodys as good (at modelling) as skelux". And while Skelux's artistic vision is still S tier and godlike, I'd say Kaze has gotten closer to that level of polish.
Yeah, I have little talent for art. I am decent at it know because I've practiced it so much. Though I do have biobak helping me rework some of my earlier levels and he's the most talented artist I know. I think we can make something much prettier than even star road 2 with him on the team!
@@KazeN64 I definitely think its far surpassed star road 2 I just wanted to say it in a polite way lol. and of course the artists u collaborate with are dope
Yeah. I'm also wondering what could be done with extensive use of AI for further optimizations making also the graphics better. At least when AI becomes better at coding leaded by Kaze to what he will want to do.
This project inspired me to get into pico8 development. I know it's not the same as making games for the n64, but the restrictions it puts in place give a similar need for creative solutions to problems, and I highly recommend it to anyone who wants to make 'retro' games. The work you've put into this project is incredible, I can't wait to one day play the game for myself! Keep the devlogs coming, I'm finding it so fascinating learning how you've optimised the engine.
What I learn in your videos @kaze, I really try to find ways I can plug this type of looking at problems in my day to day engineering tasks. Thank you for a fresh look from a completely different segment of software engineering and application design.
5:56 I love that you mention how bad lookup tables for trig functions are. My game uses inverse tangent and, to optimize a part of my code, I used a lookup table to approximate it thinking it would be faster, but it ended up being *slower* than just calculating it.
I'd already heard that fill rate and bandwidth were bigger issues than poly count, but this is far more extreme than I expected! Also does this mean that Mario is going to get a poly count boost for the final build of RtYI, at least in cutscenes? :)
If this can be accomplished on an N64, imagine what could be done on a Gamecube? I know we sort f saw with the Wii, but I'd want to see what it could do with it's original limitations.
Star Fox Adventures is a very good technical showpiece of the Gamecube hardware where it looks much more like an early 7th gen title than a Gamecube title.@@remnantknight56 There was a lot of unique tricks to get the fur simulation to look that good and for other things like grass. And also, how the game data is streamed to eliminate loadtimes.
A nice example of "optimization is a waste of time, unless you optimize that really is causing the bottleneck". If your entire processing passes through 10 steps, the slowest step is what dictates the overall speed almost alone. You can optimize all the steps but unless you optimize the slowest one, you won't see any larger improvement, if you can see an improvement at all. Only once you've eliminated the bottleneck, you can check for whose now the slowest step and try to optimize this step as well, and so on. That's why profiling is so important for optimization as only profiling will tell you what the slowest step really is.
Well because it's not economically viable to optimize a device to its fullest. Kaze took this many years to achieve this then the developers back in the 90s will too. Sooner or Later consumers will get bored of the N64 weak power and low poly visuals when they can just upgrade to a better console hardware for more power.
@@deltex8526That's pretty much it. It takes several YEARS, sometimes Decades for a system's hardware to be pushed to it's limits. And simply put, you cannot have a system stay around that long. Especially when, for a company like Nintendo, you can simply upgrade the hardware after 7 or so years and get VASTLY better improvements at a fraction of the cost.
I'm thankful we do too lol. We never really get full potential out of the hardware and even when they get close, it's usually decades after the demise of the hardware. Optimising takes time and money to do and in most cases, it makes more sense to throw more hardware power at the problem than to go overkill on optimisation. But maybe A.I. in the near future could help to change that if it gets really good that it can help us optimise code to get better use out of the hardware.
And this is why it bothers me when people complain about Nintendo consoles (especially the Switch) being "too weak." If you have _proper_ optimization, then processing power isn't as important. Nintendo games always run very well on Nintendo consoles, because they're optimized and designed with the console's limitations in mind... its why 3rd party games are always buggier and have performance issues, and often have lower graphics too. Because 3rd parties don't optimize their games for the console and take shortcuts such as toning down graphics to get it running instead.
Very impressive but as someone who also programs 3D on retro consoles, I have something to add. It technically can draw that many polys, but it's a whole other thing to check backface culling, behind the player polygon clipping, 3D texture clipping, and ground collisions with that many extra polygons. There are only a finite number of cycle the n64 can do per second and you can't have it all without the framerate dropping pretty low. That's why the enemies in Goldeneye has to be so boxy so there can be more on screen. 1 single enemy could be several thousand polys if there was was just 1 floating in space always in front of the player like that landscape you show.
The pcEngine had one cycle to spare per pixel. PSX and Saturn have 4 cycles per pixel. N64 has 7 cycles per pixel. Multiplication is single cycle since Jaguar. To cull back faces you need two multiplications. I think hardware designers and game developers need to justify all those cycles they use up. SNES could spit out mode-7 at 1 px per cycle. N64 blitter runs at 1px per cycle. It is all a pipeline.
Brilliant! I’m not keen on the technical knowhow behind n64 development, but as usual the video excellently explains it on a basic level that I can comprehend. Man, I wish that developers nowadays perform these micro-optimizations on their games, particularly Switch ports. Nintendo, hire this man!
Dude this is amazing, I just absolutely love this kind of stuff, your mods are amazing and it's so cool to see such advancements made on this console even after so long, makes me proud of the video game community to see there is still so much you can do with old tech.
And Nintendo repeats this mistake yet again with the Nintendo Switch. The Switch struggles massively on games with large, multipass post-processing chains or that render big transparency effects right on the camera, because of its slow DRAM.
Man, if you can overhaul and improve Super Mario 64's code THIS much for optimal performance, I'd love to see you do the same thing for Conker's Bad Fur Day! That game has aged horrendously. Way too many areas have the framerate tanking to borderline unplayable levels. It's crazy how much love that game had back in the day considering. I don't think I've ever played an N64 game that struggled to the same degree as Conker.
Am not a coder but can your engine be adapted for fps, 3d fighting games or racing games ? Would be amazing if in the near future it can be easily adapted for those types of games so others can use it and do crazy ports, maybe a tekken or bloody roar port for the n64 😅
Probably but it's more then likely deeply tied into the game code. He could potentially strip out all the Mario related code to make it more general purpose.
The fighting genre is one the few that actually did benefit CD-media. PSX had only 2MB RAM, but on fighting games they could actually utilize CD -space for lots of characters, arenas and FMV's. However, take a look of most recent build of Smash Remix for N64, to see how N64 can handle tons of characters & arenas as well these days.
I always like the idea of putting high tier graphics and trying to make it work on old tech. like that video of rendering rtx on a calculator, or demastering minecraft on the psp
I might be wrong but didnt the N64 have just 4kb of texture cache? The CPU and GPU of the N64 were really powerful for their time, even featuring texture filtering but fitting "high res" textures through that cache was basically impossible. You had to chop textures into multiple parts or leave them los res, thats why many games looked blurry
amazing job! Games like Banjo Kazooie actually improved the visuals a lot by having transitions in the textures. Also you can try to use vertex colors to make your game even prettier or add shadows without needing any additional memory as its done by the GPU in real time. Again, gute Arbeit!@@KazeN64
I like this resurgence of fifth gen hardware getting pushed to its absolute limits, first we had a team porting over the first two levels of Unreal to the Saturn (big draw distance and all) and now the N64 is being pushed to its limits, I'd like to see what people could do on the PS1 now.
Because of the small texture cache, and the higher amount of polys I've noticed in N64 games, especially recent projects like yours, I've always wondered why more textures on surfaces weren't drawn using polys instead. People would use blurry text or arrows, when drawing simple shapes and even some text should be more performant on the N64. And this video just furthers that question. I see even recent projects like the Portal 64 project (not yours) creating simple shapes on surfaces using low quality textures instead of triangles, and I have no idea why. Do you have any idea why that might be the case? Because I just simply am not enlightened enough to understand this trend in N64 game design.
The equivalent of a 64×64 texture with vertex colors requires 4096 vertices to be representable, and unlike a texture, it can't be repeated unless you also repeat this 4096-vertex lattice. Vertex colors work better for low-frequency data, such as lighting or ambient occlusion.
Basically because the only thing you get that way is flat colors and gradients. It can look cool (characters and enemies in FF7 is a good example) but it's a very particular style and not something that can actually replace texture mapping.
That's my point. More complicated stuff can be textures, but an arrow, or even a bit of text? Why not triangles? Think of all the blurry text and simple shapes on flat surfaces that are textures instead of polygons in various n64 games. Why not use triangles for those?
It's funny how once again games are bottlenecked by memory bandwidth/latency yet all people ever talk about is being cpu or gpu limited. It's been memory access all along. On modern CPUs fetching a value from main RAM takes 100x as many cycles as from L1 cache! Most of the time the CPU is just waiting for data.
@@tinoesrohoElectrically, RAM gets slower when it gets larger. This is due to the fan out and parasitic capacity of the bit and word lines. Also the address generator gets power hungry.
Diggi es ist echt otherworldly mit welcher hingabe du das alles machst. Bin super gespannt auf den Mod und kanns kaum erwarten den zu zocken :)! Wird ein guter Winter dieses Jahr.
I like the low poly look of 64 games, and low poly games run pretty well on my pc and phone aswell Edit: i wasn't talking about emulators but they also run well
So overdraw tanks performance, while lots of polygons don't. So kinda like the switch then saw that one video once that showed that apprently having polygons for individuals leaves is faster then a cutout shader.
That's pretty much why Age of Calamity just kinda dies when the switch is docked. It basically overdraws nearly everything with all the semi-transparent effects and because docked mode uses a higher resolution compared to handheld, it also has to overdraw much more. Rip Switch.
I know nothing on game development, but was really impressed with the high poly count you demonstrated. It's inspiring to see people's passions for a +25-year-old game engine and showing us how performance can be optimized. Thank you.
This skill of pushing a computer to it's maximum ability with optimization is dying out. It's just cheaper/ easier for a business to throw more computing power at the problem. A lost art
sadly kaze himself admits he can only do this stuff with the knowledge gained since the n64s launch, its more like if kaze was a dev back then and they developed the game for 10 years and it came out in like 2006
no amount of time constraints can ever excuse the abomination that is shadow.c - 4 raycasts to find the same surface 4 times, taking up upwards of 20% of your games performance in some areas (or the dynamic collision on the DDD sub)
All these advances in N64 rendering has got me wondering how good 2D could look like on the hardware. As far as I’m aware the N64 doesn’t have any dedicated sprite hardware, but given what’s possible on consoles like the PS1 and Saturn I think it could look awesome.
The n64 really bad at sprites, the reason the saturn and ps1 can do it is because they are designed to do it. No amount of optimization changes the n64s limitations.
@@RetroDark_The_Wizard Ok, then explain Bangai-O, Mischief Makers, Ogre Battle, Super Robot Taisen 64, Dr. Mario 64, Puyo Puyo Sun and ~n Party, Yoshi’s Story and various other sprite-based games on the console. The N64 is very capable of doing stuff like this and I’d like to see it taken further.
@@8squared007 A lot of the spritework was limited in animation and in clarity. In mischief makers the sprites are blurry. Same with yoshis story. It’s just that the N64 was designed by a cgi company, and this video is just improving what the n64 was good at, 2d sprites are something it was quite bad at compared to the PS1 and Saturn.
@@RetroDark_The_Wizard Ok, but there are other examples I’ve listed like Bangai-O which prove that it can do great 2D without looking blurry and recent homebrew projects that also serve to help my argument. “Harder to program” does not mean “Worse”, people simply need to do more research.
@@8squared007 It is also harder to program, but again the 2d on Saturn and Ps1 is more fluid and detailed than anything on n64. The n64 had a lot of issues, memory is a big one. I get you like the n64, but it has it limits. I have played a lot of sprite based n64 games and they really weren’t anything special visual wise. It’s impressive, but not that good.
I'm so happy to see the N64 modding/homebrew community going strong and getting better and better. ‘Return to Yoshi’s Island’ is the most anticipated mod ever! Can’t wait. You’re a genius!
If Nintendo had any sense whatsoever they'd hire you and pack this game in with an emulator and ship it on their consoles. Or better yet make a mini N64. I know it sounds cheesy, but other companies frequently hire modders for official games, or buy the mods outright.
In the series "nier:automate" the chatacter "2B" her ass alone consist of over 300k polygons If somebody would ever be able to run this model on real N64 hardware it's either you or nobody
so many of those triangles would be denormed to a triangle with 0 sidelength that the n64 would likely only actually render 10k or so of those (unless you zoomed all the way in on the ass)
@@KazeN64 😏 oh well was worth the shot I gues this to much asked for the N64, still thinking about the idea makes me smile Perhaps a few years could perhaps do it, or not, time will tell, still thank you for the response
Might be a silly question, but if you're locking your hack to 30fps, why not push the system much further to the point that it would barely stay above 30fps even when unlocked?
That's what I am doing! The earlier levels were not designed with this powerful of an engine in mind so it's often not fully taken advantage of. But the newest levels are designed in a way that the most laggy view often struggles to hit 30.
This video is even good for modern computers. Today even ram and memory is considered slow and writings code to keep braching low is worth it (if you need speed). We even have cryptography functions that are "memory hard" meaning it slows down hackers by making it slower to compute by making it use ram access times in a way that cant be shortcut-ed, slowing down brute force attacks.
How much does adding an Expansion Pak affect performance? DK64 uses it to improve its framerate. And yeah, the tech behind hardware evolved much faster than the code optimization.
I admit I have no idea what 50% of the stuff you're explaining is yet the way you explain things is so interesting and gets me invested and makes me listen through the entire thing every time. Great video as always!
I love 3d modeling and kinda hate the modern "low poly" artstyle, you know the one with the solid pastel colors and no textures. It felt unique once it first became popular but Ive grown to mostly hate it. Thats a complete aside though, for n64 mods high poly really fits the artstyle imo, It gives the feeling of "old console with a game way ahead of its time" and I think that really fits the console modding scene! Edit: when thinking about it, does the memory limitation mean that game engines that can get depth buffers without the normal, "calculate distance and compare" method, like bsp algorithms can get a ordered depth buffer without a typicalZ buffer. I know the quake and doom engines (both of which appeared on the n64) could improve preformance a lot. It also uses the cpu to sort through it so thats another good use of CPU time!
@@hughjanes4883 Yeah but that's not relevant to retro inspired modern games, they never incorporate that. Also it doesn't come up with emulation which is how i do everything today.
Wait, so N64 had overdraw on OPAQUE PIXELS? That sounds like a nightmare; in VR dev we have to be very careful to avoid overdraw on transparent pixels, but having to optimize for opaque sounds completely insane lol
On modern hardware you also have an overdraw overhead (also on opaque pixels), but it is hidden by good cache structure and that currently pixel shaders are much more complicated so relatively take more time than memory access. Also techniques like deferred shading might help with complicated shaders at further cost of memory bandwidth.
Overdraw was a huge problem throughout the 90s for all of the consoles. At least with the N64 you had a z-buffer in hardware to fall back on. I've been playing around with Saturn programming and it blows my mind what techniques 90s programmers were used by Hexen, Sonic R and other games to minimize Overdraw. I can't imagine having to develop this in software when the concept of a camera fulcrum was still relatively new to the industry
@@themeangene The guys who made Sonic R and co were absolute mad lads. I have never otherwise heard developers consistently custom-creating features on the software side that the console did not ship with and coming back many years later to create an official update for one of their games, which they have to release as a romhack no less due to no longer being partners with the publisher.
I wonder if it would be possible to do something like UE5's nanite on N64 that culls and resizes triangles based on the camera position so you can get the best quality to performance ratio.
Even looking to newer console's, it makes me excited to see your work! With all this optimization on just he N64, just imagine what levels of optimization are possible for new gen consoles! Because the bottlenecks do exist, even if they are just so large you can easily fit most games into them. But acknowledgement of the bottleneck having bounds, and thus optimizing your games, opens a world of possibilities.
I've not watched the video but I agree with the thumbnail because I've spent so long doing low poly art that when I see it in ps1 style games and stuff I'm not like "wow its like a real retro game" I'm like "wow they really just got 5% of the way of doing a medium poly model and just applied awful oiled up textures and said 'it's just the style we chose'"
One correction Sauraen brought up after watching this video:
"You said 320x240x4x10 = about 3 MB with 4 bytes per pixel (2 for color and 2 for Z). But that's not true, it has to both read and write the Z buffer (so the real value is 320x240x6x10 = about 4.5 MB""
Ok
What you said
I was about to call you a nerd, and then I read your username and remembered you basically do this stuff for a living
We need an 64 2 console with better spec. Most won't bother. Just use a tablet or pc or laptop. Save on lots of work.
hey, what about the PS One?, because the first Playstation can render a lot more polygons than the N64
Proof of the N64 being able to handle a shitton of polygons is Mario Kart 64. Every single course has very smooth curves that need a lot of polygons
Even more impressive is the credits sequence, where the entire levels are rendered all at once without culling. And MK64 is an early N64 game that's only somewhat more optimized than SM64.
I thought the whole downside with the n64 was it sucked at rendering 2d?
@@seronymusThe N64 had a separate microcode dedicated to 2D, but the limited texture cache meant that you're hitting memory more and using more polygons to render more detailed textures. It was doable, but was a delicate balancing act.
They require really really few polygons actually. Mario kart for the N64 was mostly sub 1000 polys loaded at any point in time.
It has a lot less geometry than normal SuperMario64.
I wonder if mario kart ds, or even wii could be feasible some day
It's funny how one of the keys to better performance is not having large objects in the center of your level when so many of SM64s levels are exactly that. It's usually all centered around some sort of mountain or large object.
Not large objects, large stacked textures. So not really comparable, those mountains are made of lots of smaller textures and what's behind _should_ be culled(don't quote me on that)
Pretty much every sm64 level is like that to some degree except hazy maze
Kinda misunderstood the point - the key is to draw that large object first because if you don't, then you'll have performance issues.
toooo be fair, that feels natural. theres a centerpiece to most kids playgrounds too
@@goeland4585 textures are not the main cause of lag here, its each pixel checking the zbuffer causing the lag. Each pixel having to read from a texture will at most only double the lag
I'm glad nothing has stopped you from persuing your passion like this. I've been watching your content for years, and it's always a treat to see what you've been cooking up.
Yeah, it’s really incredible.
Things have stopped him, mainly Nintendo taking down some of his videos.
@@JerryDX Weird definition of "stopped" lol
what he said.
@@JerryDXYou can still find all of his projects tho
if you take mario 64 as a early game and conker´s bad fur day as a late game, the optimization is impressive as well
Oh yeah, much of this was discovered over the course of the N64's life. Mario 64 has a ton of places where they did something exactly wrong for the N64 because that would be the right way to do it on the NES and SNES.
Honestly conkers bad furday is one of the most impressive games on the system. I think too many games really got screwed by the DD failure.
@@benamisai-kham5892 They certainly were. Quite a few games were developed with the DD in mind and then turned into a somewhat janky mess when they had to be downported to base-hardware.
"It's true what they say. The grass is always greener. You don't really know what it is you have, until it's gone. Gone. Gone..." - Conker
@@keiyakinsinteresting, can you give some examples?
Thanks for the shoutout Kaze! F3DEX3 is actively in development and I hope to show more of its features soon!
Just when I thought we've almost reached the limit enter people like you who simply rewrite the code of the N64 itself. Love your dedication to pushing the boundaries, keep on keeping on!
sauraen is going beyond the code and is optimizing the console itself with the F3DEX3 microcode@@butterflyfilms939
Make an ROM Manager tweak version of F3DEX3, it should be nothing hard for you.
So long story short, the N64 doesn't like shuffling around memory that much, but is more than happy to bang out uber complicated formulas. It is really showing off its SGI heritage with that one!
Well, if I remember correctly, it was nintento's decision to cheap out on RAM (both size and speed).
Sad, with the cartridge read speed, if the RAM bus could go more vroom vroom, you would have a really good console for it's time.
Imagine streaming big texture directly from the cartridge...
@@huaweiphone4357 there is at least one developer that did exactly this, and the results are as stunning as you'd expect. As for cheaping out, it's not like the Nintendo 64 was built to be a cheap kid's toy or anything...😜
@@johnrickard8512 the high res texture tech demo? Nope, he didn't.
He use clever mip mapping, and (a lot) of texture swaps.
He loaded small chunks of textures as fast as possible to simulate big textures.
The working memory is extremely limited for the gpu, and accessing it is time consuming.
As far as I know, you can't ask the gpu to go fetch a texture (big or small) directly from the cartridge, you need the cpu to fetch it, copy to ram, and then ask the gpu to use it.
And you are limited by the (byte) size of the texture (so lower color depth mathematically gives you bigger textures).
I can be wrong, but I don't think so.
@@johnrickard8512 Hey, guess what? The Saturn and Playstation were also both cheap kids toys.
It's not just Nintendo that makes kids toys you know.
@@gamephreak5 yes, and the PlayStation also had super slow memory and an even slower CPU, and the Sega Saturn....well if you weren't someone like Traveler's Tales you would find it to be impossible to use to its fullest.
Imagine exporting a triforce to your game and the 3 triangles lag your game 30 fps
when time travel gets invented you know im mailing a usb stick with all of kaze's videos to the nintendo headquarters while sm64 is in development
get sent to jail speedrun
usb was so new at the time i doubt they would even have usb ports ;o;
punched cards then@@fridaykitty
Should be a cd
@@fridaykitty3.5" floppy and and a translator...both since it's Japan
this thing probably explains why the dire dire docks level was split into two 3d models that dynamically switch
Yep. The fastest way to render something is to skip it, no matter whether it's triangles or memory that's killing performance!
Then they went and flagged the sub as dynamic. Oops.
just calling something is almost always preferable over powering through it
THEY WHAT
@@danolantern6030Yep! Use the camera lock feature and swim through the connecting tunnel and you'll see the first area unload and the second area load in
@@danolantern6030 yeah, honestly my smooth brain thought, "if a polygon is covered up by other polygons, it doesnt affect performance" lol but no they have to unload the polygons
These are the insane benefits you get when one person has an indepth understanding of both game design and code. Modern video games have become so unoptimized because of how complex both aspects have become. I don't think any individual developer holistically understands a modern game such as Baldur's Gate 3 or Spiderman in the same way you understand SM64. I wonder if that's even possible.
From what I've read large game studios don't make effective use of generalists due to the management style. There are probably quite a few people who know a bit about the entire stack but if a game has 300 people working on it most of them will work on one narrow part of the project. On top of that companies like Bethesda outsource parts of the work (like with Starfield), which makes the communication problem even worse.
People like that probably shine more in indie projects. For example there was a PS4 exclusive indie game which used real-time cone tracing (faster version of ray tracing) for the graphics, built from scratch and years before hardware was designed to accelerate it.
The Devs they hire are just terrible mostly
@@FlamespeedyAMVthat's not it the devs they hire are usually overworked and put to work only in specific areas of the game while not knowing everything else about it and it's not their fault as it's mandated by their manager
@@Morrigan101 The quality of programmer is also to blame. You can't blame everything on overwork. You think overworking didn't exist before?
@@SaHaRaSquadyup, this. A lot of studios separate their workers with lacks of communication between teams. If the teams could all communicate freely and work creatively together we'd probably have so many better end products...
Or they'll split teams to work on different projects at random.
10k is about the polycount of an environment on the Gamecube, for reference. Main characters in that gaming gen would usually be a couple of thousand.
Nowadays, for games with the Triple A Aesthetic TM, a single important character would easily be over 10k.
10k would still be considered lowpoly by modern rendering standards in film and animation. But film and animation have always been able to utilise higher polycounts than gaming, as their 3D doesn't have to render in real time.
10k? Try over hundreds of thousands. Kratos had 80k in GoW PS4, 5 years ago. Grand Turismo cars have half a million each.
On a game where you can create your own content like VRChat, 60-70k polys is generally the standard, though often custom content goes way above and beyond that.
@@TechArtAlexand all that design and resources spent for a forgettable racing game. Wasteful
@@seronymus b r u h
No wonder this looks like a GameCube game
If only modern games made processing efficiency more of a priority
Imagine the leaps in capabilities new hardware could have when optimized to the level you're getting the N64 to... absolute legend
Edit: "Try a bit more" doesn't mean "it must be the only thing devs do before anything else happens with a project" but I really shouldn't be surprised that people are trying to twist my words for the sake of forcing discussion
My university prof always tells us "Don't worry about efficient coding too much, computers are fast enough these days"
For me this hurts to hear, while it's true for many applications it leads to lazy programming... using inefficient algorithms and data structures because they are easier to implement... And then you have shitty software as a result :(
look up the ghost of Tsushima gdc talks if you want to see some optimised stuff, half the game runs on the ps4s gpu
@@zaptheporcupine1578I like to take that excuse and turn it into "modern computers are fast...so let's use them!"
id software's idtech 6 and 7 engines are a good modern example of making the most of modern computers
doom eternal's vulkan renderer is pure art with how hyperoptimised and efficient it is while pushing beautiful visuals
I wonder if we'll see people doing the same thing to the Switch in 25 years. I'm sure that system is capable of a lot more than the sword and shield graphics let on... lol
The part about not having a big thing in the center of the level to improve performance is so interesting because that's how most levels in early 3D platformers like Super Mario 64 and Banjo-Kazooie were designed, having a big thing in the center gets the player's attention and gives them a sense of direction which helps since you can't just go to the right like in 2D.
1:16 "It runs at a pretty stable *encoding glitch*" nice timing XD
21 Lmao
I noticed the crate your standing on at 6:52 is having its vertices warp and wobble in a similar way the PS1 used to do. and even the side of the crate's edges seem to tear at 6:56 Is this a result of the Z-Buffer optimizations or something to do with the Sine Cosine optimizations. or was it older footage? I am a bit curious but i am also not very much a programmer at all.
finally, PSX wobble + sheer polygon processing of the n64
It might be due to the crate being a dynamic object or similar. In many games (software Quake by example), models aren't rendered with perspective-correct texturing with a few exceptions (ammo pickups).
@@Calinouuh, check the bounding box size and turn on correction. It is like two instructions in RISCV and uh x86 code is a bit more inefficient, but not much.
Could be a glitch.
The Z buffer makes fuck all difference in that sense. Anyways, apparently there's a bug in the SGI silicon that causes texture glitches in some situations.
You are single handedly moving Super Mario 64 and it's modding forward. Without you who knows when a person would even think to take up this work. Literally making history. I don't care how long I have to wait for all of your work and documentation to be done. All of your optimizations are well worth all the wait. An actual revolution of content will be made from your efforts. Nintendo should literally be paying you at this point. Thanks for your work Kaze, I hope you're as proud as we all are.
It most likely is moving N64 retrogaming forward as a whole because some of these optimization-approaches should be universal in at least a broad sense.
Meh.
@@mirabilisWhy?
@clouds-rb9xt Because there are other people too. Like... aren't you forgetting about the whole SM64 decomp team?
@@mirabilis decomp; decompilation. They are reverse engineering code. This is a code rewrite to give it a better more efficient engine. There are plenty of modders who've done great work, and more great work will be done, especially when this is finished. I don't see why you feel it's necessary to deminish Kazes work. Even in this video he talked about getting more outside work. But Kaze is the forefront for this project. That doesn't mean that others don't deserve recognition, but the "meh" is blatantly trying to deminish the work he's done for no reason, as if Kaze doesn't deserve recognition. If you want to do that, maybe don't play any of the mods that will be built off the back of this project. I understand there are others doing great work. I just want to show my appreciation for Kaze. Maybe you think "single handedly" is too strong of a statement, and that's fine, fair even, but I don't think I need to remind you of the potential this project will bring to other modders. If you feel I was trying to deminish others works, I apologize, I wasn't.
People ignore everybody was learning 3d at the time, artists included.
Making all kinda shapes on 3d software is easy now, but can you imagine what kind of primitive tools they had to work back them?
Even if you had the mastery of the software, they had to translate 2d pixel art into 3d art without the luxury of just doing a realistic model. Once you have a working 3d model is easy to criticize but those guys were the pioneers, doing stuff without a reference.
Devs back then was better for a reason
Making good 3d models to my knowledge happened plenty bake then like the donkey Kong show the problem was low poly models
quick question actually; are some objects being deloaded before they actually leave the camera? I saw a few moving platforms get deloaded on screen instead of off screen.
Crazy
I believe the N64’s official hardware figures were 150,000 polygons per second, so at 30fps, 5,000 would be the targeted polygon count-maybe a bit more depending on amount of shading and textures
Mario 64 played things safe-the simpler maps being only around 1,000 tris, and the larger maps being segmented via loading tunnels, I doubt the game often exceeded 3,000 and definitely not 4,000 tris
Ocarina of Time optimised things more, though the game was capped at 20fps the maps had a lot with surprisingly low poly count-Kokiri Forest was only about 2,000 tris and Hyrule Field 1,600 tris
Then there was Rare’s stuff-they were very talented but didn’t quite focus on stable performance
Click Clock Wood in Banjo-Kazooie pushes over 4,000 tris, and Mayahem Temple in Banjo-Tooie pushes over 6,000
And that’s all before objects and characters
Yeah though keep in mind these higher poly levels use a lot of culling and rooms so i find it unfair to count that. If wed count like that then only up 64 would be a 50 thousand triangle level which it clearly is not haha
@@KazeN64 Those numbers are only for the main “foyer” area, so to speak, and Banjo has flight pads allowing the whole terrain to be visible at once-at a wonky framerate though of course
Probably because those numbers require a lot of optimization and its not just polygons
The poly count was microcode dependent. I believe Kaze's romhack uses F3DEX2 instead of the old Fast3D.
People keep on thinking older models look better cuz their low poly, but its more of the art style that makes it look good
Fucking THANK YOU, FINALLY someone says it.
N64's potential getting unlocked countless times I'll never get tired of it lol
Also quick question. For Mario's model to switch different hand forms or without his hat and what not, is that simple model swapping or a use of shape keying?
they simply swap out the model
i'm curious what performance you could get on later consoles with the level of optimisation done here
Imagine Kaze getting into GameCube programming
I believe Unreal's nanite is doing some optimization shenanigans for poly pushing (althought unreal has quite a high baseline requirements)
This stuff is generally understood now. Back then they were pioneering this kind of stuff and the limitations were less understood. Suffice to say if they were to make Mario 64 with the knowledge they possess today, even on the N64, you'd see them pursue similar techniques and ideas that Kaze is doing.
Yep modern compilers are so much more efficient than older ones to the point where assembly or other low level code is only useful in very certain niche scenarios
@@daskampffredchen No reason to go to GameCube as the progress on this N64 stuff is far more interesting. The N64 was a complex beast with alot of untapped potential... Kaze is doing things on N64 at framerates that defy all other games in its lifespan. GameCube was already so well optimised there's nothing else to unearth performance wise.
When I looked at the thumbnail I thinked "Your low poly: (That ultra low poly SM64 Mario)
Turboflex low poly: (Normal SM64 Mario)
Bro has a PhD in Super Mario 64
Amazing video and story telling. Thank you for keeping our childhood console alive.
*storytelling
Background in modern game engines is drawn after opaque and before transparents to avoid overdraw. On fps, the weapon is drawn first too.
That's not exactly the issue here. Drawing the background after opaque objects requires reading and writing the Z-buffer on every pixel, and as explained in this video, reading and writing the Z-buffer is really slow.
N64 has a pipeline. It loads the whole texture no matter the z tests. Stupid Jaguar would not even tell me if a portal passes any z-tests. I mean it would, but so slow that it does not help me.
@@vurpo7080 Drawing the background over an opaque object does not require writing to the Z-buffer. It only requires reading, and then it fails the check, so it never actually writes.
Goated creator
Real
Hardly, he’s too egotistical
Nah, Simpleflips could easily fold him. Mr Flips himself comfirmed it during Desert Bus 64
@@nickolasevanovichElaborate
Says everyone, everyday on almost every channel.
I can remember when Skelux showed the Star Road 2 preview a long time ago and that felt like the peak of sm64 hacking at6 the time. In a comment about one of Kaze's early hacks he said something along the lines of "nobodys as good (at modelling) as skelux". And while Skelux's artistic vision is still S tier and godlike, I'd say Kaze has gotten closer to that level of polish.
Yeah, I have little talent for art. I am decent at it know because I've practiced it so much. Though I do have biobak helping me rework some of my earlier levels and he's the most talented artist I know. I think we can make something much prettier than even star road 2 with him on the team!
@@KazeN64At least you don't rely as much on clickbait (the bad kind) as skelux
@@KazeN64 I definitely think its far surpassed star road 2 I just wanted to say it in a polite way lol. and of course the artists u collaborate with are dope
I would love to see optimizations like this done to Castlevania 64. Amazing work as always!
Kaze, any time I see you cooking, I’m always left floored. It’s amazing how you push the N64 to its limits.
Yeah. I'm also wondering what could be done with extensive use of AI for further optimizations making also the graphics better. At least when AI becomes better at coding leaded by Kaze to what he will want to do.
@@Triiiple3 Kaze has done some things asking ChatGPT for optimizations only for it to give him "optimizations" that perform worse lol.
I always learn something new when I watch you, it's easy to tell that you really love discussing all of this!
This project inspired me to get into pico8 development. I know it's not the same as making games for the n64, but the restrictions it puts in place give a similar need for creative solutions to problems, and I highly recommend it to anyone who wants to make 'retro' games.
The work you've put into this project is incredible, I can't wait to one day play the game for myself! Keep the devlogs coming, I'm finding it so fascinating learning how you've optimised the engine.
No, it's not at all a substitute for Retro games. Homebrew NES development is much better.
What I learn in your videos @kaze, I really try to find ways I can plug this type of looking at problems in my day to day engineering tasks. Thank you for a fresh look from a completely different segment of software engineering and application design.
My man's one year away from enabling DLSS and Raytracing on the N64
5:56 I love that you mention how bad lookup tables for trig functions are. My game uses inverse tangent and, to optimize a part of my code, I used a lookup table to approximate it thinking it would be faster, but it ended up being *slower* than just calculating it.
AtariJaguar has trigonometric look up tables in ROM because it is fast. (ROM with a dedicated bus to the DSP ).
I'd already heard that fill rate and bandwidth were bigger issues than poly count, but this is far more extreme than I expected!
Also does this mean that Mario is going to get a poly count boost for the final build of RtYI, at least in cutscenes? :)
Someone finally has the balls to say it
high poly sucks
low poly is overrated
mid poly is...mid
2D is king
Bro what 😭
What poly is good then 😭
@@Sir-P-Pizzano poly, embrace pixel sprites
At this point we might as well have a whole-ass GameCube game on the N64.
someones already done a tech demo with it. the portal 64 guy
If this can be accomplished on an N64, imagine what could be done on a Gamecube? I know we sort f saw with the Wii, but I'd want to see what it could do with it's original limitations.
Then, there would be a sequel called Super Mario 128.
Star Fox Adventures is a very good technical showpiece of the Gamecube hardware where it looks much more like an early 7th gen title than a Gamecube title.@@remnantknight56
There was a lot of unique tricks to get the fur simulation to look that good and for other things like grass. And also, how the game data is streamed to eliminate loadtimes.
Super Mario Sunshine Code rewrite, when?
A nice example of "optimization is a waste of time, unless you optimize that really is causing the bottleneck". If your entire processing passes through 10 steps, the slowest step is what dictates the overall speed almost alone. You can optimize all the steps but unless you optimize the slowest one, you won't see any larger improvement, if you can see an improvement at all. Only once you've eliminated the bottleneck, you can check for whose now the slowest step and try to optimize this step as well, and so on. That's why profiling is so important for optimization as only profiling will tell you what the slowest step really is.
Profiling failed with shared resources.
It's a bit of a shame that we're always so eager to replace hardware before it's full potential is realized
Well because it's not economically viable to optimize a device to its fullest. Kaze took this many years to achieve this then the developers back in the 90s will too. Sooner or Later consumers will get bored of the N64 weak power and low poly visuals when they can just upgrade to a better console hardware for more power.
@@deltex8526That's pretty much it.
It takes several YEARS, sometimes Decades for a system's hardware to be pushed to it's limits. And simply put, you cannot have a system stay around that long.
Especially when, for a company like Nintendo, you can simply upgrade the hardware after 7 or so years and get VASTLY better improvements at a fraction of the cost.
I'm thankful we do too lol.
We never really get full potential out of the hardware and even when they get close, it's usually decades after the demise of the hardware.
Optimising takes time and money to do and in most cases, it makes more sense to throw more hardware power at the problem than to go overkill on optimisation.
But maybe A.I. in the near future could help to change that if it gets really good that it can help us optimise code to get better use out of the hardware.
That's why the Gameboy and the Switch lasted/lasts for so long lol
*its full potential (possessive)
it's = contraction of "it is" or "it has"
And this is why it bothers me when people complain about Nintendo consoles (especially the Switch) being "too weak." If you have _proper_ optimization, then processing power isn't as important. Nintendo games always run very well on Nintendo consoles, because they're optimized and designed with the console's limitations in mind... its why 3rd party games are always buggier and have performance issues, and often have lower graphics too. Because 3rd parties don't optimize their games for the console and take shortcuts such as toning down graphics to get it running instead.
You turned that slow memory bus in a fast memory rocket.
Very impressive but as someone who also programs 3D on retro consoles, I have something to add. It technically can draw that many polys, but it's a whole other thing to check backface culling, behind the player polygon clipping, 3D texture clipping, and ground collisions with that many extra polygons. There are only a finite number of cycle the n64 can do per second and you can't have it all without the framerate dropping pretty low. That's why the enemies in Goldeneye has to be so boxy so there can be more on screen. 1 single enemy could be several thousand polys if there was was just 1 floating in space always in front of the player like that landscape you show.
The pcEngine had one cycle to spare per pixel. PSX and Saturn have 4 cycles per pixel. N64 has 7 cycles per pixel. Multiplication is single cycle since Jaguar. To cull back faces you need two multiplications.
I think hardware designers and game developers need to justify all those cycles they use up. SNES could spit out mode-7 at 1 px per cycle. N64 blitter runs at 1px per cycle. It is all a pipeline.
Kaze's next video,
N64 is able to run current gen Playstation and Xbox games.
N64 is able to calculate the name of God.
Brilliant! I’m not keen on the technical knowhow behind n64 development, but as usual the video excellently explains it on a basic level that I can comprehend.
Man, I wish that developers nowadays perform these micro-optimizations on their games, particularly Switch ports.
Nintendo, hire this man!
Dude this is amazing, I just absolutely love this kind of stuff, your mods are amazing and it's so cool to see such advancements made on this console even after so long, makes me proud of the video game community to see there is still so much you can do with old tech.
And Nintendo repeats this mistake yet again with the Nintendo Switch. The Switch struggles massively on games with large, multipass post-processing chains or that render big transparency effects right on the camera, because of its slow DRAM.
Man, if you can overhaul and improve Super Mario 64's code THIS much for optimal performance, I'd love to see you do the same thing for Conker's Bad Fur Day! That game has aged horrendously. Way too many areas have the framerate tanking to borderline unplayable levels. It's crazy how much love that game had back in the day considering. I don't think I've ever played an N64 game that struggled to the same degree as Conker.
On original hardware? I have no such problem on emulation other than multiplayer.
@@magicjohnson3121 Yes, on original hardware it runs horribly.
I was wondering if you could port Killer Instinct Arcade to the PSP? PSP should be able to handle it as it can play Tekken 3, MK2 arcade etc
you amaze and terrify me with how well you know this engine
_"But why? Why go through all this trouble when they optimizations are trivial on modern hardware?"_
...cuz it makes a fascinating UA-cam video. Duh.
we need some efficiency for Goldeneye/Perfect Dark🙏🙏🙏🙏
This guy can pretty much fix Donkey Kong 64's memory leak
Am not a coder but can your engine be adapted for fps, 3d fighting games or racing games ? Would be amazing if in the near future it can be easily adapted for those types of games so others can use it and do crazy ports, maybe a tekken or bloody roar port for the n64 😅
Probably but it's more then likely deeply tied into the game code. He could potentially strip out all the Mario related code to make it more general purpose.
The fighting genre is one the few that actually did benefit CD-media. PSX had only 2MB RAM, but on fighting games they could actually utilize CD -space for lots of characters, arenas and FMV's. However, take a look of most recent build of Smash Remix for N64, to see how N64 can handle tons of characters & arenas as well these days.
I always like the idea of putting high tier graphics and trying to make it work on old tech.
like that video of rendering rtx on a calculator, or demastering minecraft on the psp
I might be wrong but didnt the N64 have just 4kb of texture cache? The CPU and GPU of the N64 were really powerful for their time, even featuring texture filtering but fitting "high res" textures through that cache was basically impossible. You had to chop textures into multiple parts or leave them los res, thats why many games looked blurry
yeah it does - all the textures i've used in the video here do fit in the 4kb cache.
amazing job! Games like Banjo Kazooie actually improved the visuals a lot by having transitions in the textures. Also you can try to use vertex colors to make your game even prettier or add shadows without needing any additional memory as its done by the GPU in real time. Again, gute Arbeit!@@KazeN64
Particularly in Rare and Factor 5 games, texturing was optimized by tiling small textures, as well as using greyscale textures with vertex colour.
You missed the MegaTexture demo.
8:52 This is mind blowing! all the optimization and everything, but I love this Super Mario Bros 2 Level! fantastic work!
Whahh
MOM THE NEW KAZE VIDEO DROPPED
I like this resurgence of fifth gen hardware getting pushed to its absolute limits, first we had a team porting over the first two levels of Unreal to the Saturn (big draw distance and all) and now the N64 is being pushed to its limits, I'd like to see what people could do on the PS1 now.
Because of the small texture cache, and the higher amount of polys I've noticed in N64 games, especially recent projects like yours, I've always wondered why more textures on surfaces weren't drawn using polys instead. People would use blurry text or arrows, when drawing simple shapes and even some text should be more performant on the N64. And this video just furthers that question. I see even recent projects like the Portal 64 project (not yours) creating simple shapes on surfaces using low quality textures instead of triangles, and I have no idea why. Do you have any idea why that might be the case? Because I just simply am not enlightened enough to understand this trend in N64 game design.
The equivalent of a 64×64 texture with vertex colors requires 4096 vertices to be representable, and unlike a texture, it can't be repeated unless you also repeat this 4096-vertex lattice. Vertex colors work better for low-frequency data, such as lighting or ambient occlusion.
Basically because the only thing you get that way is flat colors and gradients. It can look cool (characters and enemies in FF7 is a good example) but it's a very particular style and not something that can actually replace texture mapping.
That's my point. More complicated stuff can be textures, but an arrow, or even a bit of text? Why not triangles? Think of all the blurry text and simple shapes on flat surfaces that are textures instead of polygons in various n64 games. Why not use triangles for those?
N64 is proud of its trilinear mipmaps. LoD for meshes is hard. So an engine would need to render to texture on level loading?
@@FlergerBergitydershEven a single letter takes dozens of triangles, it’s just way less efficient than a texture
Kaze: we cant copy paste this bomb omb Battlefield remake. Aslo kaze: the n64 can use 12552525 triangles
It's funny how once again games are bottlenecked by memory bandwidth/latency yet all people ever talk about is being cpu or gpu limited.
It's been memory access all along.
On modern CPUs fetching a value from main RAM takes 100x as many cycles as from L1 cache! Most of the time the CPU is just waiting for data.
AMD's approach with the X3D line has shown that even with unoptimized processors, larger caches kick asses!
@@tinoesrohoElectrically, RAM gets slower when it gets larger. This is due to the fan out and parasitic capacity of the bit and word lines. Also the address generator gets power hungry.
This is amazing. It's so heartening to know that more people love the N64 as much as I do. Have a great weekend mate! Happy coding
What’s the song at 0:38 ?
Yoshis Island from Mario Party 1 (&superstars)
Diggi es ist echt otherworldly mit welcher hingabe du das alles machst. Bin super gespannt auf den Mod und kanns kaum erwarten den zu zocken :)!
Wird ein guter Winter dieses Jahr.
I like the low poly look of 64 games, and low poly games run pretty well on my pc and phone aswell
Edit: i wasn't talking about emulators but they also run well
the n64 was VERY underestimated in terms of 3D rendering
So overdraw tanks performance, while lots of polygons don't. So kinda like the switch then saw that one video once that showed that apprently having polygons for individuals leaves is faster then a cutout shader.
That’s why doom concentrated on reducing overdraw. And DukeNukem3d . And tomb raider also uses portals as does descent.
Well known in 1995, 1996.
That's pretty much why Age of Calamity just kinda dies when the switch is docked. It basically overdraws nearly everything with all the semi-transparent effects and because docked mode uses a higher resolution compared to handheld, it also has to overdraw much more. Rip Switch.
what is happening at 6:56 with that gray platform wall?
I know nothing on game development, but was really impressed with the high poly count you demonstrated. It's inspiring to see people's passions for a +25-year-old game engine and showing us how performance can be optimized. Thank you.
This skill of pushing a computer to it's maximum ability with optimization is dying out. It's just cheaper/ easier for a business to throw more computing power at the problem.
A lost art
Imagine what Super Mario 64(that already is a masterpiece) could have been if Kaze was its developer back in the day
sadly kaze himself admits he can only do this stuff with the knowledge gained since the n64s launch, its more like if kaze was a dev back then and they developed the game for 10 years and it came out in like 2006
Kaze isn't implementing anything that they didn't implement themselves in games later on the consoles timespan.
@@mups4016 yes he is, hes improving even the best engines, hes making better functions then have ever been used before
You know they had time constraints, right?
no amount of time constraints can ever excuse the abomination that is shadow.c - 4 raycasts to find the same surface 4 times, taking up upwards of 20% of your games performance in some areas (or the dynamic collision on the DDD sub)
This man is legendary. It’s like making an airplane that can actually fly with a coal powered steam boiler.
All these advances in N64 rendering has got me wondering how good 2D could look like on the hardware. As far as I’m aware the N64 doesn’t have any dedicated sprite hardware, but given what’s possible on consoles like the PS1 and Saturn I think it could look awesome.
The n64 really bad at sprites, the reason the saturn and ps1 can do it is because they are designed to do it. No amount of optimization changes the n64s limitations.
@@RetroDark_The_Wizard Ok, then explain Bangai-O, Mischief Makers, Ogre Battle, Super Robot Taisen 64, Dr. Mario 64, Puyo Puyo Sun and ~n Party, Yoshi’s Story and various other sprite-based games on the console. The N64 is very capable of doing stuff like this and I’d like to see it taken further.
@@8squared007 A lot of the spritework was limited in animation and in clarity. In mischief makers the sprites are blurry. Same with yoshis story. It’s just that the N64 was designed by a cgi company, and this video is just improving what the n64 was good at, 2d sprites are something it was quite bad at compared to the PS1 and Saturn.
@@RetroDark_The_Wizard Ok, but there are other examples I’ve listed like Bangai-O which prove that it can do great 2D without looking blurry and recent homebrew projects that also serve to help my argument. “Harder to program” does not mean “Worse”, people simply need to do more research.
@@8squared007 It is also harder to program, but again the 2d on Saturn and Ps1 is more fluid and detailed than anything on n64. The n64 had a lot of issues, memory is a big one. I get you like the n64, but it has it limits. I have played a lot of sprite based n64 games and they really weren’t anything special visual wise. It’s impressive, but not that good.
imagine what the N64 could do if Kaze Emanuar and James Lambert worked together on a romhack
Literally everything
N64 game devs had skill issue
I'm so happy to see the N64 modding/homebrew community going strong and getting better and better. ‘Return to Yoshi’s Island’ is the most anticipated mod ever! Can’t wait. You’re a genius!
If Nintendo had any sense whatsoever they'd hire you and pack this game in with an emulator and ship it on their consoles. Or better yet make a mini N64.
I know it sounds cheesy, but other companies frequently hire modders for official games, or buy the mods outright.
"And if you really know what you are doing..."
Well that's it, Nintendo didn't really know what they were doing.
i wonder how these optimisations could increase the performance of other n64 games if implemented
In the series "nier:automate" the chatacter "2B" her ass alone consist of over 300k polygons
If somebody would ever be able to run this model on real N64 hardware it's either you or nobody
so many of those triangles would be denormed to a triangle with 0 sidelength that the n64 would likely only actually render 10k or so of those (unless you zoomed all the way in on the ass)
@@KazeN64 😏 oh well was worth the shot
I gues this to much asked for the N64, still thinking about the idea makes me smile
Perhaps a few years could perhaps do it, or not, time will tell, still thank you for the response
I think the RDRAM is only marginally faster than the Saturns 1MB Low Dram Pool (67-73 MB/s).
one issue with the cut thing
invisible walls
Might be a silly question, but if you're locking your hack to 30fps, why not push the system much further to the point that it would barely stay above 30fps even when unlocked?
That's what I am doing! The earlier levels were not designed with this powerful of an engine in mind so it's often not fully taken advantage of. But the newest levels are designed in a way that the most laggy view often struggles to hit 30.
Says the hyper-realistic yoshi
This video is even good for modern computers. Today even ram and memory is considered slow and writings code to keep braching low is worth it (if you need speed). We even have cryptography functions that are "memory hard" meaning it slows down hackers by making it slower to compute by making it use ram access times in a way that cant be shortcut-ed, slowing down brute force attacks.
In the N64, not branching will use more cycles so the branching part is quite the opposite
How much does adding an Expansion Pak affect performance? DK64 uses it to improve its framerate.
And yeah, the tech behind hardware evolved much faster than the code optimization.
Its a 5% improvement through segmenting at most iirc
You're such a wizard, your journey reverse engineering this game and the console it played on has been so interesting all the way lol
I admit I have no idea what 50% of the stuff you're explaining is yet the way you explain things is so interesting and gets me invested and makes me listen through the entire thing every time. Great video as always!
I love 3d modeling and kinda hate the modern "low poly" artstyle, you know the one with the solid pastel colors and no textures. It felt unique once it first became popular but Ive grown to mostly hate it.
Thats a complete aside though, for n64 mods high poly really fits the artstyle imo, It gives the feeling of "old console with a game way ahead of its time" and I think that really fits the console modding scene!
Edit: when thinking about it, does the memory limitation mean that game engines that can get depth buffers without the normal, "calculate distance and compare" method, like bsp algorithms can get a ordered depth buffer without a typicalZ buffer. I know the quake and doom engines (both of which appeared on the n64) could improve preformance a lot. It also uses the cpu to sort through it so thats another good use of CPU time!
Ps1 low poly is the most aesthetic. Games like mega Man legends, fear Effect 2, and vagrant story.
@@JarlBarbossa agree, but i dont like the broken afgine texture mapping that most games have
@@hughjanes4883 Yeah but that's not relevant to retro inspired modern games, they never incorporate that. Also it doesn't come up with emulation which is how i do everything today.
@@JarlBarbossa i dident know emulators could fix it, i gotta check that out
Kaze is going to get the N64 to turn itself into a quantum computer
😂 Quantum computers are overrated
N64 is where's it at
Wait, so N64 had overdraw on OPAQUE PIXELS? That sounds like a nightmare; in VR dev we have to be very careful to avoid overdraw on transparent pixels, but having to optimize for opaque sounds completely insane lol
On modern hardware you also have an overdraw overhead (also on opaque pixels), but it is hidden by good cache structure and that currently pixel shaders are much more complicated so relatively take more time than memory access. Also techniques like deferred shading might help with complicated shaders at further cost of memory bandwidth.
Overdraw was a huge problem throughout the 90s for all of the consoles. At least with the N64 you had a z-buffer in hardware to fall back on.
I've been playing around with Saturn programming and it blows my mind what techniques 90s programmers were used by Hexen, Sonic R and other games to minimize Overdraw. I can't imagine having to develop this in software when the concept of a camera fulcrum was still relatively new to the industry
@@themeangenein a way “eye of the beholder” avoided overdraw. Wolfenstein3d and Ultima underworld. Even Comanche avoided overdraw (kind of ).
@@qbojjray tracing cores expect you to supply a bounding volume hierarchy. Can the hierarchical z-buffer use it also?
@@themeangene The guys who made Sonic R and co were absolute mad lads. I have never otherwise heard developers consistently custom-creating features on the software side that the console did not ship with and coming back many years later to create an official update for one of their games, which they have to release as a romhack no less due to no longer being partners with the publisher.
1:15 mario almost trips into the edge of reality
I wonder if it would be possible to do something like UE5's nanite on N64 that culls and resizes triangles based on the camera position so you can get the best quality to performance ratio.
i saw a video where someone implemented mipmapped, high quality textures! wild stuff
If you dare to write the code for the RDP. I think homebrew still tackles the CPU only.
Well, that's Nintendo hardware in a nutshell, isn't it?
"Better safe than sorry, even though 'sorry' is extremely unlikely."
Even looking to newer console's, it makes me excited to see your work!
With all this optimization on just he N64, just imagine what levels of optimization are possible for new gen consoles!
Because the bottlenecks do exist, even if they are just so large you can easily fit most games into them. But acknowledgement of the bottleneck having bounds, and thus optimizing your games, opens a world of possibilities.
So with all your engine rewrite + f3dex3 we can say we will get the most out of the N64, I wonder how much will the microcode help
I've not watched the video but I agree with the thumbnail because I've spent so long doing low poly art that when I see it in ps1 style games and stuff I'm not like "wow its like a real retro game" I'm like "wow they really just got 5% of the way of doing a medium poly model and just applied awful oiled up textures and said 'it's just the style we chose'"
I wish I could make my passion my job like you.
Keep up with it! People who care as much as you do are few and far between these days!
I still have a fulltime job additionally to this
@@KazeN64 you're kidding. Damn I wish I had the motivation you do.