it's insane that they're not only doing this, but explaining how it's done.... i mean shit. they could take the graphics industry, and the film industry by STORM with this. but instead they show how they've done it, which is pretty amazing, because now other developers from other (especially big) companies will create their own similar methods of doing the same thing instead of paying epic to do it. that's pretty epic.
epic understands the engineering power needed to implement something like this at scale, they have the vehicle for shipping it (unreal engine), and since they've conceptualized it, they have a head start. not only are there few other companies that could fully realize this in an industry-impacting way, most would have no financial incentive to do so. why build entire engineering and r&d departments to enter an implementation race with epic when they can just license the unreal engine? i'm happy to see open discussion like this, but be certain that if what they're sharing is business critical IP, then they're already in the marketing phase. and this is part of that marketing.
They have an engine to sell, and they can't sell it if nobody understands how it works. And these advancements are never made in a vacuum. Epic, as well as almost everyone else in the industry, understands that advancement only happens when knowledge is shared. If nobody shared their knowledge in this industry, we'd still be stuck with '90s/'00s-level graphics.
Yeah, except paying for Unreal is still orders of magnitude cheaper than hiring a highly specialized team to implement something like this. And you can have it right now, not "maybe in a few years".
I wish I was in this team to live this amazing technical adventure! I have been working on a small subset of this (decimation and error estimation part with perceptual hashing and metaheuristics tuning with a DFF NN), it's a lot of fun and very challenging but sadly way too technical for a single person to do like only 1% of what you achieved! Kudos
@@HolidayAtHome You know as well as I do that wouldn't happen, they would just reallocate the space to something else (like how people used to think better tech would mean people would work less and have more leisure time... sigh)
Need better video and texture compression for that. Video and textures are the majority of the size of modern games. 4k 60fps video and 4k-8k textures are very heavy. It would take a revolution in compression, or consumers wanting to watch movies when they want to watch movies and play games when they want to play games, so devs don't fill games with more cutscenes than gameplay. 200gb is impressive for 30-40 hours of 4k cutscenes, let alone a game coming with it. An hour of 4k movie footage is typically over 20gb compressed, but game devs make all sorts of compression tradeoffs and use screen space effects to cover it up.
@@SoftBreadSoft Having cutscenes that aren't in engine is kinda dumb tbh (yes, I know it's harder, and the quality will be lower... but it's also authentic, and allows for some interactivity)
Amazing work. Personally, I find the limitations of rendering and memory to be a fun challenge that breeds creativity. However, I do see the benefit of virtualized geometry.
Something I don't understand: the original pipeline used a base geometry of prefabulated Amulite, surmounted by a malleable logarithmic Z-buffer in such a way that the two main spurving vertices were in a direct line with the panametric mesh. The latter consisted simply of six hydrocoptic marzlevectors, so fitted to the ambifacient lunar wanecluster that side rasterizing was effectively prevented. The main renderer was of the normal tris-o-deltoid type executed in panendermic semi-boloid threads in the GPU, every seventh LOD being culled by a nonreversible tremmie pointer to the differential girdledraw on the "up" end of the grammesher. As such, shouldn't prefabulated Amulite require less fluorescent score motion than this new Nanite texelencabulator, and therefore decrease sinusoidal depleneration?
I am taking an algorithms class and was thinking "where the hell am i gonna use all these searching and sorting techniques" until I saw this video. Damn. Edit: I'm also taking an operating systems class, and he started talking about concepts such as scheduling, page table, page size, and amortization. This is beyond pog.
it's funny because this is pretty much what Euclideon was talking about for over a decade. it's also what John Carmack spoke about after Rage's release. John Carmack didn't have enough time at ID to see it through, and everyone said Euclideon was faking it
Still can't get past the issue with derivatives. You can't just hand-wave away the case where analytical derivatives don't work. In many cases they don't, such as whenever a texture lookup influences a UV. As soon as you go fancy with your materials (refraction like in clearcoat or raindrops on a surface, heat shimmer etc.) you have a ton of indiscriminate UVs and that's gonna look messy. Also, what does the analytical case mean in practice? Will you have to go over every shader that uses a ddx/ddy manually and replace that with an analytical derivative?
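For the simple case where an attribute such as a UV coordinate is interpolated straight from the vertices, an analytic derivative can be sketched as below. This is a minimal Python sketch, not Nanite's shader code: the function names (`edge`, `attribute_with_gradient`) and the test triangle are made up for illustration. It assumes screen-space barycentrics are affine in pixel position and applies the quotient rule to the perspective-correct interpolation. It says nothing about the dependent-lookup case the comment raises, where a UV comes out of a texture fetch and no closed form like this exists.

```python
def edge(a, b, p):
    """Signed doubled area of triangle (a, b, p); affine in p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def attribute_with_gradient(tri, w, u, p):
    """Perspective-correct attribute at pixel p plus its analytic (d/dx, d/dy).

    tri: three screen-space (x, y) vertices (hypothetical input layout)
    w:   clip-space w per vertex
    u:   attribute value per vertex (for example one UV component)
    """
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)

    # Screen-space barycentrics and their constant gradients.
    lam = (edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area)
    dlam_dx = ((v1[1] - v2[1]) / area, (v2[1] - v0[1]) / area, (v0[1] - v1[1]) / area)
    dlam_dy = ((v2[0] - v1[0]) / area, (v0[0] - v2[0]) / area, (v1[0] - v0[0]) / area)

    def dot(coeffs, values):
        return sum(c * val for c, val in zip(coeffs, values))

    # u/w and 1/w interpolate linearly in screen space; u itself does not.
    u_over_w = [ui / wi for ui, wi in zip(u, w)]
    one_over_w = [1.0 / wi for wi in w]

    n, d = dot(lam, u_over_w), dot(lam, one_over_w)
    dn_dx, dd_dx = dot(dlam_dx, u_over_w), dot(dlam_dx, one_over_w)
    dn_dy, dd_dy = dot(dlam_dy, u_over_w), dot(dlam_dy, one_over_w)

    # Quotient rule on u = n / d.
    value = n / d
    du_dx = (dn_dx * d - n * dd_dx) / (d * d)
    du_dy = (dn_dy * d - n * dd_dy) / (d * d)
    return value, du_dx, du_dy
```

A quick sanity check is to compare the returned gradient against a central finite difference of the same function; the two should agree to many digits for any pixel inside the triangle.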
Even if software rasterization is faster, doesn't that mean you are using up compute resources that would otherwise be used for other tasks, while the hardware rasterizer can't be used for anything else?
I was hesitant to post a similar comment for fear of backlash, but yes, I would be curious too to see this get a repurposed branch that makes the most of unified memory, huge CPU/GPU bandwidth, and similar advancements. For my use case, it would be great to see Apple and Epic make peace so that we get this insane tech on as many platforms as possible… else when the time comes I’ll end up going back to the PC realm, and while I've got nothing specific against it, I would rather not. This goes way WAY over my head, but it is my (naive) understanding that the Metal API and its tile-based rendering approach, argument buffers, etc. would actually be beneficial for what these devs are doing?
The way you are describing things up to around 16:10 you make it sound like someone with decent coding expertise could implement the important bits of this idea themselves
Wow. You are literally pushing the boundaries of what a GPU can actually do, things such as software rendering on a GPU seem so cursed to me, and even more cursed is drawing tiny triangles, but you actually made it work! Also, I love the irony of Advances in Real-Time Rendering having a website without CSS. lol
I'm really struggling to understand the code at 35:35. It seems simple at a glance but it doesn't explain what the edge values actually are. And what are the three CX/CY's? But mainly, how does taking the min of those tell you if the point is within the triangle? I feel close to getting it but not quite. I can't find anything online with a similar method for detecting if a point is inside a triangle.
I'm guessing they have broken the barycentric test into several stages, where certain stages can be calculated and used for a whole batch of pixels at once in a data-oriented fashion or something along those lines. Don't quote me on that though, it's been a long time since I used barycentric coordinates and even then I didn't completely understand them :D
@@pewpewlasergunz I did eventually figure it out after an undetermined amount of time sitting and staring at it. I don't think it uses barycentrics actually, at least not in a way that I recognize, though admittedly I don't really understand them that well either. All it requires is some basic vector math and 3 cross products, each of which tests which side of one triangle edge the point falls on. That I can make sense of for each individual point test, but I swear there's some wizardry going on: once you calculate it for the top left pixel of the bounds, you can just add the opposite axis of the edge vectors to the cross product values to scan them across the bounds without recalculating any cross products, only that small bit of addition for each pixel. I don't entirely understand it, but it does make some vague intuitive sense if I don't think too hard about it. I couldn't find anything online doing this method though, so I dunno where it comes from or how anyone would figure it out. It seems pretty clever to my non-math-oriented brain.
I managed to completely implement a GPU-based software rasterizer this way, though the results aren't the best. I couldn't get the second scanline method demonstrated to work at all. I don't understand the math there, as math isn't my strong suit. It doesn't help that it uses ternaries on bool3's, whose behavior I feel is undefined. I fiddled with everything I could but had no luck; the best I could get was filling the entire rect lmao. I'm not smart enough to figure out how that algorithm is supposed to work.
Other than that, I couldn't beat the hardware rasterizer no matter what when rendering 10M tris. Performance seemed to be the same no matter what size the tris were as long as they were small, and the same between the scanline and fill algorithms, working or not. So there must be a bottleneck elsewhere that I'm too lazy to figure out.
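The "add the opposite axis of the edge vector" trick falls out of the edge function being affine in the pixel position: stepping one pixel in x changes it by a constant, and so does stepping one pixel in y. A minimal Python sketch of that idea follows. It is not the code from the slide; `edge_function` and `rasterize` are illustrative names, pixels are sampled at their centers, and there is no top-left fill rule, so pixels exactly on a shared edge can be covered by both neighboring triangles.

```python
import math


def edge_function(a, b, p):
    """2D cross product of (b - a) and (p - a): >= 0 when p is on the inner side."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    # Make the winding consistent so "inside" always means all edges >= 0.
    if edge_function(v0, v1, v2) < 0:
        v1, v2 = v2, v1

    # Bounding rectangle of the triangle, clamped to the render target.
    min_x = max(math.floor(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(math.ceil(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(math.floor(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(math.ceil(max(v0[1], v1[1], v2[1])), height - 1)

    edges = [(v1, v2), (v2, v0), (v0, v1)]

    # Three cross products, evaluated once at the top-left pixel center.
    start = (min_x + 0.5, min_y + 0.5)
    row = [edge_function(a, b, start) for a, b in edges]

    # Constant per-pixel increments: the "opposite axis" of each edge vector.
    step_x = [-(b[1] - a[1]) for a, b in edges]
    step_y = [b[0] - a[0] for a, b in edges]

    covered = []
    for y in range(min_y, max_y + 1):
        cur = list(row)
        for x in range(min_x, max_x + 1):
            # The min of the three values is >= 0 only if all three are.
            if min(cur) >= 0:
                covered.append((x, y))
            cur = [c + s for c, s in zip(cur, step_x)]
        row = [r + s for r, s in zip(row, step_y)]
    return covered
```

Taking the min is just a compact way of asking whether the point is on the inner side of all three edges at once. Up to a division by the triangle's doubled area, the three values are the barycentric coordinates, which is why this looks related to a barycentric test without ever normalizing.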
I don't really know an easy way to debug compute shaders in Unity so it's tricky. I assume it has something to do with my data structure. Since they didn't really cover that, I just had to figure out my own way of formatting the data on the GPU, and I'm not super knowledgeable in how to optimize that. Also I'm not sure if there's supposed to be a way to dispatch that many triangles in one go. With 128 tris per thread group and a max of 65535 groups able to be dispatched at once, one dispatch can only process about 8,388,480 tris (if perfect clusters), and more dispatches have to be called if there are more tris. Or I can loop in the shader itself, but I didn't notice any significant performance difference for either of those. I just feel like I'm doing many things wrong somehow without really knowing how to fix it. 😅 But eh, it probably isn't a good idea to replace my rendering with GPU software rasterization anyway lmao. It would be nice though because it would allow me to get rid of the geometry shader stage and improve platform compatibility. And technically, if actual rendering speed were on par, I could probably implement better culling. You know how it is though. Rant over.
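The dispatch arithmetic above can be checked with a few lines of Python. As far as I know, the 65535 figure is a per-dimension limit in Direct3D 11 rather than a limit on the total group count, so one common workaround is to spread the groups over a 2D grid and flatten the group ID inside the shader. The sketch below only does the host-side math; the constant and function names are made up, and the limit is treated as an assumption to verify for the target API.

```python
TRIS_PER_GROUP = 128          # from the comment: 128 triangles per thread group
MAX_GROUPS_PER_DIM = 65535    # assumed per-dimension dispatch limit (Direct3D 11)


def max_tris_single_1d_dispatch():
    """Largest triangle count one 1D dispatch can cover."""
    return TRIS_PER_GROUP * MAX_GROUPS_PER_DIM


def grid_2d(num_tris):
    """Pick (groups_x, groups_y) so that groups_x * groups_y covers num_tris."""
    groups = (num_tris + TRIS_PER_GROUP - 1) // TRIS_PER_GROUP  # ceil division
    groups_x = min(groups, MAX_GROUPS_PER_DIM)
    groups_y = (groups + groups_x - 1) // groups_x
    return groups_x, groups_y


def flat_group_index(group_x, group_y, groups_x):
    """Flattened group ID; groups past the real count must exit early."""
    return group_y * groups_x + group_x
```

For 10M triangles this gives a 65535 x 2 grid, which overshoots the 78,125 groups actually needed, so the shader side would need a bounds check on the flattened index.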
Pretty cool explanations. I didn't really understand most of the technical details and jargon but the overall concept and parts of the execution were quite clear. 👍 1:04:02 I understood the slide about vertex clustering and duplicates completely 😁 which was cool. 1:08:14 One of the things I asked myself is how you handle translucent things, since you said you didn't use the materials in looking for the right triangles. So I guess you still have to implement this somehow.
"Only falls apart under extreme visibility changes" So any first person shooter that's not on consoles lol. This method should work well for a controller with a max yaw and pitch rate. So either we would have weird graphics glitches or performance would suffer and FPS would drop in extreme viewport changes?
i think performance would be what suffers, but unless you were already pushing your system to the max, it probably wouldn't cause noticeable framedrops. It just means that the rendering occlusion briefly becomes less efficient than if it was using single-pass occlusion
i know he obviously didn't write/invent all of this and there's a massive team at unreal that got this done, but the fact that one person can even *understand* the number of moving parts in this system is insane. watching this video could singlehandedly kill a software engineer from the 80s.
8:35 "Draw everything that's new" - How do you determine what's new without checking all the geometry (or at least all of the triangle groups)? If you're checking all geometry for visibility every frame anyway, even if you're using acceleration structures, how are you any better than just checking all the geometry every frame without taking the hits from the last frame?
Think about if you were rotating an object on screen clockwise. Most of the triangles are good where they are, but as you rotate you’re occluding some triangles on the right by other triangles rendered last frame and you’re showing other new triangles that were previously occluded by the triangles that were rendered last frame. The z-buffer comparison lets the engine draw boxes on the left side and the right side of that rotating object so that it knows to only update those triangles based on the new z-buffer, while everything else can stay the same. As to why this is faster, it’s because every time you replace a triangle in the buffer you have to do a ton of initial calculations on that triangle, which takes time, so minimizing the amount of triangles to replace is always going to be better for time, especially if you can do it efficiently with the gpu (like what they are doing here!) Edited for brevity
I'm a noob to this but it seems the goal is not to minimize checks against the hi-z (they are cheap enough) but to get a useful, but incomplete hi-z that can be used to occ-cull the rest of the stuff to draw. You don't need to make a full hi-z after that as it's not being used by the next frame. Though you might want to if you have other stages of your rendering that would find it useful. Other techniques which use re-projection use the last hi-z and so *could* have a better incomplete hi-z to start from. But they aren't conservative, so you risk incorrect culls. It seems that being precise (conservative) still gave enough of a win without risking incorrect culling.
@Night Fox It works like this:
1. Draw everything from the last frame *but with updated transformation*. This should be the bulk of your geometry.
2. Occlusion-test everything that wasn't drawn last frame (but is inside the frustum) against the z-buffer you just created (this is fast if you use bounding geometry instead of meshes).
3. Draw all the "new" stuff that wasn't occluded.
This means you still have to do a lot of occlusion checks, but it's nevertheless a lot faster than throwing *everything* at the pipeline and letting the z-buffer sort it out.
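The three steps above can be sketched as a toy simulation. This is a deliberately tiny Python model, not engine code: the "screen" is a single row of pixels, every object is a flat span at one depth, and the names (`Obj`, `render_frame`, `is_occluded`) are invented for the example. The second pass tests against a snapshot of the depth buffer taken after the first pass, standing in for the hi-z built from last frame's visible set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    x0: int        # first covered column
    x1: int        # one past the last covered column
    depth: float   # constant depth; smaller is nearer


def is_occluded(obj, depth_buf):
    """Conservative test: occluded only if every covered column is strictly nearer."""
    return all(depth_buf[x] < obj.depth for x in range(obj.x0, obj.x1))


def draw(obj, depth_buf):
    for x in range(obj.x0, obj.x1):
        depth_buf[x] = min(depth_buf[x], obj.depth)


def render_frame(objects, visible_last_frame, width):
    depth_buf = [math.inf] * width
    drawn = []

    # Pass 1: draw what was visible last frame (with this frame's transforms).
    for obj in objects:
        if obj.name in visible_last_frame:
            draw(obj, depth_buf)
            drawn.append(obj.name)

    # Pass 2: test everything else against the depth produced by pass 1,
    # and draw only what is not hidden behind it.
    pass1_depth = list(depth_buf)
    for obj in objects:
        if obj.name not in visible_last_frame and not is_occluded(obj, pass1_depth):
            draw(obj, depth_buf)
            drawn.append(obj.name)

    # Visible set to feed into the next frame.
    visible_now = {o.name for o in objects if not is_occluded(o, depth_buf)}
    return drawn, visible_now
```

With an empty previous set (a hard cut, say), pass 1 draws nothing, nothing is occluded in pass 2, and every object gets drawn. The image is still correct in this model; only the culling efficiency for that one frame is lost.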
I'm STILL struggling very hard to understand HOW this did NOT revolutionize every 3D suite out there. It's not as if they kept the tech completely hidden.
Implementation complexity. It's too new to be that wide-spread. Doesn't even cover all cases yet. Eventually this may grow into a proper standard of doing things. Give it a few more years
I thank all the magicians at epic for this feature. It's like aliens gave us a tool, that we didn't even understand but was better than everything that currently exists.
The irony is that the huge RAM footprint was attributed to normal maps and color textures, not 3D assets. A polygonal object is like a 2D vector file: it's scalable to any pixel resolution and shouldn't take more RAM and hard drive space than a raster texture. Voxels could also be used to add detail without exploding CPU and GPU time. Time to find a hybrid approach or new tech...
So what if there's no good previous frame to work from, like after a hard transition? Is there a big slow down, or do we get a single poor quality frame?
Harmonious Layers Of Ordered And Interconnected And Interoperable Complexity Which Perform Specific Tasks In Optimal Functional Unity To Achieve Maximized Efficiency And Adaptability.
Know any good layperson's explanation of this? I don't know any of the technical words used to talk about pre-existing systems, I just wanted to know about this specific method.
I had a question about one of the very few things I managed to get even a shallow understanding of: when filling the material RT buffer by using a depth value as the material ID (genius concept), how is the depth test done in that case? If a triangle is forced to have a depth value (for the material ID), then it wouldn’t play ball with the current depth test, right? I’m missing something for sure here. Either it isn’t a hardware depth test, or the depth test was already done and this is another pass?
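My understanding of the talk is that it is a separate pass: visibility (and real scene depth) is already resolved, the per-pixel material ID is then written into its own depth target, and each material is drawn as a screen-covering pass at a constant depth with the depth test set to EQUAL. Below is a toy Python sketch of that reading. The names (`MAX_MATERIALS`, `material_depth`, `shade`) and the ID-to-depth encoding are assumptions for illustration, not Epic's actual values.

```python
MAX_MATERIALS = 16384  # hypothetical upper bound used to map IDs into [0, 1)


def material_depth(material_id):
    """Encode a material ID as a depth value (assumed encoding)."""
    return material_id / MAX_MATERIALS


def shade(material_ids, shaders):
    """material_ids: 2D list of per-pixel IDs from an earlier visibility pass.
    shaders: dict mapping material ID -> function(x, y) returning a color.
    """
    height, width = len(material_ids), len(material_ids[0])

    # The "material depth" target: holds encoded IDs, not scene depth.
    depth = [[material_depth(m) for m in row] for row in material_ids]
    out = [[None] * width for _ in range(height)]

    # One screen-covering pass per material, at that material's constant depth.
    for mat_id, shader in shaders.items():
        z = material_depth(mat_id)
        for y in range(height):
            for x in range(width):
                # Depth test EQUAL: only this material's pixels survive.
                if depth[y][x] == z:
                    out[y][x] = shader(x, y)
    return out
```

In this model the usual nearer-or-farther depth test never sees the material IDs, because occlusion was settled before this pass started. The depth hardware is only being reused as a fast per-pixel "does this pixel belong to material N" filter.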
The pacing/flow makes this presentation a bit hard to follow, like just reading a script in a kind of stilted way is harder for me to process than just reading it myself
I worry about file sizes though, games are already ridiculously large, I can't believe I downloaded 150gbs for warzone.. if it used this technology it would be potentially infinite in size
Yeah, no. warzone is massive because Activision deliberately adds bloat every COD release to make it look "bigger" than the one last year. I think warzone has raw audio files for every localisation of the game which is most of the bloat
@@Xyzair The GPU is supposed to fetch data it needs for the next frame *from the internet*? Your game now runs better depending on internet speed? People with slow internet get ugly LOD pops? That's a terrible use case for the cloud.
Pretty cool, but how do you want to ship so much detail? Filesize would explode if you use so many high detailed meshes. I guess this will lead to a heavier re-use of modular assets? All in all pretty kick-ass tech!!
I think the idea is that it's not about having every game have every single asset to a pixel sized detail scale, but to be applied to certain important objects such as characters. Or just to have the possibility if necessary for a given scene. Finally, it increases the overlap between real time rendering and traditional vfx workflows, getting us closer to better virtual production.
the intent is to have a general system that "just works" no matter what geometry you throw at it, without you having to do manual optimization of your geometry or other extra work. how and whether developers decide to utilise it for very high quality geometry is an entirely independent choice
There is often some truth in jokes, and if there is a joke "instances are the new triangles", it might just be something to consider. What if the tech could go full circle, so clusters of instances could be used to build volumetric cloud data? After all, a dust cloud is a huge amounts of small grains of sand. What if you could build the engine, so you could see a cloud of sand from the distance, zoom in, and eventually go in and see the fine details of an individual grain of sand. Perhaps this could be a way to deal with foliage? Leaves that get close are individual leaves, but all of the distant leaves are treated as volumetric clouds somehow.
applying this kind of technique to instances rather than geometry would be much more difficult, because of how unique every instance is - creating a "one size fits all" system would be an incredibly difficult task, and nanite already took a huge amount of work to just handle geometry, which is much less unique - once it gets to nanite and rendering, it's all just a load of triangle data
referencing a huge mesh database so you can view it up close sounds to me like you beat euclideon unlimited detail at their own game. epic made it work well but this concept was around a decade ago.
Sorta but not really. The whole Euclideon thing uses such wildly different representations of geometry (pointclouds, or maybe signed distance fields, I forget), and really never worked well for anything other than the geometry itself.
you could have done smaller cuts in order to read more fluently, because the micro stops, re-pronunciations after errors and changes of intonation are very distracting. thanks for your time
What study path should I follow to understand this? I'm a programmer with game development experience. I would love to be able to manipulate the rendering pipeline.
this went from "oh yeah, makes sense" to " I have no idea what he is talking about" really quick
My existence with my son, when he sends me links to videos like this thinking I can follow the deeper concepts 😂
@@JasonLiske At least you show interest in his work, unlike my parents who don't even know what I do for work.
glad i am not alone felt that
"Oh, yeah, sure, the hardware rasterizer chugs on tiny triangles, so we'll just make a GPU software rasterizer, and make it good, and actually a hybrid rasterizer, and abuse the Z-test wave repacking, and analytically compute all the derivatives, easy peasy!" you guys are absolute mad lads. Amazing work!
Totally this… plus everything around it to support and make it actually feasible.
The hierarchy tree solutions, the dual graphs to deal with boundaries, memory compression/bandwidth, lights/shadow maps/materials, the matID as depth value is genius (although gets over my head as to how it works for the actual depth test when filling it), etc etc.
I honestly understand almost nothing of what’s going on, but I do think this could have broken at so many critical points, and for each one they found or invented a solution… jaw dropping.
@@alejmc Yeah, I had to omit like 3/4 of the super impressive parts just to make a concise post. What a time to be alive!
@@jjoonathan7178 thanks karoly
@@jeshweedleon3960 I'd like to see Karoly fit this one into two minutes. A speedrun for the ages.
@@jjoonathan7178 fellow scholars i see
The research and development at epic games is absolutely insane. This is the bleeding edge and it's amazing what they've created
Yep, and the only part I understand about any of it is that it's awesome.
The amount of groundbreaking innovation in this work is astonishing.
Brian Karis, you can be proud of what you gave to this industry. Many will never realize that the game they're playing can exist only because of you.
Thank you for having the dream.. the dedication and opportunity to work on it. You and your team deserve the best
I worked at Epic from late 2003 to early 2005, on UT2K4 as an artist. Sometimes late at night, I'd wander over to the Gears of War side of the hallway and discuss things with artists and programmers like James Golding, Shane Caudle, Cliff Bleszinski, Tim Sweeney, and Chris Perna. Having worked closely with programmers from my very first day in game development years earlier, it was a real joy at Epic to talk with people that both I and the gaming world at large considered masters. Still have dreams I'm working there again. The best achievement of my artistic life is my time there, and will never forget what that felt like.
I don't understand. If working there is so mind-blowingly amazing, why are you not working there anymore?
@@DasAntiNaziBroetchen could be many reasons, for example could have had to leave the country/state
Ah ut2k4 is peak esports and so far nobody has been able to convince me otherwise. Even young people in their early mid 20s think it still looks good today too, so you and your team have done well.
Wow, just wow! Congrats to everyone involved! And thanks for sharing the gist of how things are done!
The team behind this are true geniuses. This amount of complexity being managed is pretty incredible.
The whole idea and initial work came from the guy in this video, Brian Karis. I believe he didn't get a team to lead until he had proved it worked. He came up with the idea, and they told him to go work on it alone for a while and see what happens.
The amount of dots that are connected here and put to use by Nanite is spectacular. UE mesh reduction has been quite a leap but this really takes it to a new level.
This sounds like the technical jargon overload to me half the time, but I feel I could follow along enough to get the raw basics. Very Cool.
Knowing just enough to follow along but not being able to re explain any of it to anyone is frustrating as cuuuuuuuunt
@@drumboarder1 If the person you hoped to explain it to couldn't understand this video as well as you can, then there is little hope that you could do it either.
@@mnomadvfx fair point
I had no idea nanite was using a GPU software rasterizer, that's nuts.
doom eternal or id tech also uses a software rasterizer with virtual lod's and shadows, that's why there is not much difference between low and ultra, because the engine is scaling geometry, shadows etc based on pixels or resolution.
@@niks660097 It's interesting because Doom Eternal is one of the games where you can (on lower end hardware like XBox One) see the LODs swapping and blurring. Other games usually retain better quality at the cost of lower FPS
@@donsorenoelchapogringo1182 yeah i agree, ID had a presentation at SIGGRAPH 2019/21, they are really ahead of their time and probably the only one competing in quality with Unreal 5, pushing 80 million triangles per scene (DF also has a vid about it), using the above LOD and 100s of per pixel lights.
@@niks660097 can you please link the id presentation I can't seem to find it
@@niks660097 As per the other person, would really appreciate a link to that, as I also cannot find it (maybe using the wrong keywords? Idk)
This change in rendering pipeline is more important than any GPU hardware advances in the last 15 years. Amazing!
This proves my long-held perception that programming is art, since all creativity comes from an inspired mind seeing beyond the conventional.
So you're saying you can use a GPU from 15 years ago to run a modern game, if the modern game uses nanite technology? The current recommended specs for the City Sample Demo are:
12-Core CPU @ 3.4 GHz
64 GB RAM
GeForce RTX 2080 or AMD Radeon 6000 or higher
At least 8GB of VRAM
Nanite is definitely meant for next gen hardware if it is to be performant
Every generation needs some feature that, unless you have the latest and greatest hardware, will render your gameplay unplayable. A while back it was HDR and Shader 3.0. Later it would be VR. During the initial RTX generation, it was ray-tracing. This generation would have to be nanite and lumen.
@@cryora nanite runs on a PS5 so it's not really very next-gen in terms of hardware required. Also, the original comment was probably hyperbole, you might wanna learn what it is if you are gonna talk to people
@@Supreme_Lobster I know what hyperbole is, but it did not strike me as one. You might want to learn how not to be condescending if you are going to talk to people.
But the first paper with a working prototype was published in 2003, what about that?
... I member when I could understand (albeit marginally) the tech behind UE4, this is on another level of complexity. Amazing stuff for sure.
I don't membaaaaaaaaa, I forgot.
This video can also help people who work at Epic on Unreal Engine in the future to understand this technology, which increases the chance of it being improved.
You guys make it look easy - truly incredible stuff. I didn't quite believe the tech demos until I had a play with it myself. Great job!
There are some truly clever ideas here that make you pause, think, and then see the sense. However, making it all work with all the 'devils in the details', that's on an entirely different level and I consider that an astonishing achievement. Hats off.
This is mindblowing! Couldn't wrap my head around the deep technical parts of it, but I did understand the general idea and process for Nanite
Wow the references brought a lot of nostalgia. Was waist deep in this thing late 2000s/early 2010s. Wonder how mega geometry went.
This is some crazy piece of engineering.
i read the slides when they came out, now i finally got to see the presentation. amazing tech, no doubt
I think this sort of technique is way more exciting than changes in hardware, like raytracing cores. You can actually see the improvement without trying to find a reflection or some shadows that you can compare to a badly-implemented rasterization equivalent.
Ray tracing will definitely be a bigger deal in a few years I think. Once more games start using ray traced global illumination, rt ambient occlusion, and similar things, is when you’ll start to see a much bigger difference. Unfortunately there’s only a handful of games that use these right now, mainly dying light 2 and metro exodus pc enhanced, and those games look fantastic with those on.
@@primohippo4014 Then again you can make any game look fantastic with nanite. Raytracing takes extra work and hardware.
This is fantastic. I hope in the future we get a hardware implementation of this kind of thing.
That'll be great. Probably will be implemented in the GPU, like ray tracing.
Oh man, that'd be fantastic
Mesh shader (by NVIDIA) is similar.
Amazing, I didn't understand a word, but amazing :)
it's insane that they're not only doing this, but explaining how it's done.... i mean shit. they could take the graphics industry, and the film industry by STORM with this.
but instead they show how they've done it, which is pretty amazing, because now other developers from other (especially big) companies will create their own similar methods of doing the same thing instead of paying epic to do it. that's pretty epic.
epic understands the engineering power needed to implement something like this at scale, they have the vehicle for shipping it (unreal engine), and since they've conceptualized it, they have a head start.
not only are there few other companies that could fully realize this in an industry-impacting way, most would have no financial incentive to do so. why build entire engineering and r&d departments to enter an implementation race with epic when they can just license the unreal engine?
i'm happy to see open discussion like this, but be certain that if what they're sharing is business critical IP, then they're already in the marketing phase. and this is part of that marketing.
You know the unreal engine is open source right?
They have an engine to sell, and they can't sell it if nobody understands how it works.
And these advancements are never made in a vacuum. Epic, as well as almost everyone else in the industry, understands that advancement only happens when knowledge is shared. If nobody shared their knowledge in this industry, we'd still be stuck with '90s/'00s-level graphics.
@@Drollfilms its not open source
Yeah, except paying for Unreal is still orders of magnitude cheaper than hiring a highly specialized team to implement something like this. And you can have it right now, not "maybe in a few years".
I wish I was in this team to live this amazing technical adventure! I have been working on a small subset of this (decimation and error estimation part with perceptual hashing and metaheuristics tuning with a DFF NN), it's a lot of fun and very challenging but sadly way too technical for a single person to do even 1% of what you achieved! Kudos
I love this technology but don't tell Activision about it or next COD will have 2TB.
Games could actually become smaller because you don't need to put all the LODs in the game!
@@HolidayAtHome You know as well as I do that wouldn't happen, they would just reallocate the space to something else (like how people used to think better tech would mean people would work less and have more leisure time... sigh)
Need better video and texture compression for that. Video and textures are the majority of the size of modern games. 4k 60fps video and 4k-8k textures are very heavy. It would take a revolution in compression, or consumers who want to watch movies when they want to watch movies and play games when they want to play games, so devs don't fill games with more cutscenes than there is gameplay. 200gb is impressive for 30-40 hours of 4k cutscenes, let alone a game coming with it. An hour of 4k movie footage is typically over 20gb compressed, but game devs make all sorts of compression tradeoffs and use screen space effects to cover it up.
@@SoftBreadSoft Having cutscenes that aren't in engine is kinda dumb tbh (yes, I know it's harder, and the quality will be lower... but it's also authentic, and allows for some interactivity)
This is a game(graphics) programming gem. Thank you for sharing.
Amazing work. Personally, I find the limitations of rendering and memory to be a fun challenge that breeds creativity. However, I do see the benefit of virtualized geometry.
This is just crazy stuff...really would have loved to be in gfx as an engineer.
Nuts. Absolutely nuts. I had no idea Nanite was so freaking complicated!
Something I don't understand: the original pipeline used a base geometry of prefabulated Amulite, surmounted by a malleable logarithmic Z-buffer in such a way that the two main spurving vertices were in a direct line with the panametric mesh. The latter consisted simply of six hydrocoptic marzlevectors, so fitted to the ambifacient lunar wanecluster that side rasterizing was effectively prevented. The main renderer was of the normal tris-o-deltoid type executed in panendermic semi-boloid threads in the GPU, every seventh LOD being culled by a nonreversible tremmie pointer to the differential girdledraw on the "up" end of the grammesher. As such, shouldn't prefabulated Amulite require less fluorescent score motion than this new Nanite texelencabulator, and therefore decrease sinusoidal depleneration?
u alright mate?
Putting that PHD in Applied Phlebotinum to good use.
Thank you for this
Yes...
Oh shit is this what normal people hear when they watch these presentations?
Congrats! This video, single handedly, violates every single rule of making good public presentation.
"You can't displace a sphere and turn it into a torus..."
CGMatter: "Hold my nodes..."
Great presentation. Fascinating and really cool. Little bit sad my knowledge is obsolete but it's a little less so after watching this.
The depth of your knowledge is impressive. Very interesting. thank you.
How many people and how much time were invested to develop all this?
i think nanite development started 10 years ago
@@JorgetePanete Yes
Thank you, couldn't find any info on how it worked when it came out a couple months ago
That's insane. Very cool. Repurposing the z-buffer must have raised a few eyebrows.
Beautiful piece of tech/code!
I am taking an algorithms class and was thinking "where the hell am i gonna use all these searching and sorting techniques" until I saw this video. Damn.
Edit: I'm also taking an operating systems class, and he started talking about concepts such as scheduling, page table, page size, and amortization. This is beyond pog.
it's funny because this is pretty much what Euclideon was talking about for over a decade. it's also what John Carmack spoke about after Rage's release. John Carmack didn't have enough time at ID to see it through, and everyone said Euclideon was faking it
Engineering at its finest!
Still can't get past the issue with derivatives. You can't just hand-wave away the case where analytical derivatives don't work - in many cases they don't, such as always when a texture lookup influences a UV. As soon as you go fancy with your materials (refraction like in clearcoat or raindrops on a surface, heat shimmer etc.) you have a ton of indiscriminate UVs and that's gonna look messy.
Also, what does the analytical case mean in practice? Will you have to go over every shader that uses a ddx/ddy manually and replace that with an analytical derivative?
Even if software rasterization is faster, doesn't that mean that you are using up compute shaders that otherwise would be used for other tasks while the hw rasterizer can't be used for anything else?
Really would love to see how this would fly on an ARM chip with 128 GB of unified memory..
I was hesitant to post a similar comment for fear of backlash, but yes, I would be curious too to see this having a repurposed branch to maximize all the unified memory, huge CPU/GPU bandwidths, etc advancements.
For my use case, it would be great to see Apple and Epic make their peace so that we get this insane tech on as many platforms as possible… else when the time comes I’ll end up going back to the PC realm, and while I got nothing specific against it, I would rather not.
This goes way WAY over my head, but it is my (naive) understanding that the Metal API and its tile-based rendering approach, argument buffers, etc would actually be beneficial for what these devs are doing?
The way you are describing things up to around 16:10 you make it sound like someone with decent coding expertise could implement the important bits of this idea themselves
Imagine this software rasterizer makes it into a hardware rasterizer 😳
Noooow I know how they built the simulator we're living in now, called UE5, Universe Engine 5.....
@@High.on.Life_DnB Omg, it all makes SeNsE!!!!! xD
"Instances are the new triangles". My favorite line.
The very principles of Mandalas - Ancient Indian Art - why am I not surprised to see temple graphics in this video :-) ? Good explanation.
Literally 'Game Changing'!!
you can only learn this stuff if you love this stuff
This is exceptional work
Wow. You are literally pushing the boundaries of what a GPU can actually do, things such as software rendering on a GPU seem so cursed to me, and even more cursed is drawing tiny triangles, but you actually made it work!
Also, I love the irony of Advances in Real-Time Rendering having a website without CSS. lol
I'm really struggling to understand the code at 35:35
It seems simple at a glance but it doesn't explain what the edge values actually are.
And what are the three CX/CY's?
But mainly, how does taking the min of those tell you if the point is within the triangle?
I feel close to getting it but not quite.
I cant find anything online with a similar method for detecting if a point is inside a triangle.
I'm guessing they have broken the barycentric test into several stages, where certain stages can be calculated and used for a whole batch of pixels at once in a data-oriented fashion or something along those lines, don't quote me on that tho, it's been a long time since i used barycentric coordinates and even then i didn't completely understand them :D
@@pewpewlasergunz I did eventually figure it out after an undetermined amount of time sitting and staring at it. I don't think it uses barycentrics actually. At least not in a way that I recognize, though admittedly I don't really understand them that well either.
All it requires is some basic vector math and 3 cross products, each of which tests which side the point falls on for each edge of the triangle. That, I can make sense of for each individual point test, but I swear there's some wizardry going on as once you calculate it for the top left pixel of the bounds you can just add the opposite axis of the edge vectors to the cross product values to somehow scan them across the bounds without recalculating any cross products or anything, only that small bit of addition for each pixel.
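A minimal sketch of the incremental scan described above, written in Python rather than the shader code from the slide. It is not the talk's code: the names `edge_setup` and `rasterize` are made up for illustration, vertices are assumed to be integer pixel coordinates, samples are taken at integer pixel positions, and there is no fill rule for shared edges. The reason the additions work is that each edge value is linear in x and y, so moving one pixel right always changes it by `-edge.y` and moving one pixel down always changes it by `+edge.x`.

```python
def edge_setup(a, b, p):
    """Edge function for edge a->b evaluated at point p, plus its per-pixel steps.

    The value is the 2D cross product (b - a) x (p - a). Because it is linear
    in p, stepping x by 1 adds -ey and stepping y by 1 adds +ex.
    """
    ex, ey = b[0] - a[0], b[1] - a[1]
    value = ex * (p[1] - a[1]) - ey * (p[0] - a[0])
    return value, -ey, ex


def rasterize(v0, v1, v2):
    """Return the integer pixels covered by a triangle (hypothetical helper)."""
    # Flip the winding if needed so that "inside" means all three values >= 0.
    if edge_setup(v0, v1, v2)[0] < 0:
        v1, v2 = v2, v1

    min_x, max_x = min(v0[0], v1[0], v2[0]), max(v0[0], v1[0], v2[0])
    min_y, max_y = min(v0[1], v1[1], v2[1]), max(v0[1], v1[1], v2[1])

    # The three cross products are computed once, at the top-left of the bounds.
    origin = (min_x, min_y)
    edges = [edge_setup(v0, v1, origin),
             edge_setup(v1, v2, origin),
             edge_setup(v2, v0, origin)]

    covered = []
    row = [e[0] for e in edges]
    for y in range(min_y, max_y + 1):
        c = list(row)
        for x in range(min_x, max_x + 1):
            # Taking the min is just "are all three edge values non-negative?"
            if min(c) >= 0:
                covered.append((x, y))
            c = [ci + e[1] for ci, e in zip(c, edges)]      # step one pixel in x
        row = [ri + e[2] for ri, e in zip(row, edges)]      # step one pixel in y
    return covered
```

For example, `rasterize((0, 0), (4, 0), (0, 4))` returns the 15 integer points with x >= 0, y >= 0 and x + y <= 4, and the result is the same if the vertices are passed in the opposite winding order.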
I don't entirely understand it but it does make some intuitive sense vaguely if I don't try to think too hard on it. But I couldn't find anything online doing this method or anything, so Idunno where it comes from or how anyone would figure it out, it seems pretty clever to my non math oriented brain.
I managed to completely implement a GPU based software rasterizer this way, though the results aren't the best. I couldn't get the second scanline method demonstrated to work at all, I don't understand the math there as math isn't my strong suit. It doesn't help that it uses ternaries on bool3's which I feel the behavior is undefined for. I fiddled with everything I could but to no luck, best i could get was getting it to fill the entire rect lmao. I'm not smart enough to figure how that algorithm is supposed to work.
Other than that, I couldn't beat the hardware rasterizer no matter what when rendering 10M tris. Though performance seemed to be the same no matter what size the tris were as long as they were small, and the same between the scanline and fill algorithms, working or not. So there must be a bottleneck elsewhere that I'm too lazy to figure out. I don't really know an easy way to debug compute shaders in Unity so it's tricky. I assume it has something to do with my data structure, since they didn't really cover that; I just had to figure out my own way of formatting the data on the GPU and I'm not super knowledgeable in how to optimize that. Also I'm not sure if there's supposed to be a way to dispatch that many triangles in one go. Because with 128 tris per thread group, and a max of 65535 groups able to be dispatched at once, you can only process about 8,388,480 tris (if perfect clusters) and then more dispatches have to be called if there's more tris. Or you can loop in the shader itself, but I didn't notice any significant performance difference for either of those.
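The dispatch arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the comment (128 triangles per thread group, 65535 groups per dispatch); the function name `dispatch_plan` is made up for illustration. As far as I know, in Direct3D 11 the 65535 cap applies per dispatch dimension, so spreading the groups over two dimensions is another way around it, but that is an assumption about the commenter's setup rather than something stated here.

```python
TRIS_PER_GROUP = 128              # from the comment: 128 tris per thread group
MAX_GROUPS_PER_DISPATCH = 65535   # from the comment: max groups in one dispatch


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def dispatch_plan(total_tris):
    """Return (thread groups needed, dispatches needed) for a 1D dispatch."""
    groups = ceil_div(total_tris, TRIS_PER_GROUP)
    dispatches = ceil_div(groups, MAX_GROUPS_PER_DISPATCH)
    return groups, dispatches


# Largest triangle count a single 1D dispatch can cover with full clusters.
MAX_TRIS_PER_DISPATCH = TRIS_PER_GROUP * MAX_GROUPS_PER_DISPATCH
```

With these numbers `MAX_TRIS_PER_DISPATCH` is 8,388,480, matching the figure in the comment, and `dispatch_plan(10_000_000)` gives 78,125 groups spread over 2 dispatches.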
I just feel like I'm doing many things wrong somehow without really knowing how to fix it. 😅
But eh, it probably isn't a good idea to replace my rendering with GPU software rasterization anyway lmao. It would be nice though because it would allow me to get rid of the geometry shader stage and improve platform compatibility. And technically if actual rendering speed were on par, I could probably implement better culling. You know how it is though.
Rant over.
As an artist, kudos to you!
Pretty cool explanations. I didn't really understand most of the technical details and jargon but the overall concept and parts of the execution were quite clear. 👍
1:04:02 I understood the slide about vertex clustering and duplicates completely 😁 which was cool
1:08:14 one of the things I asked myself is how you handle translucent things, since you said you didn't use the materials in looking for the right triangles. So I guess you still have to implement this somehow
About your second point, I believe Nanite doesn't support transparent materials.
Great job, very technical and well explained.
"Only falls apart under extreme visibility changes"
So any first person shooter that's not on consoles lol.
This method should work well for a controller with a max yaw and pitch rate.
So either we would have weird graphics glitches or performance would suffer and FPS would drop in extreme viewport changes?
I dunno, maybe it only affects Z-culling, not simply stuff being behind the camera.
i think performance would be what suffers, but unless you were already pushing your system to the max, it probably wouldn't cause noticeable framedrops. It just means that the rendering occlusion briefly becomes less efficient than if it was using single-pass occlusion
i know he obviously didn't write/invent all of this and there's a massive team at unreal that got this done, but the fact that one person can even *understand* the number of moving parts in this system is insane. watching this video could singlehandedly kill a software engineer from the 80s.
any idea why nanite meshes do not update the nav mesh? not even after setting the nav mesh dynamic, the nav mesh simply ignores nanite meshes
Now the question is : what will GPU designers do to help improve Nanite.
This is absolutely bonkers. Amazing.
8:35 "Draw everything that's new" - How do you determine what's new without checking all the geometry? (or at least all of the triangle groups). If you're checking all geometry for visibility every frame anyway, even if you're using acceleration structures, how are you any better than just checking all the geometry every frame without taking the hits from the last frame?
Think about if you were rotating an object on screen clockwise. Most of the triangles are good where they are, but as you rotate you’re occluding some triangles on the right by other triangles rendered last frame and you’re showing other new triangles that were previously occluded by the triangles that were rendered last frame. The z-buffer comparison lets the engine draw boxes on the left side and the right side of that rotating object so that it knows to only update those triangles based on the new z-buffer, while everything else can stay the same.
As to why this is faster, it’s because every time you replace a triangle in the buffer you have to do a ton of initial calculations on that triangle, which takes time, so minimizing the amount of triangles to replace is always going to be better for time, especially if you can do it efficiently with the gpu (like what they are doing here!)
Edited for brevity
I'm a noob to this but it seems the goal is not to minimize checks against the hi-z (they are cheap enough) but to get a useful, but incomplete hi-z that can be used to occ-cull the rest of the stuff to draw.
You don't need to make a full hi-z after that as it's not being used by the next frame. Though you might want to if you have other stages of your rendering that would find it useful.
Other techniques which use re-projection use the last hi-z and so *could* have a better incomplete hi-z to start from. But they aren't conservative, so you risk incorrect culls. It seems that being precise (conservative) still gave enough of a win without risking incorrect culling.
@Night Fox
It works like this:
1. Draw everything from the last frame *but with updated transformation*. This should be the bulk of your geometry.
2. Occlusion-test everything that wasn't drawn last frame (but is inside the frustum) against the z-buffer you just created (this is fast if you use bounding geometry instead of meshes)
3. Draw all the "new" stuff that wasn't occluded
This means you still do have to do a lot of occlusion checks but it's nevertheless a lot faster than throwing *everything* at the pipeline and letting the z-buffer sort it out.
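The three steps above can be sketched as a toy in Python. This is a deliberately simplified model and not Nanite's implementation: the "screen" is a single row of pixels, every object is a flat span with one depth value (smaller means closer), and the object itself stands in for its bounding geometry. The names `Obj`, `draw`, `is_occluded` and `render_two_pass` are made up for illustration.

```python
from dataclasses import dataclass

FAR = float("inf")


@dataclass(frozen=True)
class Obj:
    name: str
    x0: int        # first covered pixel (inclusive)
    x1: int        # last covered pixel (inclusive)
    depth: float   # smaller = closer to the camera


def draw(depth_buf, obj):
    """Write the object into the depth buffer, keeping the closest depth."""
    for x in range(obj.x0, obj.x1 + 1):
        depth_buf[x] = min(depth_buf[x], obj.depth)


def is_occluded(depth_buf, obj):
    """Conservative test: occluded only if every covered pixel is already closer."""
    return all(depth_buf[x] < obj.depth for x in range(obj.x0, obj.x1 + 1))


def render_two_pass(objects, visible_last_frame, width):
    depth_buf = [FAR] * width
    drawn = []

    # 1. Draw everything that was visible last frame (with current positions).
    for obj in objects:
        if obj.name in visible_last_frame:
            draw(depth_buf, obj)
            drawn.append(obj.name)

    # 2. Occlusion-test everything else against the depth buffer just built.
    survivors = [obj for obj in objects
                 if obj.name not in visible_last_frame
                 and not is_occluded(depth_buf, obj)]

    # 3. Draw the "new" objects that were not occluded.
    for obj in survivors:
        draw(depth_buf, obj)
        drawn.append(obj.name)
    return drawn
```

With a wall at depth 1 covering pixels 0 to 9 that was visible last frame, a crate at depth 5 hidden entirely behind it is skipped, while a tree at depth 3 that sticks out past the wall's edge is picked up in the second pass.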
I'm STILL struggling very hard to understand HOW this did NOT revolutionize every 3D suite out there. It's not as if they kept the tech completely hidden.
Implementation complexity. It's too new to be that wide-spread. Doesn't even cover all cases yet. Eventually this may grow into a proper standard of doing things. Give it a few more years
this is one of the technologies i wish were in the next elder scrolls game. just like i wanted tessellation to be in skyrim back in the day.
I thank all the magicians at epic for this feature. It's like aliens gave us a tool, that we didn't even understand but was better than everything that currently exists.
nah, the guys who made it obviously understand it. how would it be a coherent and relatively bug free feature otherwise?
I don't know about everyone, But I love UV mapping and Skinning :D
I didn't understand anything you said but the pictures were nice.
I miss good old days when I thought shadowmapping is hard to understand
Can't understand jack sheat but I like the result of all this thinking!
Incredible
good work for sure.
I have no words... Wow!
The irony is that the huge amount of RAM was going to normal maps and color textures, not 3D assets. A polygonal object is like a 2D vector file: it's scalable to any pixel resolution and shouldn't take more RAM and hard drive space than a raster texture. Voxels could also be used to add detail without exploding CPU and GPU time. Time to find a hybrid approach or new tech...
Based that he explained it fully. Ty
So what if there's no good previous frame to work from, like after a hard transition? Is there a big slow down, or do we get a single poor quality frame?
You did a great job presenting! Very clear and easy to understand. Thank you
Harmonious Layers Of Ordered And Interconnected And Interoperable Complexity Which Perform Specific Tasks In Optimal Functional Unity To Achieve Maximized Efficiency And Adaptability.
Know any good layperson's explanation of this? I don't know any of the technical words used to talk about pre-existing systems, I just wanted to know about this specific method.
'Also known as an index buffer and a triangle mesh'.. Lol
Had a question, only one of the very few things that I managed to barely get a shallow understanding: when filling the material RT buffer by using a depth value as the material ID (genius concept), how is the depth test done there in that case? If a triangle is forced to have a depth value (for the material ID) then it wouldn’t play ball with the current depth test right? I’m missing something for sure here, or it isn’t a hardware depth test or it was already done and this is another pass?
The depth is encoded in the higher bits, so it still works as usual except you can now trivially filter pixels for material.
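A minimal sketch of the packing idea in the reply above, in Python. The 32/32 bit split, the names `pack`, `unpack` and `resolve`, and the convention that a larger depth value means closer to the camera are all assumptions made for illustration; none of them are taken from the talk. The point is only that when depth sits in the high bits, a plain integer comparison orders values by depth first, and the ID stored in the low bits comes along for free.

```python
ID_BITS = 32                      # assumed width of the ID stored in the low bits
ID_MASK = (1 << ID_BITS) - 1


def pack(depth, ident):
    """Put depth in the high bits and an ID in the low bits of one integer."""
    return (depth << ID_BITS) | (ident & ID_MASK)


def unpack(value):
    """Split a packed value back into (depth, ID)."""
    return value >> ID_BITS, value & ID_MASK


def resolve(fragments):
    """Keep the closest fragment for one pixel using a single max per fragment.

    Assumes larger depth = closer. On a GPU the max would be an atomic
    operation on the pixel's packed value.
    """
    best = 0
    for depth, ident in fragments:
        best = max(best, pack(depth, ident))
    return unpack(best)
```

For example, `resolve([(10, 7), (200, 3), (50, 9)])` returns `(200, 3)`: the fragment with the largest depth wins, and its ID can then be read back from the low bits.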
The pacing/flow makes this presentation a bit hard to follow, like just reading a script in a kind of stilted way is harder for me to process than just reading it myself
This kind of culling is already done in game engine like CSGO/cs2's hammer engine I think 🤔
I worry about file sizes though, games are already ridiculously large, I can't believe I downloaded 150gbs for warzone.. if it used this technology it would be potentially infinite in size
Sounds like someone found a use case for the cloud~
Yeah, no. warzone is massive because Activision deliberately adds bloat every COD release to make it look "bigger" than the one last year.
I think warzone has raw audio files for every localisation of the game which is most of the bloat
@@Xyzair The GPU is supposed to fetch data it needs for the next frame *from the internet*? Your game now runs better depending on internet speed? People with slow internet get ugly LOD pops? That's a terrible use case for the cloud.
@@NoName-zr7rz GTA 5, RDR2, etc.. There are quite a few games that big..
@@NoName-zr7rz Downloading each language file might not be very smart but having raw audio actually can lead to better performance.
So Nice, Thank You
Pretty cool, but how do you want to ship so much detail? Filesize would explode if you use so many high detailed meshes. I guess this will lead to a heavier re-use of modular assets?
All in all pretty kick-ass tech!!
I think the idea is that it's not about having every game have every single asset to a pixel sized detail scale, but to be applied to certain important objects such as characters. Or just to have the possibility if necessary for a given scene. Finally it increases the overlap between real time rendering and traditional vfx work flows, getting us closer to better virtual production.
the intent is to have a general system that "just works" no matter what geometry you throw at it, without you having to do manual optimization of your geometry or other extra work. how and whether developers decide to utilise it for very high quality geometry is an entirely independent choice
where can i get the pitch deck?
Good technology info, but the speaker reads (poorly) from a txt document and that's it :/
There is often some truth in jokes, and if there is a joke "instances are the new triangles", it might just be something to consider. What if the tech could go full circle, so clusters of instances could be used to build volumetric cloud data? After all, a dust cloud is a huge amounts of small grains of sand. What if you could build the engine, so you could see a cloud of sand from the distance, zoom in, and eventually go in and see the fine details of an individual grain of sand. Perhaps this could be a way to deal with foliage? Leaves that get close are individual leaves, but all of the distant leaves are treated as volumetric clouds somehow.
applying this kind of technique to instances rather than geometry would be much more difficult, because of how unique every instance is - creating a "one size fits all" system would be an incredibly difficult task, and nanite already took a huge amount of work to just handle geometry, which is much less unique - once it gets to nanite and rendering, it's all just a load of triangle data
Absolutely amazing!
Thanks much appreciated.
referencing a huge mesh database so you can view it up close sounds to me like you beat euclideon unlimited detail at their own game. epic made it work well but this concept was around a decade ago.
well, they started working on nanite 10 years ago or so
Sorta but not really. The whole Euclideon thing uses such wildly different representations of geometry (pointclouds, or maybe signed distance fields, I forget), and really never worked well for anything other than the geometry itself.
Amazing work! Sadly couldn't understand all of it :(. I have a question tho, are we limited to 32 materials? (for the 32bit mask)
Amazing talk!
i understood 2% of this but that 2% was crazy
What's the TL;DW ?
you could have done smaller cuts in order to read more fluently, because the micro stops, re-pronunciations after errors, and changes of intonation are very distracting.
thanks for your time
this is unreal
What study path should I follow to understand this? I'm a programmer with game development experience. I would love to be able to manipulate the rendering pipeline.
Make games from scratch.
This is sorcery at its best