A thought occurred to me when culling and chunks were mentioned. Would it be possible to further optimize by reducing the number of grass blades relative to the distance each chunk is from the camera? While this may not reduce overall GPU load considering the extra calculations, it could at the very least extend the range at which grass is visible, while simultaneously allowing for a gradient fall-off, which eliminates the need for a fog that could obstruct distant objects like mountains or buildings.
That'd probably result in it looking noticeably sparse in the distance - which could then be fixed by making the blades thicker, similar to what mipmapping does to transparent textures. Alternatively, billboards in the distance, funny blades up close.
@@AlphaGarg I think Zelda BotW has this, although it's not that noticeable, nor is it too cumbersome given the gradient shader it has going on! I think this is a great idea!
Another option would be to replace the more detailed 3D grass with animated billboarded images. Or, you could replace the texture of faraway areas that have grass with a grassy texture. A mix of all of these would probably be desirable
@@myrealusername2193 They could use 1 tri polygons for the entire thing, the problem though is you need to enable alpha testing which is expensive. They could experiment with different configurations to see which is more optimal, such as high poly mesh with no alpha testing, vs. 1 tri mesh with alpha testing enabled. It might even depend on what gpu is used also.
Awesome demo. In Breath of the Wild, the "high poly" grass is only two triangles, in an elongated diamond shape, and it still looks good. I wonder how much of a performance boost that would give your version?
@@Acerola_t I’d imagine it’s much more worth it on a Switch due to its low power and only 4 gigs of shared RAM. Lower polygon counts would save a decent amount of RAM as well as adding performance.
@@myrealusername2193 The memory cost comes from the stored positions of each grass blade. The grass blade mesh itself is not an issue at all, especially when we're talking about a 1 or 2 tri difference.
@@Acerola_t hmm, would that be from only storing the mesh once and then drawing it at each position instead of storing an object with all of the geometry at every point? I don’t really have any experience with 3D graphics (all the games I’ve made are web games with the only animation being the occasional fade in/fade out or animated flags) so I don’t really know how exactly all the geometry and objects and things are stored in memory. I’d imagine it depends for each system, though there might be some standard way all shaders and engines generally use. I’m surprised that the positions take up so much memory though, are they stored as floats or something? Or like a 64-bit signed integer would likely take up a lot of space as well.
@@myrealusername2193 Buffering the offsets is 96 bits (three 32-bit signed floats, so 12 bytes) * 7 million grass blades, so ~670 Mbit (about 84 MB) just for the offsets. The GPU does have options for lower precision floating point representations, but you're working with worldspace co-ordinates, so you would end up with grass looking extremely wonky after traveling a relatively short distance from origin due to each vertex getting subtly (or not so subtly) "shifted" by precision loss. You -could- try to manage that by introducing some kind of "origin shifting" logic, but I suspect you'd end up with some subtle bugs, and troubleshooting shaders is not a lot of fun because you can't really do stuff like step-based debugging on the GPU - there be dragons. Origin re-mapping is hard enough to manage on the CPU without bugs.
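For anyone who wants to check the arithmetic in this thread, here is a small sketch in Python. It assumes the figures quoted above (7 million blades, one offset of three 32-bit floats per blade); `offset_buffer_bytes` is a made-up helper name, not something from the project.

```python
def offset_buffer_bytes(blade_count: int,
                        floats_per_blade: int = 3,
                        bytes_per_float: int = 4) -> int:
    """Size in bytes of a buffer holding one float offset vector per blade."""
    return blade_count * floats_per_blade * bytes_per_float

full = offset_buffer_bytes(7_000_000)                     # 32-bit floats
half = offset_buffer_bytes(7_000_000, bytes_per_float=2)  # 16-bit floats

print(full / 1_000_000, "MB")        # size in megabytes
print(full * 8 / 1_000_000, "Mbit")  # same size in megabits
print(half / 1_000_000, "MB with half precision")
```

Three 32-bit floats are 12 bytes (96 bits) per blade, so the offsets alone come to roughly 84 MB, or about 670 megabits. Half-precision floats would halve that, at the precision cost described in the comment.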
This is great because you're optimising the grass based on the worst case scenario. I think the point you made at 7:35 is important, your actual game world is probably never going to have a scene with that much grass in a single frame. Come to think of it, the only area of BotW with that level of grass density is Hyrule field, but it's important to note that there is almost nothing else populating Hyrule field.
I doubt it matters to say this at this point, but I think another option for covering up the fact that you're culling the far-away grass is to simply color the ground the same as the grass. At that distance, I doubt you would even notice that the grass stopped existing. If the animation of the grass changes the color of it enough to give it away, you might also consider applying an animation to the ground itself to match the change. I say this because although fog is very common in games, it makes sense to strongly consider other options before using it given how it destroys any hope of viewing landmarks or terrain in the distance. Standing on top of a hill and looking out to the surrounding terrain and having it fill you with a sense of awe and wonder is a very desirable effect, and cutting corners elsewhere to achieve this effect is often worth it. Good video and I'll look forward to any new content.
You could probably save a lot of that memory by having the grass procedurally generated, so the memory would only be used for a brief period of time, but then the procedural generation would have to be extremely fast in order to not hinder the rendering performance if calculated per-frame. Nice explanation of how it all comes together to make grass like this possible.
No doubt someone has already said this, but you might be able to use a mesh with more blades of grass and decrease the density. Sure, each mesh would have more polygons, but you'd be keeping track of far fewer meshes, no doubt making it cheaper on the GPU's VRAM. You could also put fewer blades of grass into the lower LOD meshes to make a smoother transition between the grassed and grassless areas. Oh and you could use billboards at a distance too. Not like anyone would notice.
Those would be great further optimizations, quite frankly the only reason there's so many meshes in the example is to demonstrate how fast the technique is at extreme numbers lol
Thank you! Compute shaders are my favorite thing ever, they're so powerful. If you want some good resources for getting started with them I recommend: kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html and blog.three-eyed-games.com/2018/05/03/gpu-ray-tracing-in-unity-part-1/
Also one way to reduce the memory usage would be to remove the cached displacement value and instead sample the height map in the vertex shader, but it would add 9 texture samples per grass blade (3 for LOD) so I think taking the memory hit is more ideal. Additionally you could probably bit pack the uv coordinates, but I'm not confident on that.
@@Acerola_t Can't you attach extra data to a mesh instance that'd get passed to the vertex shader? You'd only need to query the heightmap texture once then, on instancing. (Not a GPU guy, I don't know if I'm making sense.) Also, bit packing the coordinates seems very doable. Store a reference centre coordinate in full 32-bit floats for each chunk in a quad-tree, then apply a -1.0 to 1.0 uv-coordinate space "modifier" for each leaf of grass - with only a byte-size integer, you could index a 0.5m x 0.5m chunk to every ~2mm on each axis. Looks to me like you could get down to 2 bytes minimum, 4 if you need a bit more resolution (half-floats, if available, would work great), assuming the z axis can be omitted; if it can't, it also doesn't need more than 1-2 bytes, since the chunk can also define a reference average or middle offset.
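A rough sketch of the chunk-relative packing described above, in Python for illustration. It assumes a 0.5 m square chunk and one unsigned byte per axis, and it measures offsets from the chunk's corner rather than its centre to keep the arithmetic short. `CHUNK_SIZE_M`, `pack_offset`, and `unpack_offset` are hypothetical names.

```python
CHUNK_SIZE_M = 0.5  # assumed chunk edge length in meters
LEVELS = 256        # one unsigned byte per axis
STEP = CHUNK_SIZE_M / LEVELS  # about 1.95 mm per quantization step

def pack_offset(local_x: float, local_z: float) -> bytes:
    """Quantize a position inside the chunk (0..CHUNK_SIZE_M) into two bytes."""
    def quantize(v: float) -> int:
        return min(LEVELS - 1, max(0, int(v / CHUNK_SIZE_M * LEVELS)))
    return bytes((quantize(local_x), quantize(local_z)))

def unpack_offset(packed: bytes, origin_x: float, origin_z: float):
    """Rebuild a world position from the chunk origin plus the packed bytes."""
    # Adding half a step puts the decoded point in the middle of its cell.
    return (origin_x + (packed[0] + 0.5) * STEP,
            origin_z + (packed[1] + 0.5) * STEP)

packed = pack_offset(0.25, 0.1)
x, z = unpack_offset(packed, 100.0, 200.0)
print(len(packed), x, z)
```

Each blade then costs 2 bytes instead of 8 for two 32-bit floats, and the decoded position is off by at most about one step (roughly 2 mm) under these assumptions.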
This is cool, you could probably optimize it further by combining your billboard grass with meshed grass and using the billboard ones at longer distances.
@@kered13 I don't think dithering would work well. Zelda Breath of the Wild actually uses this combination of mesh and billboard grass I was talking about, so apparently I'm not the first to think of this idea.
Hoping you follow this up by adding Ghost of Tsushima's grass stuff: rounded normals for the grass blades to hide that they're 2D, varying height, and shorter grass blades being converted into 2 blades to make use of the vertices and add density.
I'm very happy I found this channel. Others are only really entertaining and fun I guess, but the real genius of game development is far more interesting. It also helps that your presentation is witty and concise. As a casual gamer and professional procrastinator I find this all super cool. :)
Damn bruh. That's exactly the type of nerdy computer graphics stuff I dig. I'm in no way close to coding my render engine, but I work in Houdini and instancing grass here is a pain in the ass too.
Would be cool to see another intermediate LOD where it goes back to the textured quad approach for anything more than ~10m away. Would be near indistinguishable since it only looks bad when viewed from directly above, and unless you have an ortho camera that spot would be covered by the better grass. Might eliminate the need for fog.
This would be very useful for a mysterious, serene, foggy field in an experimental indie horror game which appears in two scenes for no apparent reason, and then places you back in the game, just to make you question whether what you're experiencing is *supposed* to be coherent or not.
Hi. I love your videos! I downloaded this grass project to play around with it a bit and understand everything in more detail. The grass culling is quite expensive. If I only cull every 5th frame, the FPS increases from ~270 to ~440, and if I don't cull at all, I can get to around ~480. At those frame rates I've found it's very difficult to see the lack of culling for 4 frames unless the camera moves/rotates extremely fast. If we can link the camera to the grass and only cull every second or third frame while the camera view is changing, we can actually get a considerable FPS boost.
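A minimal sketch of the "cull less often" idea above, written in Python for illustration and not taken from the project. It re-runs an expensive culling function at most every `interval` frames, and only when the camera state has changed since the last cull. `ThrottledCuller`, `cull_fn`, and `camera_state` are hypothetical names.

```python
class ThrottledCuller:
    """Cache culling results and refresh them only occasionally."""

    def __init__(self, cull_fn, interval=3):
        self.cull_fn = cull_fn    # expensive function: camera_state -> visible set
        self.interval = interval  # minimum number of frames between culls
        self.frames_since_cull = 0
        self.last_state = None
        self.visible = None

    def update(self, camera_state):
        moved = camera_state != self.last_state
        due = self.frames_since_cull >= self.interval
        # Always cull on the first frame; afterwards only if the camera
        # changed and enough frames have passed.
        if self.visible is None or (moved and due):
            self.visible = self.cull_fn(camera_state)
            self.frames_since_cull = 0
            self.last_state = camera_state
        self.frames_since_cull += 1
        return self.visible

calls = []
culler = ThrottledCuller(lambda cam: calls.append(cam) or ["chunk"], interval=3)
for frame in range(10):
    culler.update(("camera angle", frame))  # camera changes every frame
print(len(calls), "culls in 10 frames")
```

The trade-off is the one the comment describes: stale visibility for a few frames, which may show up as popping if the camera turns very fast.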
Great series! I learned a lot. Could you save space by rendering multiple blades per position? For example, one position could have 3-4 offshoots. Also, not sure if this will make a difference, but depending on the hilliness of your maps, the grass could skip rendering on the downslope side of the hill.
Didn't realize a video about John Lennon talking about the grass could be so fun and educational. But I honestly didn't think you'd reincarnate till Yoko was gone.
I kept expecting at some point that you'd have a radius around the camera outside of which there weren't individual blades of grass, but just like a blob disguised as grass. Kinda like you did where you stopped drawing the grass altogether at a certain radius, but colored the same color as the grass rather than brown.
You can actually put a bit of sampling in the pixel shader, a bit like parallax occlusion mapping (look it up, it's a common method ppl employ), and u can actually develop a shadow from it, instead of the displacement map, and it will shadow it all into the distance! Would definitely make it look cooler. Parallax occlusion mapping is actually one of the first raytracing-type methods ppl did on the GPU. Realtime.
Since the grass is now stored in chunks (I also saw someone suggesting using quadtrees for better, more fine-grained culling), you maybe don't need to store as much data on the leaves' positions. Let the chunk itself store its world coordinates; let the leaves' positions be defined relative to the chunk. That way, you might be able to store what are currently floats as 1-byte integers or something. I saw this technique used in a video about a voxel-based game engine, but I'm thinking that it could work in any scenario with a well-defined world grid of some sort being used.
You'd have to get much fancier to do that, since this stuff exists only on the GPU then physics aren't possible. For stuff like trail tracks you'd have to keep track of a global texture that overlays on the terrain and then you write to that texture as the player drives around, then the grass texture would sample from that texture to see if it has been smushed down or not.
You can also check which grass is more than 3/5 covered and swap it for the mid-detail model, or, if it is a bit further away, even use the lowest-detail model. Grass that can't be seen because other grass blocks the view can also be removed altogether, but for this you have to take the height and position into account.
Hi! I've been subscribed for a couple of days, and I'm loving your tutorials. You give just the right amount of detail needed to make it both fun to watch, easy to understand and follow along, I think you'd be a great teacher :) I'd like to ask something though, if you don't mind. What technologies/programs/ides do you use in your videos? You inspired me, and I want to start developing a game, just for fun and to see what happens! :D
Thanks for the video. In case it helps anyone, I was able to triple the framerate for the geometry grass shader simply by pulling in keijiro's latest noise include and using the 2D simplex version instead of the 3D snoise which is overkill since the Y isn't required.
I wonder if some of the optimizations used to render hair could also be used to render grass. Or if people working on grass optimization also used those to render better hair, fur, etc., because they're similar problems of "render a lot of this thing protruding from this other thing" except with hair the "other thing" may be moving.
Considering the number of grass blades increases exponentially with distance from the camera, you really want to cut down on the number you render at a distance as well. If you had a narrow band of low poly grass, then filled the following chunks with 2D decals, you could probably save even more VRAM. Rendering every third point of grass as one decal with three grass straws could give a final performance boost.
@@drdca8263 Yes, you're probably correct. I didn't think through this comment before posting it. The main point is still true though, the last row of grass is much harder to render than the closest one. So any cheap trick you can make in the distance could be worth it.
@@drdca8263 If you're thinking in terms of exponential functions in fancy maths (and, relatedly, running time), sure. But I think in other instances, "exponential" simply refers to whenever there's a parameter being raised to some power. And we're talking about grass straws here. The number of those would be going up with the area of the circle defined by the view distance from the camera; pi*r^2. That could be said to be polynomial, but colloquially also "exponential".
@@mnxs I’m conflicted. On the one hand, I have had objections to people insisting that because people who study a particular topic use a certain word a certain way, that other people are also obligated to use the word that way rather than a different way. But, at the same time, I strongly prefer that everyone did not refer to things that grow polynomially as “exponential”. This is probably somewhat hypocritical of me… I guess it is because I think math is so great and i want everyone to know it? But like, people who study other fields probably feel the same way about their favored field.
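To put numbers on the pi*r^2 point in this thread, here is a small sketch. It assumes flat ground, uniform blade density, and a full circle around the camera (ignoring the view frustum); `blades_within` and `blades_in_ring` are hypothetical helper names.

```python
import math

def blades_within(radius_m: float, density_per_m2: float) -> float:
    """Blade count inside a circle of the given radius."""
    return density_per_m2 * math.pi * radius_m ** 2

def blades_in_ring(inner_m: float, outer_m: float, density_per_m2: float) -> float:
    """Blade count in the ring between two view distances."""
    return blades_within(outer_m, density_per_m2) - blades_within(inner_m, density_per_m2)

density = 100.0  # assumed blades per square meter
print(blades_within(200, density) / blades_within(100, density))
print(blades_in_ring(90, 100, density) / blades_in_ring(0, 10, density))
```

Doubling the view distance quadruples the blade count (quadratic, not exponential), and the outermost 10 m ring at 100 m holds 19 times as many blades as the innermost 10 m, which is why cheap tricks at distance pay off.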
Perlin noise texture input for grass splat map and you have one helluva grass shader ^-^ Basically no real scene needs that much grass everywhere so yeah this is pretty great!
Chunks are a blessing for both end-user and in-editor performance, and thank you for remembering that this thing called LOD exists. I was rendering entire scenes from far away of a city that didn't downscale the building and car models. At first it wasn't a problem because there weren't too many models or effects in the scene, but after some time I was getting 30 fps in game and 50-60 in editor. I will try to implement LOD. Many thanks.
So in conclusion, this is very doable, but you need to keep that final cost in mind and weigh whether or not it's applicable to the project you're working on. Duly noted. I was considering using this for a VRC world where the only other performance heavy things are 1, people's avatars, and 2, a video player. I'll have to see how much of an impact it will have with other level geometry loaded in.
What have you done to me? I'm obsessed with grass now.. I'm supposed to be a gamer, not some kind of.. real life human thing who goes outside or whatever that is.. Anyways great video lol, I'm wondering about how you could fix the camera issue you were talking about using the original method tho
Your channel got boosted by YouTube, and I'm lucky enough to see it here! Anyways, for grass that far, I'd probably just clump a few sprites on that plane altogether with some noise, maybe that would save some precious memory :)
Thank you chunks. So if I were implementing this more practically in a game, we would just have to make some artistic compromises for performance and not have such consistently dense geometric grass. We would probably instantiate patches of grass along the ground and have those patches give some arbitrary number of blades, and then if a chunk was arbitrarily far away you’d switch to a billboard or simple-tri LOD grass.
Want to know something cool? You don't even have to store anything in memory, other than the one draw call. Just let your vertex shader transform your mesh with a simple function. PS: Consider dispatch sizes (wavefronts if you will). This is your way to infinity.
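One way to read the "store nothing" idea above is to derive each blade's position from its instance ID with a deterministic hash instead of reading it from a buffer. The Python sketch below illustrates that under my own assumptions (a regular grid with per-blade jitter); `hash01` and `blade_position` are hypothetical names, and a real version would live in the vertex shader.

```python
def hash01(n: int) -> float:
    """Cheap 32-bit integer hash mapped to [0, 1). Illustrative, not a quality RNG."""
    n &= 0xFFFFFFFF
    n = (n ^ 61) ^ (n >> 16)
    n = (n * 9) & 0xFFFFFFFF
    n ^= n >> 4
    n = (n * 0x27D4EB2D) & 0xFFFFFFFF
    n ^= n >> 15
    return n / 2 ** 32

def blade_position(instance_id: int, blades_per_row: int, spacing: float):
    """Grid cell from the ID, plus a repeatable jitter so rows are not visible."""
    col = instance_id % blades_per_row
    row = instance_id // blades_per_row
    jitter_x = hash01(instance_id * 2) - 0.5
    jitter_z = hash01(instance_id * 2 + 1) - 0.5
    return ((col + jitter_x) * spacing, (row + jitter_z) * spacing)

print(blade_position(12345, 1000, 0.1))
```

Because the same ID always yields the same position, nothing needs to be cached between frames; the cost moves from memory to a few extra ALU operations per vertex.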
I know you said you wanted to be done with the grass, but I'm curious: is it possible to have objects (players and NPCs) affect the grass movement in a performant way? (flattening or making it shake as it's moved through) I've seen other tutorials on this for snow and grass using a displacement texture but I'd be curious to know if it works with this grass technique
Yeah the same technique is used for snow and grass. The player's position is written to a texture that overlays across the field and the shader then samples from that texture to inform itself if it should be flattened or not.
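A CPU-side sketch of the overlay texture described above, using a plain 2D list in place of a GPU texture. The grid resolution, world size, and the names `TrampleMap`, `stamp`, `sample`, and `blade_height` are all assumptions for illustration.

```python
class TrampleMap:
    """Square grid covering the field; each cell stores how flattened the grass is."""

    def __init__(self, size_cells: int, world_size_m: float):
        self.size = size_cells
        self.world = world_size_m
        self.cells = [[0.0] * size_cells for _ in range(size_cells)]

    def _to_cell(self, x: float, z: float):
        scale = self.size / self.world
        cx = min(self.size - 1, max(0, int(x * scale)))
        cz = min(self.size - 1, max(0, int(z * scale)))
        return cx, cz

    def stamp(self, x: float, z: float, strength: float = 1.0):
        """Write the player's position into the map (the 'render to texture' step)."""
        cx, cz = self._to_cell(x, z)
        self.cells[cz][cx] = max(self.cells[cz][cx], strength)

    def sample(self, x: float, z: float) -> float:
        """Read the flatten amount at a blade's position (the shader's texture sample)."""
        cx, cz = self._to_cell(x, z)
        return self.cells[cz][cx]

def blade_height(base_height: float, flatten_amount: float) -> float:
    return base_height * (1.0 - flatten_amount)

field = TrampleMap(size_cells=64, world_size_m=32.0)
field.stamp(10.0, 5.0)  # player stands here
print(blade_height(0.5, field.sample(10.1, 5.1)))   # blade under the player
print(blade_height(0.5, field.sample(20.0, 20.0)))  # untouched blade
```

A real implementation would also fade the stored values back toward zero over time so the grass springs up again.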
As someone who has worked on serious games/milsims, grass is a fuckng nightmare because you have an expectation, especially in multiplayer environments, that if you are lying prone in grass, then people far away from you shouldn't be seeing the grass de-render and your soon-to-be corpse lying on a flat plane totally exposed. Honestly, still no good solutions to this day.
Thank you chunks! Could you reduce the memory cost if you store only the Ids for referencing the blades from the first buffer inside the second buffer?
Regarding the memory efficiency, I believe UV and perhaps even displacement are unnecessary. UV should just be the same as Position.xz (unless Position is in camera space, in which case you could at worst introduce a camera pos uniform) and for the vertical displacement, it should be a solution to take like VertexID % 7 and use that to index the height in a constant array. (This is assuming that you have 7 vertices on a grass blade, you could insert a different number)
I had an idea... You talked about LOD, but what if at distance it would be one big mesh? Like big mesh? You could actually do that, even in order: 1. best grass 2. LOD'd grass 3. one big mesh with color of grass. It is very obvious from looking that at the distance all the grass looks like just one color... You could abuse that ig...
I always wondered why you can't just reuse the texture in memory. Old gaming consoles like the PS1 did that, they didn't need a copy of every texture. I guess we have lost ancient knowledge in coding.
Thank you chunks :)
my mans out here sharing the secrets of AAA grass like it's no big deal. great work man, you're clearly very knowledgeable.
Haha thanks! My intent is to show how real world game assets might work.
@@Acerola_t you are the savior of indie shitters like myself
Idk man, AAA games run like shit with no regard to optimization so this is probably better than whatever they're doing.
@@friendofp.24 No, this is exactly what they're doing
@@Acerola_t It looks like you are living in a bunker hiding from the AAA cartel 🤣🤣🤣
Amazing video though, currently studying Software Engineering and will take a specialization in game dev, I am sure these videos will be helpful :D
Some things you can do to further optimize this: instead of having a single blade per unit of grass, have multiple.
This means you have significantly fewer objects to render.
Also, over distance, start randomly culling the shorter blades; they are very unlikely to be seen, so you can just remove those.
This removes the harsh edge as well because we are slowly fading in the grass instead of there being a harsh line all of a sudden.
Also put most of your detail at the top of the blade and less at the bottom, we’re not very likely to see the bottom of the blade because of the blades in front of it.
Greetings from someone who spent too much time on game foliage.
Too much time on foliage? No such thing.
@@AvarFeralfang except when it's render time ;)
@FreekHoekstra fair enough. I do like nice foliage, though. 😀
I also thought about culling "randomly" at distance. Shorter grass first, but also with some noise to gradually cull less rather than stop abruptly.
@@RecOgMission I don't know if that's what they use in games but this is actually a great idea
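A small sketch of the "cull randomly at distance, shorter grass first" idea from this thread. The fade distances, the 50/50 weighting between noise and height, and the name `keep_blade` are my assumptions, not something from the video.

```python
def keep_blade(distance: float, height: float, noise: float,
               fade_start: float = 30.0, fade_end: float = 60.0,
               max_height: float = 1.0) -> bool:
    """Decide whether a blade is still drawn.

    noise is a per-blade random value in [0, 1) that stays fixed for the blade,
    so a blade does not flicker in and out between frames.
    """
    if distance <= fade_start:
        return True
    if distance >= fade_end:
        return False
    fade = (distance - fade_start) / (fade_end - fade_start)  # 0 near, 1 far
    # Taller blades score higher, so short blades disappear first.
    score = 0.5 * noise + 0.5 * (height / max_height)
    return score >= fade

print(keep_blade(45.0, 1.0, 0.5))  # tall blade halfway through the fade
print(keep_blade(45.0, 0.2, 0.5))  # short blade at the same distance
```

Since the threshold rises smoothly with distance, the field thins out gradually instead of ending at a hard line.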
You actually don’t even need the whole mesh past a few meters. Instead of swapping it out for an entire lower mesh count model, you can just plug in the grass tops. On top of this, past a certain distance you don’t even have to apply a skew: as you can only see the tips, just revert back to a simple lateral translation.
Have you thought about replacing far-away grass clumps with the first method (quad+texture)? This would probably work way better with the GPU instance positions being relative to the camera. Great vids btw :)
Or simply not rendering them and just having a tan-colored background that looks like the grass in the distance.
You can use a division structure(think 1 chunk has 4 sub chunks that each have 4 sub chunks) then frustum cull the chunks instead of individual blades for way faster culling.
that's a genius idea lmao
After your comment i wasted whole day researching different math topics. Thank you, it was interesting.
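A simplified sketch of the chunk hierarchy idea above (which is indeed a quadtree). To keep it short it works in 2D, top-down, and treats the view frustum as a list of half-planes `(nx, nz, d)` where a point is inside when `nx*x + nz*z + d >= 0`. `Node`, `build`, `outside`, and `visible_leaves` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    min_x: float
    min_z: float
    size: float
    children: list = field(default_factory=list)  # empty for leaf chunks

def build(min_x, min_z, size, depth):
    """Split a square into 4 sub-chunks, recursively, `depth` times."""
    node = Node(min_x, min_z, size)
    if depth > 0:
        half = size / 2
        for dz in (0, half):
            for dx in (0, half):
                node.children.append(build(min_x + dx, min_z + dz, half, depth - 1))
    return node

def outside(node, planes):
    """True if the node's square lies entirely behind any one half-plane."""
    for nx, nz, d in planes:
        # Test the corner that reaches furthest along the plane normal.
        px = node.min_x + (node.size if nx > 0 else 0.0)
        pz = node.min_z + (node.size if nz > 0 else 0.0)
        if nx * px + nz * pz + d < 0:
            return True
    return False

def visible_leaves(node, planes, out, tests):
    tests[0] += 1  # count how many chunk tests we perform
    if outside(node, planes):
        return     # rejects this chunk and everything below it in one test
    if not node.children:
        out.append(node)
        return
    for child in node.children:
        visible_leaves(child, planes, out, tests)

root = build(0.0, 0.0, 64.0, depth=3)  # 8 x 8 = 64 leaf chunks
planes = [(1.0, 0.0, -50.0)]           # stand-in frustum: keep x >= 50
leaves, tests = [], [0]
visible_leaves(root, planes, leaves, tests)
print(len(leaves), "visible leaves found with", tests[0], "tests")
```

The saving comes from rejecting a large chunk with a single test; every blade inside it is skipped without being looked at individually.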
Yeah I wanted to hear them talk about the culling method more, they didn't really discuss it at all. I am currently puzzled how they updated its buffer every frame for millions of blades of grass.
How hard would it be to do occlusion culling on the vertices that are not visible because they are hidden behind other blades of grass?
Also it would look better if the grass on the edge inclined from opposite to camera to normal instead of just becoming visible. Or dithering if you want something a bit less expensive.
Isn't that what a quad-tree is? Or is that something different?
Thank you chunks! I have no idea how hard it would slow things down but one subtle thing that makes grass look like grass is that it is not just darker at the bottom than the top, but also darker in the middle than at the edge and darker on one side than the other. You definitely don't see this shading nearly as often as the kind you did, but I have no idea whether it's too slow, or if it's just not something people think about.
This is similar-ish to what I did several years ago on a shipped game. I used a precomputed quadtree rather than chunks because the area of coverage was known and the frustum test was just iterating over the blades of the visible nodes and copying them to the final instance buffer (no scan and compact, just directly from one to the other). It populated an indirect draw structure at the same time so the whole thing was essentially two calls (compute dispatch + indirect draw). Another way of hiding the end of the grass is to reduce the height of the blades in the vertex shader as they approach the far distance. If your grass is a similar color to the terrain you can't see where it ends and then you also don't need fog. Another decent memory saving was to place only one tile of grass blades, treat each chunk as an instance of that base tile, and then expand things out in the visibility pass.
oh damn, reducing the grass height with distance is so smart.
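A minimal sketch of shrinking blade height with distance, as suggested above. The fade range and the use of a smoothstep curve are my assumptions; the comment only says the height is reduced as blades approach the far distance. `faded_height` is a hypothetical name.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Standard smoothstep: 0 below edge0, 1 above edge1, smooth in between."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

def faded_height(base_height: float, distance: float,
                 fade_start: float = 40.0, fade_end: float = 60.0) -> float:
    """Full height up to fade_start, shrinking to zero at fade_end."""
    return base_height * (1.0 - smoothstep(fade_start, fade_end, distance))

for d in (10.0, 50.0, 60.0):
    print(d, faded_height(1.0, d))
```

In a shader this would be a single multiply on the blade's vertical scale, so it costs far less than fog or alpha blending.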
With all this new knowledge on video game grass (which is not something I expected life to bring to me but I'm not mad) I feel the urge to make an open world that's just a grassy field. A really really nice grassy field.
Great video.
I see that other people have already mentioned using billboards at distance, but there are some other, intermediate optimizations that I think you could use as well. Just as a thought experiment, I know you're no longer working on this.
- The GDC talk on the grass in Ghost of Tsushima widens the grass at further distances while drawing fewer blades. They make the blades twice as wide for half as many blades. While they generate their grass blades in the geometry shader, this should still be possible with the 3d models you are using in the vertex shader.
- The same GDC talk mentions how they use a grass model that forms a V, which gives them more coverage with fewer blades. It will make the grass look different, though.
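The "twice as wide for half as many blades" trade can be sketched as follows. The idea of applying it once per fixed distance band, the 20 m band size, and the name `lod_params` are assumptions for illustration; the talk's actual thresholds are not given in the comment.

```python
def lod_params(distance: float, base_count: int, base_width: float,
               lod_distance: float = 20.0):
    """Halve the blade count and double the width for each distance band."""
    level = int(distance // lod_distance)
    factor = 2 ** level
    return max(1, base_count // factor), base_width * factor

for d in (0.0, 25.0, 45.0):
    count, width = lod_params(d, base_count=1024, base_width=0.02)
    print(d, count, width, count * width)
```

The product of count and width stays the same across bands, which is the point: total coverage is roughly preserved while the number of blades to draw drops.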
I read through the shader code for the grass and there's a number of smaller optimizations that could be employed. I don't know if they would actually do any good, and the compiler may even do them for you, but they could be worth trying out just to see.
- You use "v.uv.y * v.uv.y" three times in the vertex shader, twice in "_Scale * v.uv.y * v.uv.y". If you put them into a temporary variable the GPU would keep them in a register rather than computing them every time they're needed.
- RotateAroundXInDegrees and RotateAroundYInDegrees are fairly expensive functions, but much of that stuff can either be pre-computed or computed once (per vertex). RotateAroundXInDegrees is only called once, with a constant value passed into "Degrees". You can store 'm' as a constant since it never, ever changes. RotateAroundYInDegrees is similar, the m for idHash * 180.0 only needs to be computed once per vertex.
- You can inline the RotateAround_InDegrees functions to avoid the overhead from function calls.
- Floating point arithmetic is non-associative, which means optimizations that may be obvious won't be performed by the compiler because you won't get exactly the same answer. UNITY_PI / 180.0 should just be a constant, but in order to tell the compiler that you have to put it in parentheses, or just #define a constant.
- If you put positionBuffer[instanceID].displacement into a temporary variable, you'll be telling the compiler to load the value into a register rather than reading from cache, or worse, vram, every time it needs it.
- There are likely some code motion optimizations to be made, particularly in concert with the above, but I don't know enough about the hardware or compiler to really give many good recommendations for it. I will say that putting all of the positionBuffer[instanceID] calls next to each-other improves temporal locality and may have an effect on performance.
As I mentioned, a lot of this stuff may not even help. It's all very much "try it and see". The compiler may already do most/all of this, and so the end result is just making your code impossible to maintain to get an extra 3 fps. There's also Amdahl's law. I may be focusing all of this attention on some parts of the procedure that take up 1/10 of the actual execution time.
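The non-associativity point is easy to check outside a shader. The sketch below is plain Python with ordinary double-precision floats, not HLSL; `DEG_TO_RAD` and `to_radians` are illustrative names, and whether a particular shader compiler folds the constant on its own is not something this settles.

```python
import math

# Regrouping the same three additions gives two different doubles.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# Grouping the constant part explicitly means it is computed once,
# instead of relying on the compiler to reorder "degrees * pi / 180".
DEG_TO_RAD = math.pi / 180.0


def to_radians(degrees):
    """Convert degrees to radians using the precomputed factor."""
    return degrees * DEG_TO_RAD


print(to_radians(180.0))
```

The same habit applies to the repeated `v.uv.y * v.uv.y` term mentioned above: compute it once into a local and reuse it.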
I was looking into how to make grass and I did *not* expect to get a full video series on it. Thanks for this super handy resource!
Be sure to read the other comments for more potential optimizations! I def have a lot more I could do
A thought occurred to me when culling and chunks were mentioned. Would it be possible to further optimize by reducing the number of grass blades relative to distance each chunk is from the camera? While this may not reduce overall GPU load considering the extra calculations it could at the very least extend the range of which grass is visible, while simultaneously allowing for a gradient fall-off, which eliminates the need for a fog that could obstruct distant objects like mountains or buildings.
That'd probably result in it looking noticeably sparse in the distance - which could then be fixed by making the blades thicker, similar to what mipmapping does to transparent textures.
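A rough sketch of the thin-out-and-thicken trade, assuming the goal is to keep total covered width roughly constant (half the blades, twice the width). `thin_and_widen` and `keep_fraction` are hypothetical names, not anything from the project.

```python
def thin_and_widen(blade_count, base_width, keep_fraction):
    """Drop blades and widen the survivors so coverage stays about the same.

    keep_fraction is the share of blades kept, in (0, 1].
    Returns (blades_kept, new_width).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    kept = int(blade_count * keep_fraction)
    width = base_width / keep_fraction
    return kept, width


if __name__ == "__main__":
    # Keeping half the blades doubles the width of each one.
    print(thin_and_widen(1000, 0.02, 0.5))
```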
Alternatively, billboards in the distance, funny blades up close.
@@AlphaGarg i think zelda botw has this, although it's not that noticeable nor is it too cumbersome given the gradient-shader it has going on! I think this is a great idea!
@@AlphaGarg I was thinking about that just as I finished watching the video.
Another option would be to replace the more detailed 3D grass with animated billboarded images.
Or, you could replace the texture of faraway areas that have grass with a grassy texture.
A mix of all of these would probably be desirable
@@myrealusername2193 They could use 1 tri polygons for the entire thing, the problem though is you need to enable alpha testing which is expensive. They could experiment with different configurations to see which is more optimal, such as high poly mesh with no alpha testing, vs. 1 tri mesh with alpha testing enabled. It might even depend on what gpu is used also.
awesome demo, in breath of the wild, the "high poly" grass is only two triangles, in an elongated diamond shape and it still looks good, I wonder how much of a performance boost that would give your version?
Probably like 5-10 fps i think
@@Acerola_t I’d imagine it’s much more worth it on a Switch due to its low power and only 4 gigs of shared RAM. Lower polygon counts would save a decent amount of RAM as well as adding performance.
@@myrealusername2193 The memory cost comes from the stored positions of each grass blade. The grass blade mesh itself is not an issue at all, especially when we're talking about a 1 or 2 tri difference.
@@Acerola_t hmm, would that be from only storing the mesh once and then drawing it at each position instead of storing an object with all of the geometry at every point?
I don’t really have any experience with 3D graphics (all the games I’ve made are web games with the only animation being the occasional fade in/fade out or animated flags) so I don’t really know how exactly all the geometry and objects and things are stored in memory. I’d imagine it depends for each system, though there might be some standard way all shaders and engines generally use.
I’m surprised that the positions take up so much memory though, are they stored as floats or something? Or like a 64-bit signed integer would likely take up a lot of space as well.
@@myrealusername2193 Buffering the offsets is 96 bits (3 32-bit signed floats, i.e. 12 bytes) * 7 million grass blades, so ~670 megabits, or about 84 MB, just for the offsets. The GPU does have options for lower precision floating point representations, but you're working with worldspace co-ordinates, so you would end up with grass looking extremely wonky after traveling a relatively short distance from origin due to each vertex getting subtly (or not so subtly) "shifted" by precision loss.
You -could- try to manage that by introducing some kind of "origin shifting" logic, but I suspect you'd end up with some subtle bugs, and troubleshooting shaders is not a lot of fun because you can't really do stuff like step-based debugging on the GPU - there be dragons. Origin re-mapping is hard enough to manage on the CPU without bugs.
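Working the offset arithmetic through, assuming three 32-bit floats per blade and the 7 million blades quoted above, and using Python's `struct` half-float format (`"e"`) as a stand-in for a GPU half to show the kind of precision loss being described:

```python
import struct

BLADE_COUNT = 7_000_000
FLOATS_PER_OFFSET = 3
BYTES_PER_FLOAT32 = 4

bytes_per_blade = FLOATS_PER_OFFSET * BYTES_PER_FLOAT32  # 12 bytes = 96 bits
total_bytes = BLADE_COUNT * bytes_per_blade
total_megabytes = total_bytes / 1_000_000
total_megabits = total_bytes * 8 / 1_000_000


def through_half(value):
    """Round-trip a value through a 16-bit float and return the result."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    print(total_megabytes, "MB")
    print(total_megabits, "Mbit")
    # Near the origin a half float is fine; 1 km out it snaps to 0.5 m steps.
    print(through_half(0.3), through_half(1000.3))
```

Between 512 and 1024 a half float can only represent multiples of 0.5, so a world-space coordinate of 1000.3 lands on 1000.5, which is where the "wonky" grass would come from.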
Was just about the make some sexy grass for my game. Thanks for the tips and have a great day!
This is great because you're optimising the grass based on the worst case scenario. I think the point you made at 7:35 is important, your actual game world is probably never going to have a scene with that much grass in a single frame. Come to think of it, the only area of BotW with that level of grass density is Hyrule field, but its important to note that there is almost nothing else populating Hyrule field.
I have never been so invested into a story in my life.
This is better than any show out there.
I doubt it matters to say this at this point, but I think another option for covering up the fact that you're culling the far-away grass is to simply color the ground the same as the grass. At that distance, I doubt you would even notice that the grass stopped existing. If the animation of the grass changes the color of it enough to give it away, you might also consider applying an animation to the ground itself to match the change. I say this because although fog is very common in games, it makes sense to strongly consider other options before using it given how it destroys any hope of viewing landmarks or terrain in the distance. Standing on top of a hill and looking out to the surrounding terrain and having it fill you with a sense of awe and wonder is a very desirable effect, and cutting corners elsewhere to achieve this effect is often worth it. Good video and I'll look forward to any new content.
i genuinely love how much i learn from these videos and how entertaining and casual they are
You could probably save a lot of that memory by having the grass procedurally generated, so the memory would only be used for a brief period of time, but then the procedural generation would have to be extremely fast in order to not hinder the rendering performance if calculated per-frame. Nice explanation of how it all comes together to make grass like this possible.
No doubt someone has already said this, but you might be able to use a mesh with more blades of grass and decrease the density.
Sure, each mesh would have more polygons, but you'd be keeping a track of way less meshes, no doubt making it cheaper on the GPU's VRAM.
You could also put less blades of grass into the lower LOD meshes to make a smoother transition between the grassed and grassless areas.
Oh and you could use billboards at a distance too. Not like anyone would notice.
Those would be great further optimizations, quite frankly the only reason there's so many meshes in the example is to demonstrate how fast the technique is at extreme numbers lol
Really good video! This has made me really interested in learning compute shaders. I can't wait to see what topic you cover next.
Thank you!
Compute shaders are my favorite thing ever, they're so powerful. If you want some good resources for getting started with them I recommend:
kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html
and
blog.three-eyed-games.com/2018/05/03/gpu-ray-tracing-in-unity-part-1/
@@Acerola_t Awesome thank you man
Also one way to reduce the memory usage would be to remove the cached displacement value and instead sample the height map in the vertex shader, but it would add 9 texture samples per grass blade (3 for LOD) so I think taking the memory hit is more ideal.
Additionally you could probably bit pack the uv coordinates, but I'm not confident on that.
@@Acerola_t Can't you attach extra data to a mesh instance that'd get passed to the vertex shader? You'd only need to query the heightmap texture once then, on instancing. (Not a GPU guy, I don't know if I'm making sense.)
Also, bit packing the coordinates seems very doable. Store a reference centre coordinate in full 32-bit floats for each chunk in a quad-tree, then apply a -1.0 to 1.0 uv-coordinate space "modifier" for each leaf of grass - with only a byte-size integer, you could index a 0.5m x 0.5m chunk to every ~2mm on each axis. Looks to me like you could get down to 2 bytes minimum, 4 if you need a bit more resolution (half-floats, if available, would work great), assuming the z axis can be omitted; if it can't, it also doesn't need more than 1-2 bytes, since the chunk can also define a reference average or middle offset.
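A minimal sketch of the chunk-relative packing idea, assuming a 0.5 m chunk and one byte per axis. For simplicity it maps the byte onto 0..chunk_size measured from the chunk's corner instead of -1..1 around its centre; `pack_offset` and `unpack_offset` are illustrative names, not part of the project.

```python
def pack_offset(local, chunk_size, bits=8):
    """Quantise a position inside a chunk (0..chunk_size) to an integer."""
    levels = (1 << bits) - 1
    q = round(local / chunk_size * levels)
    return max(0, min(levels, q))


def unpack_offset(q, chunk_origin, chunk_size, bits=8):
    """Rebuild the world-space coordinate from the chunk origin and the byte."""
    levels = (1 << bits) - 1
    return chunk_origin + q / levels * chunk_size


if __name__ == "__main__":
    chunk_size = 0.5
    step_mm = chunk_size / 255 * 1000
    print("step per axis: %.2f mm" % step_mm)  # roughly 2 mm, as estimated above
    q = pack_offset(0.1234, chunk_size)
    print(q, unpack_offset(q, 10.0, chunk_size))
```

The chunk origin stays a full 32-bit float, so the precision problems that come with large world-space coordinates do not reach the per-blade bytes.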
This was a very nice 45 minute journey through grass. I love that you still respond to comments on nearly 2 year old videos.
I found your channel recently and all I can say is, it's so knowledgeably chill. Thanks
This is cool, you could probably optimize it further by combining your billboard grass with meshed grass and using the billboard ones at longer distances.
This was my first thought as well. It might need some dithering to make the transition seamless, but I think it would work well.
@@kered13 I don't think dithering would work well. Zelda Breath of the Wild actually uses this combination of mesh and billboard grass I was talking about, so apparently I'm not the first to think of this idea
@@mariovelez578 I mean dithering between the full model and the billboard grass so that the transition between them isn't so sudden.
@@kered13 Yes, I know, but won't the dithering make it look grainy in that area?
your videos are so good, it blew the power to my house. but i'm back and digging on the smooth sounds.
Thanks! glad your power has returned
Thank you chunks!!!
Hoping you follow this up by adding Ghost of Tsushima's grass stuff, rounded normal for the grass blades to hide that it's 2D, varying height, shorter grass blades are converted into 2 blades to make use of the vertices and add density
Thank you chunks (:
These videos are genuinely fascinating
You're like Dani, but actually enjoyable to watch
This might have been the best video I've seen on anything shaders related.
I'm very happy I found this channel. Others are only really entertaining and fun I guess, but the real genius of game development is far more interesting. It also helps that your presentation is witty and concise. As a casual gamer and professional procrastinator I find this all super cool. :)
thank you chunks! what a kind fellow
Worked! What an absolute genius mad lad! Was so easy
damn bruh. Thats exactly the type of nerdy computer graphics stuff i dig. Im in no way close to coding my render engine, but i work in Houdini and instancing grass here is pain in the ass too.
Would be cool to see another intermediate LOD where it goes back to the textured quad approach for anything more than ~10m away. Would be near indistinguishable since it only looks bad when viewed from directly above, and unless you have an ortho camera that spot would be covered by the better grass. Might eliminate the need for fog.
Nice video, it looks amazing 💙
This would be very useful for a mysterious, serene, foggy field in an experimental indie horror game which appears in two scenes for no apparent reason, and then places you back in the game, just to make you question whether what you're experiencing is *supposed* to be coherent or not.
Hi. I love your videos! I downloaded this grass project to play around with it a bit and understand everything in more detail. The grass culling is quite expensive. If I only cull every 5th frame, the FPS increases from ~270 to ~440, and if I don't cull at all, I can get to around ~480. At those frame rates I've found it's very difficult to see the lack of culling for 4 frames unless the camera moves/rotates extremely fast. If we can link the camera to the grass and only cull every second or third frame while the camera view is changing, we can actually get a considerable FPS boost.
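The "only cull every Nth frame" experiment can be sketched as a small wrapper that reuses the last visible set between culling passes. `IntervalCuller` and `cull_fn` are hypothetical names; the real project does its culling in a compute shader, so this only illustrates the scheduling.

```python
class IntervalCuller:
    """Run an expensive cull function every `interval` frames and cache it."""

    def __init__(self, cull_fn, interval):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.cull_fn = cull_fn
        self.interval = interval
        self.cull_calls = 0
        self._cached = None

    def visible(self, frame_index, camera):
        # Re-cull on the first call and on every interval-th frame;
        # otherwise hand back the stale (but cheap) result.
        if self._cached is None or frame_index % self.interval == 0:
            self._cached = self.cull_fn(camera)
            self.cull_calls += 1
        return self._cached


if __name__ == "__main__":
    culler = IntervalCuller(lambda cam: ["chunks seen from %s" % cam], 5)
    for frame in range(10):
        culler.visible(frame, camera=frame)
    print(culler.cull_calls)  # 2 culls for 10 frames
```

A variant closer to the last suggestion would also force a re-cull whenever the camera has moved or rotated more than some threshold since the cached pass.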
Great series! I learned a lot. Could you save space by rendering multiple blades per position? For example, one position could have 3-4 off shoots.
Also, not sure if this will make a difference, depending on hilly-ness your maps, the grass could skip rendering on the downslope side of the hill.
Didn't realize a video about John Lennon talking about the grass could be so fun and educational. But I honestly didn't think you'd reincarnate till Yoko was gone.
Thank you chuncks!!
I kept expecting at some point that you'd have a radius around the camera outside of which there weren't individual blades of grass, but just like a blob disguised as grass. Kinda like you did where you stopped drawing the grass altogether at a certain radius, but colored the same color as the grass rather than brown.
you can actually put a bit of sampling in the pixel shader, a bit like parallax occlusion mapping (look it up, it's a common method ppl employ) and u can actually develop a shadow from it, instead of the displacement map, and it will shadow it all into the distance! would definitely make it look cooler. Parallax occlusion mapping actually is the first raytracing type method ppl did on the gpu. Realtime.
Since the grass is now stored in chunks (I also saw someone suggesting using quadtrees for better, more fine-grained culling), you maybe don't need to store as much data on the leaves' positions. Let the chunk itself store its world coordinates; let the leaves' positions be defined relative to the chunk. That way, you might be able to store what are currently floats as 1-byte integers or something. I saw this technique used in a video about a voxel-based game engine, but I'm thinking that it could work in any scenario with a well-defined world grid of some sort being used.
Optimization is my favorite topic you cover
Is there a way to make this grass interact with physics objects? Like leaving a tire trail with a car.
You'd have to get much fancier to do that; since this stuff exists only on the GPU, physics aren't possible. For stuff like trail tracks you'd have to keep track of a global texture that overlays on the terrain and then you write to that texture as the player drives around, then the grass shader would sample from that texture to see if it has been smushed down or not.
Thank you chunks!
I think I found the perfect grass
You can also check which grass is covered more than 3/5 and exchange it with the middle class model or if it is a bit further away even use the lowest model. And also grass which can't be seen because other grass blocks the view can be taken off altogether, but for this you have to take the height and position into account.
Hi! I've been subscribed for a couple of days, and I'm loving your tutorials.
You give just the right amount of detail needed to make it both fun to watch, easy to understand and follow along, I think you'd be a great teacher :)
I'd like to ask something though, if you don't mind. What technologies/programs/ides do you use in your videos? You inspired me, and I want to start developing a game, just for fun and to see what happens! :D
Unity for the game engine. Autodesk Maya for the 3D modelling, but Blender is quite popular (and free).
very cool grass
Thanks for the video. In case it helps anyone, I was able to triple the framerate for the geometry grass shader simply by pulling in keijiro's latest noise include and using the 2D simplex version instead of the 3D snoise which is overkill since the Y isn't required.
Yeah I didnt bother optimizing the geometry shader grass because I'd rather do literally anything else
I wonder if some of the optimizations used to render hair could also be used to render grass. Or if people working on grass optimization also used those to render better hair, fur, etc., because they're similar problems of "render a lot of this thing protruding from this other thing" except with hair the "other thing" may be moving.
Awesome videos!
i could tell this guy really liked grass off the glasses alone
Thank you, chunks! :)
amazing video bro keep it up
Thanks!
Considering the number of grass increases exponentially with distance from the camera, you really want to cut down on the number you render at a distance as well.
If you had a narrow band of low poly grass, then filled the following chunks with 2d decals you could probably save even more vram.
Render every third point of grass with one decal with three grass straws could give a final performance boost.
Not exponentially. The growth is only polynomial, and less than cubic.
@@drdca8263 Yes, you're probably correct. I didn't think through this comment before posting it.
The main point is still true though, the last row of grass is much harder to render than the closest one. So any cheap trick you can make in the distance could be worth it.
@@drdca8263 If you're thinking in terms of exponential functions in fancy maths (and, relatedly, running time), sure. But I think in other instances, "exponential" simply refers to whenever there's a parameter being raised to some power. And we're talking about grass straws here. The number of those would be going up with the area of the circle defined by the view distance from the camera; pi*r^2. That could be said to be polynomial, but colloquially also "exponential".
@@mnxs I’m conflicted. On the one hand, I have had objections to people insisting that because people who study a particular topic use a certain word a certain way, that other people are also obligated to use the word that way rather than a different way.
But, at the same time, I strongly prefer that everyone did not refer to things that grow polynomially as “exponential”.
This is probably somewhat hypocritical of me…
I guess it is because I think math is so great and i want everyone to know it?
But like, people who study other fields probably feel the same way about their favored field.
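To put numbers on the polynomial-versus-exponential exchange: with a uniform density, the blade count inside view distance r goes as density * pi * r^2, so doubling the distance quadruples the count. A small Python check, with `blades_within` and the density value made up for illustration:

```python
import math


def blades_within(radius_m, blades_per_m2):
    """Blades inside a circle of the given radius at uniform density."""
    return blades_per_m2 * math.pi * radius_m ** 2


if __name__ == "__main__":
    near = blades_within(100.0, 10.0)
    far = blades_within(200.0, 10.0)
    # Quadratic growth: twice the radius, four times the blades.
    print(far / near)
    # Truly exponential growth would instead multiply by a fixed factor
    # for every extra metre, e.g. 2 ** 200 / 2 ** 100 == 2 ** 100.
    print(2 ** 200 // 2 ** 100 == 2 ** 100)
```

A view frustum only covers a wedge of that circle, but the wedge's area still scales with r^2, so the conclusion about distant grass dominating the count holds either way.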
Perlin noise texture input for grash splat map and you have one helluva grass shader ^-^
Basically no real scene needs that much grass everywhere so yeah this is pretty great!
hehehe "grash splat" I'm a dum
I NEED MORE GRASS VIDEOS
wow thx man
man I wish more games had better grass!
Came for the grass, stayed for the Persona 3 music
underrated videos
bring back the grass saga !!!
Thank u chunks
Chunks are a blessing for both end-user and in-editor performance, and thank you for remembering that this thing called LOD exists. I was rendering entire scenes from far away off a city that didn't downscale the buildings and car models. At first it wasn't a problem because there weren't too many models or effects in the scene, but after some time I was getting 30 fps in game and 50-60 in editor. I will try to implement LOD. Many thanks.
thanks, chunks
Acerola: "Everyone has 16 GB of ram now a days"
Me: "4 GB is best I can offer"
thank you chunks :)
So in conclusion, this is very doable, but you need to keep that final cost in mind and weigh whether or not it's applicable to the project you're working on. Duly noted. I was considering using this for a VRC world where the only other performance heavy things are 1, people's avatars, and 2, a video player. I'll have to see how much of an impact it will have with other level geometry loaded in.
" put your desirable scale "
* Turns a single blade of grass into the mount everest *
I was entertained, and you will never know for sure (because you cant see me) but I was razzle dazzled.
THANK YOU CHUNKS!!!!!🎉🎉🎉
What have you done to me ? Im obsessed with grass now.. Im supposed to be a gamer not some kind of.. real life human thing who goes outside or whatever that is.. Anyways great video lol, Im wondering about how you could fix the camera issue you were talking about using the original method tho
"I went into Maya"
> Maya's theme starts playing
Nice
Great content.
Question for your brave soul: have you yet traversed the depths of hell known as optimizing for standalone VR devices?
I have not and probably wont for a long time lol
Okay now we have to unite and create ultimate grass experience that could be ran on any potato =)
Your channel got boosted by youtube, and I'm lucky enough to see it here! Anyways, for grass that far, I'd probably just clump a few sprites on that plane altogether with some noise,
maybe that would save some precious memory :)
you earned a sub
the area of the square at 1:50 is 300 squared metres squared, not 300 metres squared.
Thank you chunks
So if I were implementing this more practically in a game, we would just have to make some artistic compromises for performance and not have such consistently dense geometric grass
We would probably instantiate patches of grass along the ground, And have those patches of grass give some arbitrary amount of blades, and then if a chunk was arbitrarily far away you’d switch to a billboard or simple-tri LOD grass
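A sketch of the distance-based switch described above, in Python for readability. The three thresholds are made-up placeholders and `pick_lod` is a hypothetical helper; a real version would pick per chunk and probably blend or dither across the boundaries.

```python
def pick_lod(distance, mesh_end=10.0, low_poly_end=40.0, billboard_end=150.0):
    """Choose a grass representation for a chunk at the given distance.

    All threshold defaults are illustrative, not values from the video.
    Returns None when the chunk is too far away to draw at all.
    """
    if distance < mesh_end:
        return "mesh"
    if distance < low_poly_end:
        return "low_poly"
    if distance < billboard_end:
        return "billboard"
    return None


if __name__ == "__main__":
    for d in (5.0, 20.0, 100.0, 500.0):
        print(d, pick_lod(d))
```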
You are a fucking genius, and me still trying to implement a grass shader in godot
Want to know something cool? You dont even have to store anything in memory, other than the one draw call. Just let your vertex shader transform your mesh with a simple function. PS: Consider dispatch sizes (wavefronts if you will). This is your way to infinity.
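One way to read "transform your mesh with a simple function" is to derive each blade's position from its instance ID, so nothing per-blade is stored. The Python sketch below assumes a jittered grid; `mix32` is just a small integer mixing function picked for illustration, and `blade_position`, `blades_per_row` and `spacing` are hypothetical names.

```python
def mix32(n):
    """Scramble a 32-bit integer (illustrative mixer, not a vetted hash)."""
    n &= 0xFFFFFFFF
    n ^= n >> 16
    n = (n * 0x45D9F3B) & 0xFFFFFFFF
    n ^= n >> 16
    n = (n * 0x45D9F3B) & 0xFFFFFFFF
    n ^= n >> 16
    return n


def blade_position(instance_id, blades_per_row, spacing):
    """Deterministic (x, z) for a blade: grid cell plus hashed jitter."""
    row, col = divmod(instance_id, blades_per_row)
    jitter_x = mix32(instance_id * 2) / 2 ** 32      # in [0, 1)
    jitter_z = mix32(instance_id * 2 + 1) / 2 ** 32  # in [0, 1)
    return (col + jitter_x) * spacing, (row + jitter_z) * spacing


if __name__ == "__main__":
    # Same ID always gives the same position, so no buffer is needed.
    print(blade_position(12345, blades_per_row=1000, spacing=0.1))
```

The trade is the one raised elsewhere in the thread: memory drops to almost nothing, but the position function runs for every vertex of every blade on every frame.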
May I call you "Acerola-Orion Heart-Under-Grass-Blade", please?
Sure lmao
Thank you for the video! Sorry that I sound like a bot...
I know you said you wanted to be done with the grass, but I'm curious: is it possible to have objects (players and NPCs) affect the grass movement in a performant way? (flattening or making it shake as it's moved through)
I've seen other tutorials on this for snow and grass using a displacement texture but I'd be curious to know if it works with this grass technique
Yeah the same technique is used for snow and grass. The player's position is written to a texture that overlays across the field and the shader then samples from that texture to inform itself if it should be flattened or not.
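The write-then-sample loop described here can be mocked up on the CPU with a 2D grid standing in for the overlay texture. `TrailMap`, `stamp` and `flatten_amount` are illustrative names; on the GPU the grid would be a render texture and the lookup would happen in the grass vertex shader.

```python
class TrailMap:
    """A square grid laid over the field; 1.0 means 'flattened here'."""

    def __init__(self, world_size, resolution):
        self.world_size = world_size
        self.resolution = resolution
        self.texels = [[0.0] * resolution for _ in range(resolution)]

    def _texel(self, x, z):
        # Map world coordinates (0..world_size) to clamped texel indices.
        u = int(x / self.world_size * self.resolution)
        v = int(z / self.world_size * self.resolution)
        last = self.resolution - 1
        return max(0, min(last, u)), max(0, min(last, v))

    def stamp(self, x, z):
        """Record that something stood at (x, z)."""
        u, v = self._texel(x, z)
        self.texels[v][u] = 1.0

    def flatten_amount(self, x, z):
        """What a grass blade at (x, z) would read back."""
        u, v = self._texel(x, z)
        return self.texels[v][u]


if __name__ == "__main__":
    trail = TrailMap(world_size=100.0, resolution=64)
    trail.stamp(10.0, 20.0)
    print(trail.flatten_amount(10.2, 20.1), trail.flatten_amount(80.0, 80.0))
```

A fuller version would fade the stored values back toward zero each frame so the grass springs back up over time.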
Thank you, Chunks.
As someone who has worked on serious games/milsims grass is a fuckng nightmare because you have an expectation, especially in multiplayer environments, that if you are laying prone in grass, then people far away from you shouldn't be seeing grass de-render and your soon-to-be corpse laying on a flat plane totally exposed.
Honestly still no good solutions to this day.
I'd recommend swapping 3D grass for 2D billboards when the grass is a certain distance from the camera as well
He's back.
Nice sunglasses.
Thank you chunks!
Could you reduce the memory cost if you store only the Ids for referencing the blades from the first buffer inside the second buffer?
Regarding the memory efficiency, I believe UV and perhaps even displacement are unnecessary. UV should just be the same as Position.xz (unless Position is in camera space, in which case you could at worst introduce a camera pos uniform) and for the vertical displacement, it should be a solution to take like VertexID % 7 and use that to index the height in a constant array. (This is assuming that you have 7 vertices on a grass blade, you could insert a different number)
Thanos scene sent me🤣😂😂😂😂
God I’m addicted to grass
grass is a weed.
*takes long suck on roll up cigarette*
nice video
Thank you!
we could also just draw an average colour gathered per chunk or bake a colour map to it.
I had idea... You talked about lod, but what if at distance it would be one big mesh? Like big mesh? You could actually do that, even in order: 1. best grass 2. lod'd grass 3. one big mesh with color of grass.
It is very obvious from looking that at the distance all the grass looks like just one color... You could abuse that ig...
im morbidly curious how much its possible to optimize this. please do a follow up video just for the hell of it. X'D
honestly I have had several revelations since this vid came out as to how to make it even faster, so yeah I probably will do a follow up on it.
I always wondered why you can't just reuse the texture in memory. Old gaming consoles like the PS1 did that, they didn't need a copy of every texture. I guess we have lost ancient knowledge in coding.