Such high quality content. We are blessed to have you uploading all this amazing content for free, thank you so much
Every video is pure gold, thanks for the amazing work you're doing!!!!!
I'm glad you enjoy them! Tech art shouldn't be a well kept secret :D
Hey, just wanted to say thank you for making these videos. You are one of the most knowledgeable Tech Artists creating educational content online, and it can be difficult to find a one-stop-shop channel like yours that specifically deals with Tech Art, rather than just touching on it tangentially while dealing with other topics.
At the company I work for, fortunately/unfortunately, I'm the most senior, and I feel like no one there can actively teach me new things. If I don't actively look for content like yours, I'm falling behind the fast-paced industry requirements, so it's been a joy to binge-watch your videos and learn from them. Keep up the great work, and thanks again for creating such consistently high-quality tutorials!
Thanks for that comment! 😊 Actually, I was in a similar situation - a few years of learning tech art topics blindly. When I finally felt I was getting a grasp on it, I decided to start the channel. "Be the person you needed when you were younger" ;) I don't know if you're on the TAA Discord, but if not, feel free to join: discord.gg/CAAZrXGz
Thank you for your breakdown. I didn't know about the Shader Playground. I have a few small suggestions for topics that would be nice to cover in your upcoming breakdowns :-):
- Texture arrays within UE4.
- IDTech's megatextures / Virtual Textures
- Dynamic shader branching in UE4 and possible workarounds (currently, as far as I know, there is no way in UE4 to force the shader flow to work correctly. I mean, if you want POM to be present on your 2x2km landscape, it's applied everywhere regardless of how far away your camera is. You want the POM shader effect to be applied only within certain distances (pixel distance), for specific pixels. I think REDEngine3, Dunia and CryEngine have already solved it, but it is still a problem in UE4)
Anyway, thank you for the videos you're making. Your UE4 render pipeline breakdown on your website is awesome! Keep going!
Thank you so much for all the valuable information! I went ahead and added a multiplication by zero to all my shaders and now my game runs super smoothly. I noticed a slight change in visuals, but who cares about visuals if you can't pump enough frames!
😂 True tech artist
I love the dryness of it all. I thought for a while, and realized that we are stupid, as humans. Learning so many basics though, thanks for that!
Very happy to discover your channel!
Truly splendid! I'd like to hop into the live session next time.
See ya there! twitch.tv/techartaid, 8 PM BST on Thursdays
As an upcoming indie game dev I absolutely loved this 😍
Good luck with your game! I've done quite a few personal projects after hours, so I remember the pain of not having a graphics programmer to optimize stuff ;p
@@TechArtAid many of us indie devs are no experts or mathematicians, but content like yours puts us in the right state of mind :)
This is gold, thank you very much 👍
Awesome content! Keep doing these and I will watch!
Absolute stellar info! thx for sharing!
Awesome! Thank you!
Hello, first of all, thank you for your amazing tutorials! I have a question related to texture arrays. Do they cost the same number of cycles as separate textures or not?
To be more specific: I had the task of reproducing a unique landscape material that already existed in a different game engine. That landscape used about 64 textures (16 layers). As far as I know, UE doesn't support that many texture samples (even with a shared wrap sampler), so when I tried to combine all of them, I got an error. Every texture was well optimised (I mean packed into RGB channels, etc.). I came up with the solution of using Texture Arrays - it still uses lots of samples, but it works now. Combining this with RVT makes the shader not so heavy, but still expensive.
I'm not sure if that counts as optimisation or not - I assume not, and that it's just sort of a hack to avoid the restriction on the number of layers in UE. Could you explain how it works under the hood?
Thank you!
Interesting topic! Texture arrays, from what I understand, are just a stack of standard textures bound as one resource - all layers share the same size and format, with mipmaps etc.
Virtual texturing, on the other hand, is an actual performance improvement. VT brings the great advantage of streaming and loading only those parts that are needed. I'd like to dive deeper into the topic of VTs, as I don't have experience with them.
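To make the cycles question concrete, here's a minimal HLSL sketch (hypothetical resource names, not UE-generated code) comparing a Texture2DArray fetch with a regular one - both are a single sample instruction, the array just picks its slice via the third coordinate, which is how it sidesteps UE's per-material texture limit without being inherently cheaper:

```hlsl
// Hypothetical resources - illustration only
Texture2DArray LayerTextures;   // all slices share one size and format
Texture2D      SingleTexture;
SamplerState   LinearSampler;

float4 CompareFetches(float2 UV)
{
    // One sample instruction each; the array fetch just adds a slice index
    float4 FromArray  = LayerTextures.Sample(LinearSampler, float3(UV, 3.0)); // slice 3
    float4 FromSingle = SingleTexture.Sample(LinearSampler, UV);
    return FromArray + FromSingle;
}
```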
This was extremely helpful, thank you!
I have one question: do you think the instruction counter got fixed by now, in 5.4?
Thanks! It's not something to fix, though. On PC it shows function calls rather than final assembly instructions, because those vary by hardware. And while AMD is transparent, Nvidia keeps their modern implementation secret. (Which is such a sad change since the days of the open Cg language documentation)
Awesome video, thank you very much!
Right, my bad. And because UE requires a curve atlas to be square, the texture can't be small enough to hope for cache-friendliness. Thanks for pointing it out!
Excellent video. Very eye opening about potential problematic techniques that I employ; one in particular being overuse of texture samplers! I have a specific question about that, actually. I'm employing a very basic cel-shade technique in my game in which I sample a "ramp" texture from a dot product to get a nice hard shading line. Could I use a gradient/curve atlas (the feature you showed off with greyscale textures at the end) to avoid using samplers in my material? That would technically make them more performant since it's using math instead of cache/texture samplers, right? I'm just trying to understand some of the content of this video better and apply it to my project!
Edit: Just noticed that the curve data bakes down to a texture when using the "Atlas" feature. That kind of negates the question I asked, I guess. It would be cool to look into ways of baking those ramp textures/curves into functions or something, to avoid using samplers. Maybe I'm completely out of my depth thinking that, though. Tech art is extremely new to me!
Interesting question! I had used "math" gradients in one game. I made them out of a small array of Vector3s & a single lerp (between the 2 closest ones). That being said, I think UE just renders the gradient atlas down to a texture to simplify the code, and materials don't provide arrays in UE. So the best bet would be to just keep the texture resolution small. If you're lucky, the part you sample will fit within the cache, instead of going all the way to main RAM. And of course use the shared sampler.
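For reference, a minimal HLSL sketch of that "math gradient" idea (the stop colors are made up): a handful of stops and one lerp between the two nearest, with no texture fetch at all:

```hlsl
// Hypothetical gradient stops - illustration only
static const int    NUM_STOPS = 4;
static const float3 Stops[NUM_STOPS] =
{
    float3(0.05, 0.05, 0.20),   // shadow
    float3(0.30, 0.20, 0.50),
    float3(0.90, 0.60, 0.40),
    float3(1.00, 0.95, 0.80)    // highlight
};

float3 SampleGradient(float T)   // T in [0,1], e.g. a remapped N.L
{
    float Pos   = saturate(T) * (NUM_STOPS - 1);
    int   Index = (int)floor(Pos);
    int   Next  = min(Index + 1, NUM_STOPS - 1);
    return lerp(Stops[Index], Stops[Next], frac(Pos));   // pure math, no sampler
}
```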
@@TechArtAid Oh okay! I'm already using a single shared sampler with multiple gradients packed into it, but I'll definitely look into keeping the ramp textures (or the atlas, if I go that route) as small as I can without visual artifacting. Thanks for the reply!
Fantastic and educational video, really loved it! I'm really curious though: if you have two different shaders on two different objects, but both of them sample the same texture files on disk, will that gain efficiency? An extreme example would be that all my shaders sample 12 textures, and they are all the same samples across all shaders, but each shader is different in its math instructions.
Thanks! In this case no, there would be no gain in terms of "common instructions", cache reuse and that kind of stuff. There's a way to visualize and check it. If you grab a frame with RenderDoc or Intel Frame Analyzer, you can see a 'timeline' of draw events. You can see how it draws objects by material (and within it: by each mesh using that material). Then it switches to material 2, 3 etc. So if you have a different material (even differing by 1 mere instruction), it's a different draw call, therefore it's *after waiting* for the previous thing to finish. Therefore no room for a speed up. I hope it makes sense :)
That said, there's a gain to be had in your intuitive example. It's just somewhere else: in memory savings and streaming. If your environments utilize the same textures many times, there are far fewer resources to load in and out (stream).
@@TechArtAid Thanks for the clarity, that helps a lot! Not to bombard you with questions, but in light of what you said: if all your materials derived from the same uber shader, would variable changes per instance be seen as completely different instructions, thus still causing the same textures to get called over and over again? I think in Unreal terms the uber shader would be your Material, and the variable changes would be on the Material Instances?
I'm guessing if my texture is part of the same draw call, then it gains efficiency, but if not, it gets called over and over again?
Yup. Variables (except static switches) keep the shader code unchanged, which is great. But... a separate material instance is a new draw call, because it needs to send a different constant buffer (aka the variables' contents). It already helps, though! That's because UE sorts draw calls by shader, so at least it doesn't have to bind the shader again (just the parameters). But I have limited knowledge of this part.
That said - there's an alternative, often superior, way to set parameters on meshes, independent from a material instance. And then you have 1 huge glorious draw call for all meshes reusing the material :) It's called Custom Primitive Data. See my video about it: ua-cam.com/video/I8lr9pdoSCY/v-deo.html
Now the limitation is that CPD provides only floats and vectors. You can't pass textures that way. But we can abuse the system, right? :D So just combine it with UDIMs and you've got yourself a texture selector with 1 draw call:
ua-cam.com/video/-oFY5QWXKZY/v-deo.html
@@TechArtAid thanks so much for the explanation! Really appreciate it! Got lots to look into and learn.
You forgot to mention that we can load multiple textures with just one sample thanks to shared wrap sampler source.
With one sampler, but not samples :) The reads from a texture still need to happen. But sure, thanks for mentioning it. That's a good practice that I should have mentioned, as it allows for more textures to be loaded from a single material
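Roughly what that looks like in HLSL (hypothetical names): one sampler object shared by several textures, while each texture still costs its own read:

```hlsl
// Hypothetical textures - one shared sampler, two separate reads
Texture2D    AlbedoTex;
Texture2D    MaskTex;
SamplerState SharedWrapSampler;   // counted once against the sampler limit

float4 SampleBoth(float2 UV)
{
    // Sharing the sampler frees up sampler slots; it does not merge the fetches
    float4 Albedo = AlbedoTex.Sample(SharedWrapSampler, UV);
    float4 Mask   = MaskTex.Sample(SharedWrapSampler, UV);
    return Albedo * Mask.r;
}
```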
Great video! Can I ask why you've said "all modern compilers (are good at optimization) - Xbox, PS, Radeon" and didn't mention Nvidia there? Is there any difference?
As of now (2021), I have only worked with Microsoft's DXC / FXC for PC (which works for Nvidia too). Not sure if NV has their own, except for the one built into the driver, or CUDA. But if yes, then they should be equally good.
how did you learn so much? you are amazing
Then which is better? Texture with different noises on each channel, or generating 4 different noises with offsets inside the editor and merging to v4?
Do you mean the "Noise" node? It's very expensive. Much more than a texture. Also if you meaning merging grayscales to a v4 in material, then there's no need to do it. Shader compilers optimize for scalar math
But when you're using gradients, it's also sampling. A way smaller texture anyway, so it should fit in the cache, I guess.
Exactly :)
I'm not a technical artist, but this is extremely instructive, thank you.
One question: on my projects I often create a pretty complex Master Material, with many switches and options, and then use instances. It's very useful of course, but does it affect performance?
Nice to hear it! As for real time, it's usually fine. It mostly affects your performance :) What I mean is that you wait a long time for shaders to compile. That's because each new switch adds new combinations with all other switches. It's 2ⁿ iirc, so 4 switches = 16 variants to compile (and in practice waaay more, because UE has internal feature toggles too).
@@TechArtAid Thanks! So if I understand well, a level with instanced materials would take longer to load, but would be fine playing afterwards? Or when you talk about performance, are you only referring to rendered pixels?
No, not at all. I mean the moment when you hit Apply in the material editor. In a bigger project it may take ages to Compile Shaders (as you may know from UE memes ;d). Having fewer switches (and materials in the project) reduces that wait time.
For run time, switches themselves don't have a cost. But search for 'draw calls' discussions - each extra material still means extra draw calls. So it's a balancing act
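Conceptually (this is not UE's actual generated code), each static switch behaves like a compile-time define, and every true/false combination becomes its own shader variant - n switches means up to 2ⁿ permutations to compile, while at run time only one permutation ever executes:

```hlsl
// Illustration only: two hypothetical switches => 4 permutations to compile
#define USE_DETAIL_NORMAL 1   // "switch A"
#define USE_EMISSIVE      0   // "switch B"

float3 ShadePixel(float3 BaseColor)
{
#if USE_DETAIL_NORMAL
    BaseColor *= 1.05;                    // placeholder detail-normal path
#endif
#if USE_EMISSIVE
    BaseColor += float3(0.2, 0.1, 0.0);   // placeholder emissive term
#endif
    // The branch is resolved at compile time, so the chosen variant
    // carries no extra run-time instructions
    return BaseColor;
}
```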
I was curious about this as well, thank you for asking this :D And thank you @TechArtAid for an awesome video and answers.
If you sample the same texture multiple times at different UVs per pixel, do you effectively save the performance of loading that texture into memory? Or is it just as bad as 2 different samplers, because it doesn't cache the way I'm thinking it does?
You save by having 1 less file to stream in from disk (file management, copying to VRAM). But I don't know how significant that is (what range of latency).
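In HLSL terms it's something like this sketch (hypothetical texture): the same resource fetched twice at different UVs is still two sample instructions, but only one asset to keep in memory and stream from disk:

```hlsl
// Hypothetical detail texture - one resource, two taps
Texture2D    DetailTex;
SamplerState LinearSampler;

float4 BlendTwoTaps(float2 UV)
{
    float4 TapA = DetailTex.Sample(LinearSampler, UV);
    float4 TapB = DetailTex.Sample(LinearSampler, UV * 4.0);   // tighter tiling
    return lerp(TapA, TapB, 0.5);   // still two reads, but a single texture in VRAM
}
```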
Great video, as always.
If I have a static switch parameter, how does that compile? Does it make 2 versions of the shader, one for true and one for false, and swap them?
Yes. Actually many more than 2 (a sum of all used variants). It's "invisible" when it comes to instructions, but see the comment by Cedric about other problems with them
Thank you very much. 2 questions.
When you say less noisy or more coherent UVs, what exactly did you mean? For example, say I have 1 weapon model and 1 texture, but with 2 versions of the UVs (so 2 textures, but the same result for that weapon). One is an auto-UV, auto-packed mess; the other has fewer UV cuts and is packed by hand. Is that what you mean by incoherent and coherent textures? Does it make it suboptimal?
Second question: I can't write HLSL code - is there any way to convert my shader blueprint and put it into that site to check my real cost?
1. I mostly meant adding some noise to UV coordinates in the shader. For example if you want to make an irregular magic portal, you'd probably add a noisy flow map to the UVs, right before sampling a texture. Which is usually fine, but you have to remember it reduces efficiency. Tiny auto packed islands and a lot of seams may actually also affect that, true!
2. I forgot to show that. At the top menu bar of material editor, there is a "Show HLSL code" entry. Unfortunately though, the code UE produces is soooo messy :< All functions are there, then it relies on the compiler to optimize the unused ones away
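For point 1, a quick HLSL sketch of the "magic portal" case (hypothetical textures): the flow-map offset applied right before the main sample is what makes neighbouring pixels fetch scattered texels, so the texture cache gets less reuse:

```hlsl
// Hypothetical portal material - illustration only
Texture2D    FlowMapTex;
Texture2D    PortalTex;
SamplerState LinearSampler;

float4 SamplePortal(float2 UV, float Time)
{
    // Noise/flow offset in [-0.5, 0.5]
    float2 Offset = FlowMapTex.Sample(LinearSampler, UV + Time * 0.05).rg - 0.5;

    // Coherent version would be: PortalTex.Sample(LinearSampler, UV)
    // The offset below makes the UVs "noisy", i.e. less cache-friendly
    return PortalTex.Sample(LinearSampler, UV + Offset * 0.1);
}
```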
@@TechArtAid OK thank you very much )))
Hi Tech Art Aid! This might not be the place for it, but I have an HLSL shader that I want to import into Unreal Engine. I need help badly, as I have been trying for years with no progress. I hope you're willing to take a look at this HLSL shader code and tell me if this tech is possible to apply in Unreal Engine, or in a similar way. Please, I do hope you can help me. Greets!
Ah, sorry, I don't do freelance now :) I'd suggest breaking it into a series of smaller problems and discussing them on forums or Discords
@@TechArtAid ty for replying! I shall do this!
How do I check cycles on Android GPUs?
Oh, idk, haven't worked with mobile yet. Search for something like 'unreal android gpu profiling'. Cycles may not be available, but milliseconds could suffice. See my GPU Profiling series for an intro to that domain
@@TechArtAid thanks
How do you know all these things?
A mix of Twitter/blogs, my own experiments and AAA work recently, I guess ;D A ratio like 10/30/60, for the last few years
Follow whom I follow, for a starter :) mobile.twitter.com/TechArtAid/following
watching this makes me wanna not use ue anymore 😆
Why? Optimization is fun :3 it's a puzzle game
@@TechArtAid haha yea but it's like Epic is making things difficult for us on purpose 😢 i wanted to start doing some game stuff until i realized ue5 has horrible performance 😆
This video is amazingly informative, thanks for sharing.
I have 2 questions:
1) If the compiler doesn't wait for sampling to finish before doing other stuff, does that mean it starts multiple samplings in parallel, and the final cycle cost for the shader will be that of the longest sampling?
For example, it started 3 samplings - 100 cycles, 150 cycles and 125 cycles. The longest sampling was 150 cycles, so the entire sampling of 3 textures takes 150 cycles. Am I missing something?
Example from shader playground i.imgur.com/aTAV5M2.png
2) Where can I learn more about how many cycles texture sampling can actually cost?