Doubling the speed of my game's graphics [Voxel Devlog #18]

  • Published 2 Jun 2024
  • Online demo: github.com/DouglasDwyer/octo-...
    Additional voxel models: drive.google.com/drive/folder...
    With some clever tricks, I've managed to halve the frame times in my voxel game engine! Join me as I explain three essential optimizations for voxel rendering. These optimizations include using DDA for traversal, bitwise masking to filter out potential intersections, and performing a low-resolution depth prepass. In addition, I talk about ray marching in general and discuss the other new features that I added to my engine this month.
    Music used in the video:
    Chris Doerksen - Perfect Parry
    Corbyn Kites - Orbit
    White Bat Audio - Athena
    White Bat Audio - Chroma
    Chris Doerksen - New Groove
  • Gaming

COMMENTS • 141

  • @DouglasDwyer
    @DouglasDwyer  1 month ago +9

    Want to learn how to write powerful, high-performance code? Then be sure to check out CodeCrafters using the link below, and help support my channel:
    app.codecrafters.io/join?via=DouglasDwyer
    They have one project which is completely free to complete during their beta, and you can begin any of their projects for free! Get 40% off if you upgrade to a paid account within three days.

  • @GabeRundlett
    @GabeRundlett 1 month ago +69

    Great video as always! A couple of suggestions:
    Regarding using DDA on multiple levels, aka a "hierarchy": look at HDDA by Ken Museth, with an implementation in OpenVDB. The basic idea is that the DDA process is pretty much restarted when the hierarchy level changes. An alternative (if you had a shallow tree) would be to keep a stack of the DDA traversal state, one copy for each level. This alternative may be too register heavy (and might benefit from the stack being in shared memory instead).
    Regarding the voxel memory layout, you show that first comes the 64-bit bitmask and then the data segment. This means that when you fetch a brick, you always load 8 bytes of bitmask data plus an additional 120 bytes of data to fill the cache line, so 93% of your cache during traversal will be filled with these data segments!!! Try to pack at least 16 bricks together so that you fill all 128 bytes of the aligned cache line with bitmasks!!
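    For concreteness, here is a minimal sketch of that packing idea on the shader side (the buffer names, the uvec2 encoding of the 64-bit mask, and the 4x4x4 bit ordering are illustrative assumptions, not the engine's actual layout):

        // All occupancy masks live in one tightly packed array, so 16 consecutive bricks
        // (16 * 8 bytes = 128 bytes) share a single aligned cache line; the variable-length
        // data segments sit in a separate buffer and are only touched after a mask hit.
        layout(std430, binding = 0) readonly buffer BrickMasks {
            uvec2 masks[];       // masks[brick] = 64-bit occupancy bitmask as two 32-bit words
        };
        layout(std430, binding = 1) readonly buffer BrickData {
            uint dataOffsets[];  // dataOffsets[brick] = start of that brick's data segment
        };

        bool voxelOccupied(uint brick, uvec3 local) {
            uint bit = local.x + local.y * 4u + local.z * 16u;   // position inside the 4x4x4 brick, 0..63
            uvec2 mask = masks[brick];                           // one 8-byte load from the packed mask array
            return bit < 32u ? (mask.x & (1u << bit)) != 0u
                             : (mask.y & (1u << (bit - 32u))) != 0u;
        }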

    • @andreirost4696
      @andreirost4696 1 month ago +2

      Eyooo hello

    • @oamnog
      @oamnog 1 month ago +3

      Ohh, I get it: that way the cache isn't wasted on data segments that may never be used (since the ray may skip the voxel brick entirely), and the bitmask data of the surrounding voxel bricks is loaded ahead of time, reducing how often the cache has to be updated.
      Which 15 neighbouring bricks' bitmasks to store could be determined using 2x2x4, 1x2x8, 1x4x4 and 1x1x16 bounding boxes that are offset and rotated based on the ray's general direction and position. The ray's position in the bounding box would probably be in the far corner, and the shape of the bounding box (1x1x16, 2x2x4, etc.) would be determined by whether the ray is moving nearly axis-aligned, steeply along two axes, etc. You would probably also need to take into account the hierarchical voxel size that the ray is currently at.
      But that approach sounds complicated, and you could probably just pack 64 bricks' bitmasks together into four 128-byte-aligned cache lines (but I don't know exactly how caches work, so I may be wrong).

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +27

      Thanks for the suggestions! I'll be sure to check out the OpenVDB implementation.
      As for cache lines, that's a very good point. I'll have to experiment with alternate data layouts in the future. But there's one nuance that I didn't mention - data segments are variable length. So if a brick has a single non-empty child, then the data segment will only be 4 bytes. But coming up with a way to pack the bitmasks (maybe I could store the bitmasks inline alongside each child pointer) might also help with cache coherency! That said, I don't really think that I'm memory bound right now :)

    • @pusab2064
      @pusab2064 1 month ago

      This is some crazy low-level thinking, I aspire to be like this

    • @JoeTaber
      @JoeTaber 1 month ago +1

      @@DouglasDwyer You can be limited by memory bandwidth separately from being limited by memory capacity. :)

  • @dleiferives
    @dleiferives 1 month ago +31

    It’s amazing how much you’ve grown as a programmer. Keep up the good work

  • @nitaki_
    @nitaki_ 1 month ago +46

    This is a great video, but you should refrain from using rotating or moving gradients as a background for your explanations. The youtube compression does not like it at all

    • @franzusgutlus54
      @franzusgutlus54 1 month ago +2

      You are so right!

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +7

      Good suggestion. I was curious to see how it would turn out, but perhaps YouTube's compression butchers it...

    • @danieles6684
      @danieles6684 7 days ago

      @@DouglasDwyer I actually was going to say I liked it!

  • @NathanNahrung
    @NathanNahrung 1 month ago +7

    Would you see further improvements by expanding on the beam technique? For example, what if your initial low-resolution image were based on the display's aspect ratio? A 16:9 screen would generate a 17x10 square grid of rays (rays sit at the screen edges, so the grid's x and y are the aspect ratio values +1) for the first low-resolution image. The next image, which would start rays further from the camera, could have 16x more samples (a 68x40 square grid of rays), as this would force traversal to the next brick layer, and this pattern could continue until the final image at the display's full resolution (including any multiple samples per pixel). Alternatively, you could determine the initial low resolution by working backwards from the display's full resolution, rounding up fractions of pixels by overdrawing the field of view for the lower-resolution images, and perhaps going all the way down to a 16-pixel image as the initial low-res image.

  • @frozein
    @frozein 1 month ago +10

    Great stuff, the bitmasking optimization was really clever. As for your hierarchical traversal question, it is definitely possible to avoid the ray-cube intersection test. The basic idea is that at every step, you first descend down the tree until you reach a non-empty node. Then you perform your DDA step within that node, and if the ray exits the current tree node, you ascend back up the tree until you are in bounds again. You do need to reinitialize DDA whenever you change levels, though. I actually just implemented this for octree traversal in my engine and it works great!
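    For readers following along, here is a compact, simplified sketch of that loop structure against a hypothetical flat-octree buffer (node layout, names and constants are assumptions, not the engine's or the commenter's code; it lazily re-descends from the root each step and takes single-voxel steps, which is exactly what a level-aware step and proper ascent would improve on):

        struct Node { uint state; uint children[8]; };              // 0 = empty, 1 = solid, 2 = has children
        layout(std430, binding = 0) readonly buffer Octree { Node nodes[]; };
        const uint EMPTY = 0u, SOLID = 1u, PARTIAL = 2u;
        const int LEVELS = 8;                                       // root spans 2^LEVELS voxels per axis

        bool traverse(vec3 ro, vec3 rd, out vec3 hitPos) {
            ivec3 cell    = ivec3(floor(ro));                       // current voxel coordinate
            vec3  delta   = abs(1.0 / rd);                          // ray length per one-voxel step on each axis
            ivec3 stepDir = ivec3(sign(rd));
            vec3  side    = mix(ro - vec3(cell), vec3(cell) + 1.0 - ro, step(0.0, rd)) * delta;
            float t = 0.0;
            for (int i = 0; i < 512; ++i) {
                // 1) Descend to the deepest node containing 'cell'.
                uint node = 0u;                                     // root
                int  level = LEVELS;
                while (nodes[node].state == PARTIAL && level > 0) {
                    level--;
                    ivec3 octant = (cell >> level) & 1;
                    node = nodes[node].children[octant.x + octant.y * 2 + octant.z * 4];
                }
                // 2) Hit test.
                if (nodes[node].state == SOLID) { hitPos = ro + rd * t; return true; }
                // 3) DDA step of one voxel, updating only the axis that was crossed.
                if (side.x < side.y && side.x < side.z) { t = side.x; side.x += delta.x; cell.x += stepDir.x; }
                else if (side.y < side.z)               { t = side.y; side.y += delta.y; cell.y += stepDir.y; }
                else                                    { t = side.z; side.z += delta.z; cell.z += stepDir.z; }
                // 4) Ascent is implicit here: the next iteration simply re-descends from the root.
                if (any(lessThan(cell, ivec3(0))) || any(greaterThanEqual(cell, ivec3(1 << LEVELS)))) break;
            }
            return false;
        }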

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +5

      Thanks for watching! I do the same thing in my engine that you describe. What you call reinitializing DDA is what I'm referring to as a ray-cube intersection test, since you use the cube and ray position data to determine the side of closest hit :)
      Edit: speaking of your engine, can't wait to see your next video!

    • @frozein
      @frozein 1 month ago +2

      @@DouglasDwyer Ah I see, makes sense. I avoid the ray-cube intersection by subtracting position within the node and multiplying by 2 when descending. I keep a stack of node positions to ascend. It would be slightly more complicated for a 64-tree like you have though.

  • @linksword7110
    @linksword7110 1 month ago +10

    holy, I thought you said you were going to reduce quality for more episodes. But this is amazing

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +3

      I am not planning to reduce quality for discrete graphics cards. The lower quality will just be an option for the engine to run on iGPUs

    • @linksword7110
      @linksword7110 1 month ago +3

      i meant on the videos :p

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +8

      @@linksword7110 Oh gotcha haha. I'm glad to hear you thought the quality is still good. This video only took 8 hours total to produce!

  • @BalintCsala
    @BalintCsala 1 month ago +5

    What I do in my own system to solve the hierarchical issue is to first separate the ray origin into an integer and fractional part. This is mostly to combat precision errors (since you move up and down the scales a ton), instead of getting 23.999 as the hit position I get 24 as the integer part and -0.001 as the fractional one, this proved way less destructive for me.
    Then what you need is a way to tell how to convert this position from one octree level to a different one.
    Going down towards smaller cubes is easy, you always go down exactly 1 level. I do this the following way:
    // The size of one voxel is half as big as before, so fractPos should be doubled
    fractPos *= 2;
    // If the fractPos is (1.5, 0.5) that means we should be in the child with coordinates (1, 0)
    // This can be calculated with a simple step
    cellOffset = step(1.0, fractPos);
    // Adjust the integer position accordingly
    integerPos = integerPos * 2 + ivec2(cellOffset);
    // And return fractPos to be in [0, 1]
    fractPos -= cellOffset;
    Going up is a bit harder since you might go up multiple levels, but it's still doable, if "levels" is how many levels you want to go up, then it's:
    // Store the current int pos for later
    oldIntegerPos = integerPos;
    // This divides the position by 2^levels
    integerPos >>= levels;
    // Bit complicated, this calculates the relative offset between the previous voxel and the (0,0,0) coordinate
    // of the ancestor we go to
    fractPos += vec2(oldIntegerPos - (integerPos << levels));
    // And scale fractPos back down so it is again in [0, 1] relative to the ancestor cell
    fractPos /= float(1 << levels);

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +2

      Thanks for the suggestion! I use similar math to go up and down trees in my engine as well. However, the Shadertoy you sent performs what I would call a ray-box intersection test rather than DDA. The DDA algorithm is characterized by the fact that you *don't* recompute side distances for all cardinal ray directions each step. Rather, you update a single side distance each step in the direction that you moved (see this page lodev.org/cgtutor/raycasting.html). The suggested shadertoy does `vec2 dists = -ro * invRd + max(invRd, 0.0)`, which I would call a full ray-box intersection test
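      To make the distinction concrete, here is a small 2D sketch of both approaches (purely illustrative, not code from either project):

          // Full ray-box style: each step recomputes the exit distances of the current
          // cell from scratch (the same math as the quoted Shadertoy expression).
          vec2 boundaryDists(vec2 ro, vec2 invRd, vec2 cellMin) {
              vec2 roLocal = ro - cellMin;                      // ray origin relative to the cell
              return -roLocal * invRd + max(invRd, vec2(0.0));  // distance to the exit plane on each axis
          }

          // DDA style: side distances are initialized once, then each step adds a single
          // precomputed delta on whichever axis was crossed.
          void ddaStep(inout vec2 sideDist, vec2 deltaDist, inout ivec2 cell, ivec2 stepDir) {
              if (sideDist.x < sideDist.y) { sideDist.x += deltaDist.x; cell.x += stepDir.x; }
              else                         { sideDist.y += deltaDist.y; cell.y += stepDir.y; }
          }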

  • @djordjepepic1656
    @djordjepepic1656 1 month ago +8

    You might want to check out Nvidia's paper 'Raytracing Sparse Voxel Database Structures on the GPU', they outline a hierarchical DDA approach for efficient empty space skipping that you might be able to implement in your engine. You should read the paper for a better explanation, but the basic idea is to do DDA for all levels of the hierarchy at the same time, going down a level on collisions and going up a level when leaving a 'chunk' of the current level.

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      Interesting - I'll be sure to check it out :)

  • @yiwmsh4393
    @yiwmsh4393 25 days ago

    The mountain top looks so distant to me, down here at the start of the trail, so I'm very thankful you're posting the pictures you took up there.

  • @stormy_vox
    @stormy_vox 1 month ago +1

    your optimizations are really cool! I don't fully understand the bitmask stuff but it's neat

  • @DanaTheLateBloomingFruitLoop
    @DanaTheLateBloomingFruitLoop 25 days ago

    Really cool stuff! The second optimization was genius!

  • @tommycard4569
    @tommycard4569 1 month ago +2

    I always look forward to your videos, keep up the great work!

  • @phylliida
    @phylliida 1 month ago

    Ur videos are so good!! Always excited to see another one

  • @beholdergamedesign
    @beholdergamedesign 1 month ago +5

    I kind of wonder if we need another name for ray marching determined by grid boundaries or bounding volume intersections vs by distance fields.

    • @procedupixel213
      @procedupixel213 1 month ago +1

      "Grid traversal" algorithms is the name that I learned for some of these.

    • @BalintCsala
      @BalintCsala 1 month ago +4

      Ray marching is just the process of stepping the ray forward by some amount instead of using analytical formulas. If you visit the Wikipedia page, it gives names for those two specific variations: sphere tracing for SDFs and cube-assisted ray marching for voxel traversal, although I personally see the latter referred to more often as "voxel tracing"

  • @stevegta6830
    @stevegta6830 25 days ago

    I've always loved voxel engine graphics. Great work on this :)

  • @adricklynn8882
    @adricklynn8882 21 days ago

    Amazing work! Can't wait to see more

  • @247flashgames
    @247flashgames 1 month ago +1

    You are insane, and I love the quality of your videos & algorithms! You explain your strategies well, and I can’t wait to see more videos! ❤

  • @empireempire3545
    @empireempire3545 1 month ago +2

    Okay i'm drunk but hear me out: beam-tree
    You initially cast a single square 'macro-beam' from the camera and advance it forward as usual until you hit the first voxel - this is the root node.
    The moment you hit the voxel, you split the macro-beam into sub-beams - ideally as few as possible; these are the 1st-gen children of the root.
    The sub-beam which hit the initial voxel gets handled more or less as you do atm with low and high resolution beams.
    The rest of the camera projection proceeds onwards - but in the worst case you still need only 4 macro-beams to surround the beam (X) that hit the voxel (consider how to divide up the area of M's):
    MMMMM
    MMXMM
    MMMMM
    MMMMM
    Looking from the side, the macro beams are truncated square-based pyramids, subdividing into smaller truncated square-based pyramids.
    The idea of course is to reduce the number of beam-steps you have to make. A smaller number of beams means a smaller number of steps to calculate - and even if each calculation is slightly more complicated (square vs. point), a reduction of two orders of magnitude should more than make up for it.

  • @platinumsun4632
    @platinumsun4632 28 days ago

    Dude I really love the beam optimization. Looks so cool. Like out of an old 2.5d game.

  • @c0pi
    @c0pi 1 month ago +1

    I recognize the place in the demo!! I had a pizza there. Nice!

  • @MatejVancoCG
    @MatejVancoCG 1 month ago

    Really cool. Thanks for such a detailed explanation!

  • @simongido565
    @simongido565 1 month ago +2

    I do something similar in my engine. I call them compressed and normal cells. A compressed cell is one that contains a subgrid of the selected size where all voxels are the same. The first advantage is that instead of storing, for example, 16x16x16 voxels, I store only one that represents all of them if they are identical. Then for raymarching: I raymarch the compressed grid as if it were a normal voxel grid. If I get inside a cell that contains only one voxel, which means it is compressed, I just return the color of the voxel that the cell holds. If I get inside a cell which contains 16x16x16 voxels, meaning it was not compressed, I start a new raymarch where the compressed-cell raymarch ended and traverse the 16x16x16 voxel subgrid; then, if no hit happened, I continue raymarching compressed cells again, and so on. It increased performance a lot, and memory-wise it is also great. I am thinking about implementing multiple levels, but for now I am okay with just two.

    • @simongido565
      @simongido565 1 month ago +1

      Maybe I could draw a picture if you wanted 😆 it might make more sense then.

    • @sti3167
      @sti3167 1 month ago

      kind of like palette compression?

    • @simongido565
      @simongido565 1 month ago +1

      @@sti3167 More like grid compression? I do not know exactly what palette compression is :D. Imagine a 256x256x256 grid where each voxel has size 1.0. To compress it and traverse it faster, I take some number that 256 is divisible by - let's say 8. I create a new grid with dimensions 32x32x32 (256 / 8) and voxel size 1.0 * 8. Then I go through the original grid checking 8x8x8 subgrids. If a subgrid is uniform, I copy just a single voxel into the top compressed grid instead of all of them. If it is not uniform, I copy all of the voxels. When raymarching, I raymarch the compressed grid as if it were a regular voxel grid, but once I get inside a voxel (cell) which has more voxels inside, I switch to traversing that 8x8x8 subgrid.
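      A rough compute-shader sketch of that compression step (one thread per coarse cell; the buffer names, the SUBGRID sentinel, and the layout are illustrative assumptions, not the commenter's actual code):

          layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;
          const uint FINE_DIM   = 256u;               // 256^3 source grid, voxel size 1.0
          const uint BLOCK      = 8u;                 // 8^3 fine voxels per coarse cell
          const uint COARSE_DIM = FINE_DIM / BLOCK;   // 32^3 coarse grid, voxel size 8.0
          const uint SUBGRID    = 0xFFFFFFFFu;        // marks a coarse cell whose contents are not uniform

          layout(std430, binding = 0) readonly  buffer Fine   { uint fine[];   };  // one material id per fine voxel
          layout(std430, binding = 1) writeonly buffer Coarse { uint coarse[]; };  // material id, or SUBGRID

          uint fineIndex(uvec3 p) { return p.x + p.y * FINE_DIM + p.z * FINE_DIM * FINE_DIM; }

          void main() {
              uvec3 c = gl_GlobalInvocationID;
              if (any(greaterThanEqual(c, uvec3(COARSE_DIM)))) return;
              uvec3 base = c * BLOCK;
              uint first = fine[fineIndex(base)];
              bool uniformBlock = true;
              for (uint z = 0u; z < BLOCK && uniformBlock; z++)
              for (uint y = 0u; y < BLOCK && uniformBlock; y++)
              for (uint x = 0u; x < BLOCK; x++)
                  if (fine[fineIndex(base + uvec3(x, y, z))] != first) { uniformBlock = false; break; }
              // Uniform blocks collapse to a single coarse voxel; mixed blocks get a sentinel so the
              // raymarcher knows to switch to traversing the 8^3 subgrid stored in the fine buffer.
              coarse[c.x + c.y * COARSE_DIM + c.z * COARSE_DIM * COARSE_DIM] = uniformBlock ? first : SUBGRID;
          }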

    • @simongido565
      @simongido565 1 month ago +1

      So in the end it is a grid of smaller grids, which I create from the denser grid.

    • @sti3167
      @sti3167 1 month ago

      @@simongido565 Sounds like you are generating and using LODs/mips for voxels on a multilevel grid (or nested grid, sometimes called a brick-map) depending on distance/required detail. I would love to see some code for how you generate/store/traverse it, but it's up to you : )
      P.S. Afaik palette compression is about compressing the data that belongs to voxels instead of storing it in the voxels themselves - for example, the unique colors or normals in a chunk of voxels are written to a lookup table (or something like that), and then every voxel indexes into this table instead of defining it directly. That way you can minimize memory overhead using fewer bits, even without doing quantization/lossy compression.
      This might improve performance too, because that way you can ensure you are not memory bound, and there are fewer cache misses because the data can fit into shared memory, cache lines, etc.
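      For reference, a minimal sketch of that palette lookup for one chunk (layout and names are illustrative, not from the video or the engine):

          layout(std430, binding = 0) readonly buffer Palette { uint palette[]; };  // packed RGBA8 colors, unique per chunk
          layout(std430, binding = 1) readonly buffer Indices { uint indices[]; };  // 8-bit palette indices, 4 per uint

          vec4 voxelColor(uint chunkPaletteOffset, uint voxelIndex) {
              // Each voxel stores only a small index into its chunk's palette instead of a full color.
              uint packedWord = indices[voxelIndex >> 2];                    // 4 indices per 32-bit word
              uint palIdx     = (packedWord >> ((voxelIndex & 3u) * 8u)) & 0xFFu;
              return unpackUnorm4x8(palette[chunkPaletteOffset + palIdx]);
          }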

  • @hexadeque1101
    @hexadeque1101 1 month ago +1

    Here is my DDA on an octree implementation if it is helpful:
    I have basically a stack data structure holding the current node I'm looking at and its parents so that I can walk up the tree. The stack is initialized to just hold the root octree node.
    At the beginning of each DDA step in the loop, the current position (rayOrigin + rayDirection*distance) is guaranteed to be within the current octree node on the top of the stack, but the current node is not guaranteed to be a leaf node (the current position is inside one of its children).
    while (rayDepth < RENDER_DISTANCE) {
    Get the bottom most node containing the current position (the current node is now guaranteed to be a leaf node containing the current position)
    Check if the current node is a filled voxel. If so, the ray hit something
    Step the ray with DDA (the current position is no longer within the bounds of the current node)
    Walk up the nodes in the stack until the current position is within bounds of the current node (sets up for the first part of the loop for the next iteration)
    }
    Hopefully that made sense. If not, my actual code is in the replies.

    • @hexadeque1101
      @hexadeque1101 1 month ago

      struct OctreeNode {
          uint data; // Basically an enum using the following constants
          uint[8] children;
      };
      // Different materials are not implemented yet so full and empty are the only two states a voxel can be
      const uint EMPTY = 0;
      const uint FULL = 1;
      const uint PARTIAL = 2; // For parent nodes who have some full (or partial) children and some empty children
      struct Ray {
          bool hit;
          uvec3 position;
          ivec3 normal;
          float depth;
      };
      layout(std430, binding = 1) buffer World {
          OctreeNode world[];
      };
      ...
      Ray castRay(vec3 rayOrigin, vec3 rayDirection) {
          ...
          // Ray box intersection on the size of the octree in case the ray starts outside the octree is omitted
          ray.position = uvec3(floor(rayOrigin));
          ray.depth = 0;
          vec3 stepSize = vec3(
              sqrt(1 + (rayDirection.y*rayDirection.y + rayDirection.z*rayDirection.z) / (rayDirection.x*rayDirection.x)),
              sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.z*rayDirection.z) / (rayDirection.y*rayDirection.y)),
              sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.y*rayDirection.y) / (rayDirection.z*rayDirection.z))
          );
          ivec3 direction = ivec3(sign(rayDirection));
          vec3 rayLength = mix(rayOrigin - ray.position, 1 - rayOrigin + ray.position, step(0.0, rayDirection)) * stepSize;
          // The stack is stored as an array indexed by currentLevel which is incremented every time we descend down the octree
          // It stores uints which are indices into the world array
          uint nodes[OCTREE_LEVELS + 1];
          uint currentLevel = OCTREE_LEVELS; // Level 0 is an individual voxel
          nodes[currentLevel] = 0; // Root node is always index 0
          uvec3 currentNodePosition = uvec3(0);
          while (ray.depth < RENDER_DISTANCE) {
              while (world[nodes[currentLevel]].data == PARTIAL) {
                  currentLevel--;
                  uint voxelWidth = uint(pow(2, currentLevel));
                  uvec3 center = currentNodePosition + uvec3(voxelWidth);
                  uvec3 subnode = uvec3(greaterThanEqual(ray.position, center));
                  currentNodePosition += subnode * voxelWidth;
                  nodes[currentLevel] = world[nodes[currentLevel + 1]].children[subnode.x + subnode.y * 2 + subnode.z * 4];
              }
              if (world[nodes[currentLevel]].data == FULL) {
                  ray.hit = true;
                  return ray;
              }
              // DDA step + store hit info for if we hit something
              // Possible optimization: instead of taking one unit long steps, take steps proportional to the size of the current node. i.e. if the current node is empty and 4 voxels wide, step 4x the distance. I could not get that working at the moment, though, as it brings extra complications. :/
              if (rayLength.x < rayLength.y && rayLength.x < rayLength.z) {
                  ray.position.x += direction.x;
                  ray.depth = rayLength.x;
                  rayLength.x += stepSize.x;
                  ray.normal = ivec3(1.0, 0.0, 0.0) * -ivec3(sign(rayDirection));
              } else if (rayLength.y < rayLength.z) {
                  ray.position.y += direction.y;
                  ray.depth = rayLength.y;
                  rayLength.y += stepSize.y;
                  ray.normal = ivec3(0.0, 1.0, 0.0) * -ivec3(sign(rayDirection));
              } else {
                  ray.position.z += direction.z;
                  ray.depth = rayLength.z;
                  rayLength.z += stepSize.z;
                  ray.normal = ivec3(0.0, 0.0, 1.0) * -ivec3(sign(rayDirection));
              }
              // Bounds check
              if (ray.position.x < 0 || ray.position.y < 0 || ray.position.z < 0 || ray.position.x >= worldSize || ray.position.y >= worldSize || ray.position.z >= worldSize) {
                  return ray;
              }
              // I was lazy and went all the way up to the root node every time and left only going up as far as necessary for "later"
              currentLevel = OCTREE_LEVELS;
              nodes[currentLevel] = 0;
              currentNodePosition = uvec3(0);
          }
          return ray;
      }

  • @owendavies8227
    @owendavies8227 1 month ago

    Wow, the beam optimization is absolutely genius.

  • @RyDawgE
    @RyDawgE 1 month ago

    This is AWESOME!!!! Very inspirational

  • @WlliamBrown
    @WlliamBrown 1 month ago

    This looks really awesome! I stumbled across your videos yesterday, and have viewed most of them. I have also been researching voxels with marching cubes, but it destroys precise environment manipulation. Have you considered applying marching cubes only within each of your 8x8x8 voxels?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      Thanks for watching! I prefer the look of smooth cubic voxels, so I'm not planning to integrate marching cubes at this time (it would also make ray traversal more complicated)

  • @CSPciccionsstibilepippons
    @CSPciccionsstibilepippons 28 days ago +1

    I tried the native demo today: with my Ryzen 3500u integrated GPU it runs at around 50 ms with default settings, but with 2x downscale it runs under 20 ms even with maximum lod bias and I can't even notice the difference in resolution 🎉🎉🎉

  • @0memega
    @0memega 1 month ago

    Before this, I didn't know you could even use DDA for raytracing, as I had only used it for raycasting. Amazing video.

  • @seth111yta1
    @seth111yta1 8 days ago

    As voxel engines advance and relative voxel size shrinks, is there an advantage to just rendering voxels as a point cloud with, like, camera-facing colored squares or something? Great work BTW, and thank you for actually making browser demos

  • @shadow_blader192
    @shadow_blader192 1 month ago +1

    Amazing video

  • @HerrDoktorWeberMD
    @HerrDoktorWeberMD 1 month ago +1

    Doing god's work, stuff I wish I had the spare time to achieve. I just pray you don't pull a John Lin by dropping the most gorgeous voxel world renderer anyone has ever seen and then vanishing.

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +4

      Well, I would love to "pull a John Lin" by dropping the most gorgeous voxel world renderer. But I don't intend to disappear!

  • @moonshot3159
    @moonshot3159 1 month ago +1

    Holy moly, this guy's good! Have you also considered similar optimizations to what Vercidium (YT channel) did with his voxel engine? Or are those a different beast altogether?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Thanks for watching! I think that most of Vercidium's optimizations are very good, but they mostly apply to rasterizing voxels. Since I'm ray marching, they don't apply.

  • @Z_Z.t
    @Z_Z.t 1 month ago

    For DDA in differently sized power-of-two boxes, just replace the multiplications with bit shifts and integer arithmetic.
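    For example (purely illustrative), moving a voxel coordinate between power-of-two grid levels needs no multiplies or divides at all:

        ivec3 coarseCellOf(ivec3 fineCell, int level)   { return fineCell >> level; }    // divide by 2^level
        ivec3 cellOriginOf(ivec3 coarseCell, int level) { return coarseCell << level; }  // multiply by 2^level
        int   cellWidthAt(int level)                    { return 1 << level; }           // 2^level voxels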

  • @bytesandbikes
    @bytesandbikes 1 month ago

    Very nice! 😀

  • @victorwidell9751
    @victorwidell9751 19 days ago

    The low resolution pre rendering pass was interesting. Would it help to do it multiple times? I imagine it could be done with each level of an octree.

  • @matanon8454
    @matanon8454 1 month ago

    It's amazing! Is water flow simulation something you're considering adding at some point? Would love to dig some rivers and see water flowing 😍

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      I haven't studied fluid simulation yet, so unfortunately I probably won't get to it for a while. But I would love to have water like John Lin's!

  • @Lantalia
    @Lantalia 1 month ago

    Neat! Have you done anything with multiple, unaligned, grids, so you can, for instance, have a moving ship that is itself a voxel grid? It seems like your rendering solution may make that a lot easier than I would normally expect

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      While the renderer supports unaligned objects, I haven't re-added voxel entities on the game logic side. The functionality is there - hoping to take advantage of it in a future episode!

    • @JosephGleespen
      @JosephGleespen 1 month ago

      @@DouglasDwyer How do you ray trace the unaligned objects, or rather shoot rays through/past the unaligned objects? Do you trace them in separate passes and then composite them into a single image? I'm mostly trying to conceptualize how one would perform DDA while encountering things that are off-grid.

  • @guillaumemagniadas2472
    @guillaumemagniadas2472 1 month ago

    When you speak of a sparse tree, do you mean a sparse voxel octree? I wrote an alternative to the DDA algorithm that computes a different step depending on the current depth level in the SVO

  • @spidermankey1398
    @spidermankey1398 5 days ago +1

    You can use Bresenham's algorithm instead of DDA, which is more efficient

    • @DouglasDwyer
      @DouglasDwyer  5 days ago +1

      Bresenham's algorithm is not conservative - it may miss voxels on the line that rays pass through. As such, using Bresenham would result in gaps and artifacts appearing on objects.

  • @EndroEndro
    @EndroEndro 1 month ago +1

    Sadly I'm getting an error: panicked at 618 in the web-d2eddb7ca2dff036 js file (Could not load GPU adapter), and at 602:21 in Opera; other browsers and the exe have the same issue.

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Thanks for trying the demo! It sounds like your computer doesn't support Vulkan or WebGPU. I will add an error message in the future for unsupported devices.

  • @Wizarth
    @Wizarth 1 month ago

    You mention that you ensure the beams never go far enough into the geometry hierarchy to miss a voxel that is smaller than the grid/pixel size.
    Do you adjust the depth based on distance from camera? I.E. does it go deeper into the hierarchy when closer to the camera?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +2

      Yes, that's correct. The maximum distance that a ray can travel depends upon the size of the voxel (i.e. hierarchy depth) and the distance between adjacent rays. Adjacent rays "spread out" as they progress with a perspective camera, so that needs to be accounted for.
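      One way such a conservative bound is often expressed (an illustrative sketch, not the engine's actual code): a low-resolution beam is only allowed to march while its footprint, which grows linearly with distance under a perspective projection, is still smaller than the voxel size it could otherwise skip.

          float maxBeamDistance(float voxelSize, float raySpacingAtUnitDepth) {
              // raySpacingAtUnitDepth: world-space distance between adjacent low-res rays at depth 1,
              // derived from the field of view and the low-resolution image dimensions (assumed inputs).
              return voxelSize / raySpacingAtUnitDepth;
          }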

  • @Z_Z.t
    @Z_Z.t 1 month ago

    10:48 Isn't using simple rasterization with depth going to give better depth values? (I'm aware of greedy meshing)

  • @Octal_Covers
    @Octal_Covers 1 month ago

    Which DDA algorithm are you using? Different algorithms have different performance characteristics, so it might be worthwhile to look at some other options.
    As for stepping through various sized voxels, would it be possible to multiply the amount you add to the ray position? Since it's a line, the relative positions should stay the same

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      I am using a similar technique to the one described on this page: lodev.org/cgtutor/raycasting.html
      The problem is that when you step between grid levels, the distance from the ray to the current X/Y/Z planes changes in a non-uniform way. You need to update that distance somehow, and as far as I can tell updating the distance and recomputing it are pretty similar computationally. So every time I switch grid levels, I need to recompute the DDA variables. This is what I refer to as performing a box-intersection test, since it's the same thing mathematically.
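      That reinitialization looks roughly like a ray/axis-aligned-box setup (an illustrative sketch under assumed names, not the engine's code): given the ray and the bounds of the new cell, the per-axis side distances are rebuilt from scratch.

          void reinitDDA(vec3 pos, vec3 rd, vec3 cellMin, float cellSize,
                         out vec3 sideDist, out vec3 deltaDist) {
              deltaDist = abs(vec3(cellSize) / rd);                       // ray length per cell-sized step on each axis
              vec3 nextPlane = cellMin + step(vec3(0.0), rd) * cellSize;  // the cell's exit plane on each axis
              sideDist = (nextPlane - pos) / rd;                          // distance along the ray to each exit plane
          }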

    • @Octal_Covers
      @Octal_Covers 1 month ago

      @@DouglasDwyer you could attempt updating it each time the voxel size changes, and if it's not as performant as box intersection tests then you could just stick to box intersection tests.
      The algorithm I'm using for my voxel game is the Fast Voxel Traversal Algorithm, which doesn't use as many trig functions

  • @UnofficialFoneE
    @UnofficialFoneE 1 month ago +1

    Yes, this video made me "bricked" 👀

  • @owendavies8227
    @owendavies8227 1 month ago

    I guess for preserving DDA calculations between scales, if you're just going up or down a single level, the numbers would be 4 times greater or less with no other changes, and that could be performed just with bitshifting and nothing else, which would be pretty fast.

    • @owendavies8227
      @owendavies8227 1 month ago

      I guess you could keep track of how many levels you are stepping between and do a bitshift by double the difference in hierarchy levels, but it would probably be more computationally efficient just to do a double bitshift each time so you don't have to keep track of anything.

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      Unfortunately, I don't quite think that this works. With DDA, you store the distance along the ray to each plane of the current voxel. But if you move up/down tree levels, the planes may shift by non-uniform amounts. There are perhaps ways to update the plane positions based upon the initial and final voxel, but I don't know of any that are more efficient than just reinitializing DDA from scratch (which is equivalent to a ray-box intersection test).

    • @owendavies8227
      @owendavies8227 1 month ago

      @@DouglasDwyer I am willing to believe I made some kind of mistake.

    • @owendavies8227
      @owendavies8227 1 month ago

      ​@@DouglasDwyer I don't know if this would necessarily be faster, but you could, if nothing else, theoretically update the plane positions in linear time. If the axes are aligned to the voxel grid (which could be done with a matrix multiplication for the camera and ray just once in the beginning), it would mostly just be addition, subtraction and bitshifts to move the planes around.

  • @ayyship
    @ayyship 1 month ago

    That’s amazing progress

  • @UltimatePerfection
    @UltimatePerfection 5 days ago

    You should try getting the voxel resolution higher, with each voxel being 10cm or less.

  • @steluste
    @steluste 1 month ago

    Are you using any engine/framework such as Monogame, or is it all from scratch? Also, do you have a public repository for the project?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      The project is written entirely from scratch in Rust and uses WebGPU as the graphics API. The project is not open-source at present, but many components of it (like the event system and networking system) are open-source on my GitHub!

  • @amaryllis0
    @amaryllis0 1 month ago

    Game looks great, I gotta second that the rotating backgrounds make me feel nauseous tho

  • @monstercameron
    @monstercameron 17 hours ago

    is this signed distance fields? 3:19

  • @slent2410
    @slent2410 11 days ago

    So do you plan on releasing this to the public once it's completed or in a usable state?

    • @DouglasDwyer
      @DouglasDwyer  10 days ago

      There's already a demo playable online! But yes, I hope to turn this into an actual game someday.

  • @thomasmcbride2287
    @thomasmcbride2287 1 month ago +3

    THREE DIFFERENT OPTIMIZATIONS??? YOU’RE A WIZARD!

  • @lumiey
    @lumiey 1 month ago

    There is an LOD 'leaking'(?) artifact when I fly a bit high up

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Yep, the LOD generation isn't perfect. The LOD leaking shouldn't happen as much with procedurally generated terrain and other solid models. It happens because the church has a lot of thin surfaces and so the engine isn't sure which surface to put on top when generating the higher LODs.

  • @Midazc
    @Midazc 1 month ago +1

    Did you slow down the audio track? Sounds much more natural on 1.25x

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Speaking slowly and clearly is important for public speaking, so that's what I aim to do. But you are more than welcome to enjoy the video at 1.25x if you'd like!

  • @Deniil2000
    @Deniil2000 1 month ago

    You mentioned u32 and u64 in your video. Are you writing this in Rust?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago

      The project is written entirely in Rust and uses WebGPU as the graphics API.

  • @Harald723
    @Harald723 1 month ago

    YES MORE...

  • @paxcoder
    @paxcoder 1 month ago

    So down to 60ms? But it would have to be ⌊1000/60⌋ = 16ms to run at 60fps right?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Yep! As noted in the video, the benchmarks presented were on a very bad GPU (Intel UHD 750H). On any discrete card, the engine runs at a buttery-smooth 60 FPS - for example, the scenes in the video run at 4 ms (or 250 FPS) on my NVIDIA 1660 TI. It's possible to make the game run in realtime on the Intel GPU as well by lowering the resolution. But for benchmarking I wanted the frame times to be as big as possible, so I chose a worst case.

    • @paxcoder
      @paxcoder 1 month ago

      @@DouglasDwyer Too bad a discrete card is a requirement, as I'm afraid I only use integrated cards, but that is a whole lot of voxels there, so I guess it makes sense.

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +1

      Discrete card is only a requirement for 1080p. If you play the game at a lower resolution, you can definitely hit 60 FPS on an iGPU. But don't take my word for it - try the demo and evaluate for yourself! There is an option in the settings menu to downscale the image resolution for better performance :)

    • @xezvcikgames1191
      @xezvcikgames1191 1 month ago

      @@DouglasDwyer Can you please show in pseudo-code how you ensure the low-res pass won't step too far (11:06)? I didn't quite get it. Also, what would it be like to trace even coarser LODs in the pre-pass, for example on the CPU side? That could still avoid a lot of traversal iterations. How do you even make sure it's done in a conservative manner? Won't it require supercover DDA or something? Hard to imagine, tbh.

  • @lumiey
    @lumiey 1 month ago

    It runs super smooth on my PC, but I can't see how fast it actually runs because I can't disable vsync in the online demo

    • @lumiey
      @lumiey 1 month ago

      My GTX 1060 can handle 1500x1500 resolution at 60fps (2.25 megapixels; 1080p is 2.0736 MP). There is no performance difference between web and native

  • @StylishHobo
    @StylishHobo 1 month ago

    Rather than your custom data structure, have you considered using a k-d tree instead?

  • @shadow_blader192
    @shadow_blader192 1 month ago +2

    1:55 isn't it called raycasting?

    • @sjoerdev
      @sjoerdev 1 month ago +2

      raymarching is raycasting in small steps

    • @shadow_blader192
      @shadow_blader192 1 month ago

      @@sjoerdev raymarching is raycasting based on minimum distance to objects?

    • @sjoerdev
      @sjoerdev 1 month ago +2

      @@shadow_blader192 that's sphere tracing, which is a form of raymarching

    • @shadow_blader192
      @shadow_blader192 1 month ago +3

      @@sjoerdev oh those names

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +2

      No idea, everyone seems to have their own names for these algorithms lol

  • @R.Daneel
    @R.Daneel 1 month ago

    Wow. YouTube compression isn't doing your grey background any favours!
    I can't tell from the video. Have you fallen down the SIMD rabbit-hole yet?

  • @jamesoneill4607
    @jamesoneill4607 1 month ago

    Fewer rays

  • @beholdergamedesign
    @beholdergamedesign 1 month ago

    A voxel brick has 8 64-bit occupancy masks?

    • @DouglasDwyer
      @DouglasDwyer  1 month ago +3

      Since a voxel brick has 4^3 = 64 voxels, with one bit per voxel, each brick has 1 64-bit occupancy mask.
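      Concretely (illustrative, with the 64-bit mask split into two 32-bit words so no 64-bit integer extension is needed):

          uint maskBit(uvec3 local)      { return local.x + local.y * 4u + local.z * 16u; }  // voxel's bit index, 0..63
          bool brickEmpty(uvec2 mask)    { return (mask.x | mask.y) == 0u; }                 // skip the whole brick at once
          int  occupiedCount(uvec2 mask) { return bitCount(mask.x) + bitCount(mask.y); }     // how many of the 64 voxels are set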

  • @mohammadazad8350
    @mohammadazad8350 1 month ago

    Premier!

    • @mohammadazad8350
      @mohammadazad8350 1 month ago

      So apparently commenting "First" in French earned me a heart... my studying is paying off!

    • @timmygilbert4102
      @timmygilbert4102 1 month ago +1

      ​@@mohammadazad8350je suis là pour tester ton français 👀 voyons voir si ça paye tant que ça 😂

    • @mohammadazad8350
      @mohammadazad8350 1 month ago

      @@timmygilbert4102 "I am ??? for test your (familiar) French. Let's see see if that pays ??? that that"
      By interpolating, we get:
      "I am going to test your French, let's see if it paid off".
      I can't believe a YouTube comment just gave an exam on something I do for fun :).

    • @mohammadazad8350
      @mohammadazad8350 1 month ago

      *gave me
      Also, I genuinely had to resist the urge to look up "tant" in the dictionary.

    • @timmygilbert4102
      @timmygilbert4102 1 month ago

      @@mohammadazad8350 good job 👍😁

  • @gsestream
    @gsestream 1 month ago +1

    Try to render your engine's nightmare scenario, i.e. the worst-case geometry. Multiplication is always a cheap operation; division might not be, but will usually also be cheap on modern hardware. Still, no op is better than having to do an op. Also, if you use overlapping sphere collision volumes, then you need only one test per cell, not 6 for all sides of a cube cell's plane boundaries - though I think you are testing coordinate plane boundaries for the DDA universally.
    So instead of storing stuff in the data structure, why not use the 64 bits as a pointer to the voxel data, which could be anything, not limited to 64 bits? A pointer to a data structure is very fast. Some optimizations are not worth it and make the logic a mess; order-of-magnitude optimizations should always be the focus. Oh, you try to compress the data using bit storage - if you assume large empty areas, then you can compress the data with RLE runs of ones and zeros (voxel / no voxel) for traces.
    So you are not using a BVH to traverse the voxel tree top-down? That would compress the data as well. BVH top-down traversal works like DDA but processes the tree in occupancy order, and stops if there are no voxels in the tree leaf. Usually you can just do old-style draw-distance capping to limit the number of voxels drawn, even if you have billions - the old trick - even when doing true 3D traces, i.e. only render up to a draw distance, even for lighting bounce traces. Less is more.
    To get an idea of actual performance, try 4K resolution and see how badly it chugs (or not). Making an offline renderer to test features at high resolution and with full lighting gives direction; you know what to do. Maybe use an offline pre-calculated SDF for the ray marches, at least for the static parts of the scene, and maybe test dynamic meshes separately. Instead of complex hit detection, you could just rasterize all voxels as triangles; the difference is not significant in general, not in the order-of-magnitude range. Try rendering a voxel cloud, which is a spongy, ultra-sparse voxel mesh. The SDF can be updated on the fly locally wherever voxel mesh edits happen.