Rust multi-threading code review

  • Published 3 Jan 2025
  • My 3D cellular automata simulation, written in Rust with multithreading, was sub-optimal. Using benchmarks with criterion and help from my community, I learned what the bottlenecks of this application were.
    Today I share the journey I went through to optimize my multi-threading code!
    Let's talk about the code... ;)
    original video 3D cellular automata video:
    • 3D Cellular Automata -...
    3D Cellular automata project links:
    Source code: github.com/Tan...
    IsseW gpu implementation: github.com/Iss...
    Dstaatz atomics implementation: github.com/dst...
    BENCHMARKS I made for this video:
    github.com/Tan...
    My discord group:
    / discord
    Want to support me?
    Patreon: / tantandev
    Monero: 43Ktj1Bd4Nkaj4fdx6nPvBZkJewcPjxPB9nafnepM7SdGtcU6rhpxyLiV9w3k92rE1UqHTr4BNqe2ScsK1eEENvZDC3W1ur
    Credits:
    Music Prototype by Wintergatan Build Tracks
    Music Valentine by Wintergatan
    Music The Rocket by Wintergatan
    Music Non-Euclidian Geometry by Punch deck
    Music Brazilian Street Fight by Punch deck
    Music See The Light by Punch deck
    Wintergatan link: wintergatan.net/
    License can be found on website
    Punch deck link: / punch-deck
    Punch deck license: creativecommon...
    #rust #multithreading #review

COMMENTS • 244

  • @quantumdeveloper2733
    @quantumdeveloper2733 2 years ago +276

    For the boolean array you could also try putting it all into a single 32 bit integer, where each bit represents one value.
    The check would then be something like (rule & 1

    • @Tantandev
      @Tantandev  2 years ago +80

      Solid idea! Faster or not... we still save some bytes haha

    • @anderdrache8504
      @anderdrache8504 2 years ago +54

      The compiler can't do memory layout optimizations like that. As a rule of thumb: it can only reorder struct fields, nothing else.

    • @zyansheep
      @zyansheep 2 years ago +31

      Or you could use bitvec :)

    • @nilstrieb
      @nilstrieb 2 years ago +49

      It will be slower, not faster, since now there are several instructions to get the value. While the array will get slightly smaller, it doesn't really matter, since it's quite small anyways, so you won't get any benefit from the smaller cache usage.
      But it would be interesting to benchmark it

    • @quantumdeveloper2733
      @quantumdeveloper2733 2 years ago +67

      @@nilstrieb
      Trying to prove you wrong, I tried to verify my hypotheses by googling:
      Originally I thought that the processor couldn't address anything smaller than a word and therefore would have to do bit shifts to address a byte anyway, but that doesn't seem to be the case for x86 anymore.
      And I also thought that my variant would allow the compiler to do more SIMD optimization because there was no SIMD memory lookup, but in avx2 there is an instruction for that as well.
      So I guess on modern hardware you are right.
      I also made a benchmark to be sure, and indeed on my avx2-capable desktop pc I get a 15% performance decrease.
      But on the raspberry pi 3b, which has an ARM processor, I get a 30% performance increase.
      It might be interesting to check more modern ARM processors as well, but I don't own one.
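
The bit-packing idea discussed in this thread can be sketched like this (a minimal, hypothetical sketch — `pack_rule`, `survives`, and the survival counts are illustrative names and values, not taken from the video's source):

```rust
// Pack a rule's possible neighbour counts (0..=26 for a 3D Moore
// neighbourhood) into one u32, one bit per "survives with this
// many neighbours" flag, instead of a [bool; 27] array.
fn pack_rule(survive_counts: &[u32]) -> u32 {
    survive_counts.iter().fold(0u32, |acc, &n| acc | (1u32 << n))
}

// Checking a cell is then a shift and a mask instead of an array index.
fn survives(rule: u32, neighbours: u32) -> bool {
    ((rule >> neighbours) & 1) == 1
}

fn main() {
    // hypothetical rule: cells survive with 4, 12 or 13 neighbours
    let rule = pack_rule(&[4, 12, 13]);
    assert!(survives(rule, 4));
    assert!(!survives(rule, 5));
    println!("rule bits: {:#034b}", rule);
}
```

Whether this beats the plain bool array is exactly what the benchmark discussion above is about — the shift adds instructions but shrinks the data.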

  • @ItsMeChillTyme
    @ItsMeChillTyme 2 years ago +314

    Very valuable video. Some youtubers and bloggers keep insisting that efficiency doesn't matter because "new compooter phast", and they have considerable influence on new learners, who get stressed and look to validate whether what they are learning is worth it. If they come across people like that at that point, it could completely misdirect them from becoming better engineers. While what was compared here is not a typical application, it demonstrates how much efficiency matters and how all those little nanoseconds and microseconds add up quickly.

    • @thoriqadillah7780
      @thoriqadillah7780 2 years ago +22

      I bet the people you mean are javascript developers. No offense lol

    • @ItsMeChillTyme
      @ItsMeChillTyme 2 years ago +11

      @@thoriqadillah7780 Mostly but I've seen it come from new developers for everything, even C++. Though JS and Python have it most since they are beginner friendly.

    • @nagoranerides3150
      @nagoranerides3150 2 years ago +8

      @@thoriqadillah7780 I first saw the idea that efficiency isn't important because "new compooter phast" in the early '80s. The truth is that efficiency of coding is sometimes more important than efficiency of execution. I write 90% of my code in Perl, because I write a lot of small utility scripts for solving specific issues at work and quite often the code is single-use. Shaving a day off development and testing time is far more useful than even shaving an hour off run-time. If I have a task that will take weeks to run, then I write it in something fast so the return outweighs the time lost in coding in a more formal language. And something that requires realtime doesn't leave you much option.

    • @thoriqadillah7780
      @thoriqadillah7780 2 years ago +12

      @@nagoranerides3150 But we're talking about game dev here, and the "javascript dev" thing is about frontend on the web, or even mobile. In game dev I think the need for speed is critical because we're dealing with the customer's PC. Remember the "but can it run Crysis?" meme? Yeah, I remember that. And for frontend web dev I think speed is somewhat important, because we're dealing with the user's internet speed and data.

    • @dealloc
      @dealloc 2 years ago +2

      You have to set constraints on what hardware you aim to support, otherwise we'd never move forward. For example, Valve has done a tremendous job at creating an engine that is versatile across low and high-end hardware-and Source 2 is highly targeting VR applications, so speed and varied hardware support is of utmost importance-but it is also very limited in its capabilities when compared to other modern engines that may not be suitable for low-end hardware.
      Of course, that doesn't mean you should max out everything and only aim for high-end hardware, since that can also have drawbacks in and of itself.

  • @Leonardo-nf5jc
    @Leonardo-nf5jc 2 years ago +19

    This video reminded me of the mantra "make it work, make it right, make it fast".
    As always interesting and entertaining content, thank you.

    • @Leonardo-nf5jc
      @Leonardo-nf5jc 2 years ago +1

      @JM Coulon you may not have implemented error handling, for example; it works, but it's not the right way to code

  • @demetriuslewis6750
    @demetriuslewis6750 2 years ago +38

    This is an amazing video and I'm super glad you opened up about the mistakes that were made and how they were fixed. Super healthy for the community and for understanding programming in general.

  • @andrewgr144
    @andrewgr144 1 year ago +2

    Mad props for publicly sharing your mistakes for the rest of us to learn from. It speaks to your depth of character. Thank you!

  • @jesperhustad
    @jesperhustad 2 years ago +10

    This is rare, great content, as most YouTubers only cover the frontend. Thanks for this great resource on low-level performant code.

  • @danielmitre
    @danielmitre 2 years ago +16

    You can count the number of neighbours with convolution. In 2D it would mean sliding a 3x3 matrix [[1, 1, 1], [1, 0, 1], [1, 1, 1]] over every position of the cell matrix, multiplying the values in the same position, summing them, and putting the result in the center position. There are very fast algorithms to run this on CPU and even faster ones on GPU

    • @brdrnda3805
      @brdrnda3805 2 years ago +3

      Can you please elaborate a little bit more? I really would like to understand.

    • @RottenFishbone
      @RottenFishbone 2 years ago +5

      @@brdrnda3805 Convolution is a pretty common technique for running some kind of matrix-defined algorithm in parallel on a large dataset, usually images. Each value of the matrix is basically an offset of a target cell holding a value to multiply by. So, given that the matrix is 3x3, the top row of the matrix would multiply [above-left, above, above-right] by 1's and sum them. The middle row would be [left, self, right], the value of 'self' is ignored, because the matrix is 0 at that position. The end result is you add 1 for each active neighbour i.e. sum of neighbours.
      You do this on each cell to build a new 2D array with the value of each cell being the sum of its neighbours. The benefit of convolution is that no cell's calculation depends on the calculation of any other cell. As such, you can have a kernel executing on each cell in parallel, potentially ALL at the same time. This is trivial to do in GPU API like CUDA or Vulkan.
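
The convolution-based neighbour count described above can be sketched in plain single-threaded Rust (a small 2D grid for illustration; no SIMD or GPU, and the grid size is arbitrary):

```rust
const W: usize = 4;
const H: usize = 4;

// "Slide" the 3x3 kernel [[1,1,1],[1,0,1],[1,1,1]] over the grid.
// Each output cell depends only on the input grid, never on other
// output cells, so every cell could be computed in parallel.
fn neighbour_counts(grid: &[[u8; W]; H]) -> [[u8; W]; H] {
    let mut out = [[0u8; W]; H];
    for y in 0..H {
        for x in 0..W {
            let mut sum = 0;
            for dy in -1i32..=1 {
                for dx in -1i32..=1 {
                    if dx == 0 && dy == 0 {
                        continue; // kernel centre is 0: skip self
                    }
                    let (nx, ny) = (x as i32 + dx, y as i32 + dy);
                    if (0..W as i32).contains(&nx) && (0..H as i32).contains(&ny) {
                        sum += grid[ny as usize][nx as usize];
                    }
                }
            }
            out[y][x] = sum;
        }
    }
    out
}

fn main() {
    let mut grid = [[0u8; W]; H];
    grid[1][1] = 1;
    grid[1][2] = 1;
    let counts = neighbour_counts(&grid);
    assert_eq!(counts[1][1], 1); // its only live neighbour is (2,1)
    assert_eq!(counts[0][1], 2); // sees both live cells in the row below
}
```

On a GPU, the two outer loops become one kernel invocation per cell, which is where the speedup the comment mentions comes from.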

  • @elliotf3029
    @elliotf3029 2 years ago +13

    You've quickly become one of my favorite Rust centered channels. Well done!

  • @pigworts2
    @pigworts2 2 years ago +78

    The nested vector is not as messy as you made it look.
    For initializing you can do: vec![vec![vec![0u8; SIZE]; SIZE]; SIZE].
    For indexing, you can do it just like the array: v[0][4][3].
    The flat vector is definitely a better idea tho.

    • @biglexica7339
      @biglexica7339 2 years ago +18

      even if it was messy, just write a macro

    • @shambhav9534
      @shambhav9534 2 years ago +5

      Yep, a flat vector would be better. I think that's how it is intended to be used.

    • @erikwg3814
      @erikwg3814 1 year ago +7

      Ye, especially the .get(0).unwrap().get(4).unwrap() and so on felt quite dishonest. If you aren't doing any checking anyway there is absolutely no point in using the get function
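
The flat-vector layout this thread recommends comes down to one index calculation; a minimal sketch (SIZE and the axis order are illustrative assumptions):

```rust
const SIZE: usize = 10;

// One contiguous allocation instead of nested Vecs of Vecs:
// (x, y, z) maps to a single index, which is cache-friendly.
fn index(x: usize, y: usize, z: usize) -> usize {
    x + y * SIZE + z * SIZE * SIZE
}

fn main() {
    let mut states = vec![0u8; SIZE * SIZE * SIZE];
    // roughly the cell the nested code writes as v[0][4][3],
    // depending on which axis order you pick
    states[index(3, 4, 0)] = 1;
    assert_eq!(states[index(3, 4, 0)], 1);
}
```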

  • @OfficialMrRATpHace
    @OfficialMrRATpHace 2 years ago +11

    Just found your channel about a week ago. Cube World was my jam, full support for this game of yours! Keep up the awesome work, you're keeping the hopes of many fans alive.

  • @niomeda
    @niomeda 2 years ago +9

    That's why I love coding. No matter how good you think you've made something, there's always room for improvement and learning new things.

  • @PenguinjitsuX
    @PenguinjitsuX 2 years ago +9

    It's just so interesting to see you explain and talk about different concepts! I love these videos.

  • @atalocke
    @atalocke 2 years ago +12

    0:01 my typo will live forever!
    Great video! I read through the whole repo after the last one. Wish I could have contributed but I came after all the easy work was taken 😂. I’d love to see more videos like this! ❤

  • @trash_aly
    @trash_aly 2 years ago +2

    This is my favourite video of yours! Diving into code, showing alternatives, and explaining exactly why one is better for a certain goal. That's such a great way to actually learn and level up as a programmer!

  • @dabbopabblo
    @dabbopabblo 2 years ago +3

    Hash maps only net you a performance increase in the use case where you need a fast lookup and there is no way of deriving a value's location in memory from a previously accessed one, versus the alternative of iterating through an entire array just to find the location of a single key/value pair

  • @soupnoodles
    @soupnoodles 1 year ago +1

    To actually see an influencer not get big headed and reject all possible advice by saying they don't need it, is really good. Usually you'd have so many people tell you how good you are that you'd stop caring enough for those who might help you improve. I think it's great that you're open to feedback and took it in the right way. Great stuff, easily one of my favourite channels

  • @superblaubeere27
    @superblaubeere27 2 years ago +16

    5:30 You could also just use a 27-bit bitset stored in a 32-bit value. Since moving 27 booleans (27 bytes) around is not efficient and such large values can easily lead to cache misses which may take microseconds to be resolved.

    • @DanDeebster
      @DanDeebster 2 years ago +8

      Yup, all you need to do is decide whether to write something yourself or use bitvec or bit-vec or bitvector or bv or bitvec_simd or bitsvec or fixedbitset or smallbitvec or awint or bitvec-rs or indexed_bitvec or ...

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f 2 years ago

      @@DanDeebster just use a u32?

  • @hermannpaschulke1583
    @hermannpaschulke1583 2 years ago +141

    Hmm, I wonder how this would look with Rayon. For testing, I did a simple mandelbrot program, and then simply plugged in Rayon, and it scaled really well. Maybe a project for next weekend :D

    • @joshxwho
      @joshxwho 2 years ago +5

      I love rayon, for my CLI apps I build it makes multiprocessing super easy.

    • @m.sierra5258
      @m.sierra5258 2 years ago +5

      Mandelbrot is embarrassingly parallel though, it's a whole different story if you have a border exchange like in cellular automata. That's where the real difficulty of efficient multithreading begins :)

    • @SbF6H
      @SbF6H 2 years ago

      Rayon? That shit which is for toy programs? Sure.

    • @adamhenriksson6007
      @adamhenriksson6007 2 years ago

      Rayon feels kinda like using go func in go. So simple and easy

    • @Andrew90046zero
      @Andrew90046zero 2 years ago

      I've been trying to use Rayon to multithread another type of simulation I'm making. I used it on one part to see if there was any improvement, but it seems like it only reduced the performance of the simulation! I think there is still a lot I have to figure out...

  • @michaelganzer3684
    @michaelganzer3684 2 years ago +1

    Let's summarize what I've seen: You had an idea and shared your approach on your YT channel. You managed to write functional code and you managed to deliver a result. Others (in your team, bc c'mon, we're all happy coder buddies) felt the urge to improve your well-intended path to success with a more robust and reliable solution. I'm impressed to see you entertaining us with your learning, thank you for that! Because that story would otherwise be a fantastic and totally realistic representation of why 80% of all software projects fail: skipping non-functional requirements, skipping testing, skipping error handling, skipping the path from knowledge to understanding, skipping quality altogether, keeping the total bus factor below 1. So congrats, you just got a new subscriber who speaks about such topics at community events. :)

  • @valshaped
    @valshaped 1 year ago +1

    Glad you're taking advantage of the Wintergatan creator license

  • @DrIngo1980
    @DrIngo1980 2 years ago +20

    This was a great video. Great at explaining how one should go about optimizations and performance tuning. A bit light on how to actually measure performance, but hey, that could be another video 🙂 Anyway, the message you wanted to convey is pretty clear and that's cool. It's also really cool how you handle feedback, it seems. I suck at handling feedback, especially feedback that criticizes my code. I need to become a lot better at that... Again, thanks for a great video!

  • @JoeTaber
    @JoeTaber 2 years ago +4

    Nice video! You could probably multithread the full calculation if you use a "double buffer" system: *two* copies of the simulation data, one is the "previous" (read-only from all threads) the other is "next" (each thread mutably borrows a section). For each step, swap the buffers, split_mut the write buffer, and fan out to N threads that each read from the read buffer and writes cell states to the write buffer for its borrowed section.
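
That double-buffer scheme can be sketched with scoped threads and a toy 1D update rule (the rule, the sizes, and the function names are illustrative, not the video's actual simulation; rayon's `par_iter_mut` would slot in the same way):

```rust
// One simulation step: every thread reads freely from the shared,
// immutable `prev` buffer and mutably borrows one disjoint section
// of `next`, so no locks or atomics are needed.
fn step(prev: &[u8], next: &mut [u8], n_threads: usize) {
    let len = prev.len();
    let chunk = (next.len() + n_threads - 1) / n_threads;
    std::thread::scope(|s| {
        for (i, section) in next.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (j, cell) in section.iter_mut().enumerate() {
                    let idx = i * chunk + j;
                    // toy rule: copy the left neighbour (wrapping)
                    *cell = prev[(idx + len - 1) % len];
                }
            });
        }
    });
}

fn main() {
    let mut prev = vec![0u8; 8];
    prev[0] = 1;
    let mut next = vec![0u8; 8];
    step(&prev, &mut next, 4);
    assert_eq!(next[1], 1); // the live cell moved one step right
    // for the next generation, just swap the buffers
    std::mem::swap(&mut prev, &mut next);
}
```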

  • @blitzventium7237
    @blitzventium7237 2 years ago +1

    Really loved that you took time to explain where you could have done better. Not a lot of youtube content where they dissect a cool project to look for optimizations. Great job:)

  • @camh2316
    @camh2316 2 years ago +3

    Microbenchmarking is useful for measuring if your optimisations actually helped, the other important tool to use is a profiler. Profiling your application will tell you where most time is being spent. Profiling your application for the first time almost always finds some things that could be more efficient.

  • @micowata
    @micowata 2 years ago +1

    A good story with a proper lesson, you're a great storyteller Tantan! Thank you for the quality entertainment.

  • @Skeffles
    @Skeffles 2 years ago +2

    Fantastic to see how much can be optimised! It seems like you've learnt some great tips here and I can't wait to see how it applies to your voxel game.

  • @Filly309
    @Filly309 2 years ago +1

    great video! general when optimizing code I look for in order: 1) Can we do less work (e.g. iterations) 2) Can we avoid doing work (e.g. caching) 3) Can we spread the workload out (e.g. concurrency)

  • @CountJamesMcGuigan
    @CountJamesMcGuigan 2 years ago

    This code review style video is a really insightful way of understanding code. More please.

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 2 years ago +4

    6:11 2.05us = 2050ns , not 205000 ns, so it is a 7.5x improvement, which is still pretty worthy

    • @JackEnneking
      @JackEnneking 10 months ago

      @@rishabhkhemu He's correct on hashmaps at 3:10: 35 us = 35,000 ns.

  • @GPandzik
    @GPandzik 2 years ago +1

    Along the lines of your "Measure before you optimize" insight, Gene Amdahl (chief architect of the IBM S/360 mainframe in the 1960s) expressed Amdahl's Law (paraphrased): "The maximum speedup (ratio of the runtimes before & after optimizing) from optimizing a part of a system is bounded by the amount of time the system spends in that part."
    (For a speedup, if you reduce your runtime from ten seconds (t1) to five seconds (t2), the speedup is t1/t2 = 10/5, for a speedup of 2.0. Reducing from ten seconds to one second has a speedup of 10/1 = 10x.)
    What Amdahl's Law shows us is that the maximum speedup from optimizing part of a system is bounded by the amount of time a system spends in that part. If you have a choice between optimizing, say, disk IO performance by half or CPU performance by 90%, and your system spends 80 seconds out of 100 in IO, but only 10 seconds out of 100 in CPU processing, with another 10 seconds in "Miscellaneous processing," it actually makes far more sense to optimize the IO subsystem, since a 50% optimization would result in a speedup of around 1.67x (100 / (half of 80 (40) for new IO) + 10 (CPU) + 10 (Misc. time, like memory or something)) -> 100 / 60 -> 1.66666... < 1.67x) vs. a speedup of 1.10 (100 / ( 80 (IO) + 1 (new CPU, having saved 90% of its 10 seconds) + 10 (Misc. time)) -> 100 / 91 -> 1.09890... < 1.10x). I.e., optimizing to save only half of your IO time in your IO-heavy application gives you a 1.67x speedup vs. 1.10x for saving 90% of your CPU time that only consumed 10% of the original time. Hell, eliminating your CPU time entirely would only give you a less-than 1.11x speedup, which is still significantly less than your "lesser" IO optimization.
    These are only things you'd know -- where your program is spending its time, and doing what in which inefficient ways -- if you've properly benchmarked or otherwise measured your application's performance characteristics. Using an insight like Amdahl's Law also gives you a place to start to look for optimization opportunities, regardless of if you never calculate actual speedups -- if your program spends 90% of its time in one subroutine, then you know that it makes little sense to start anywhere else in your search for optimization targets, since you could eliminate every other part of the program and only get a
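
The arithmetic in that comment is easy to check directly; a small sketch using the numbers from the example above (the function name is illustrative):

```rust
// Amdahl's law: overall speedup when `part` seconds of a `total`-second
// run are accelerated by `part_speedup`, while the rest stays unchanged.
fn speedup(total: f64, part: f64, part_speedup: f64) -> f64 {
    total / ((total - part) + part / part_speedup)
}

fn main() {
    // 80 of 100 seconds in IO, made 2x faster -> ~1.67x overall
    let io = speedup(100.0, 80.0, 2.0);
    // 10 of 100 seconds in CPU, made 10x faster -> ~1.10x overall
    let cpu = speedup(100.0, 10.0, 10.0);
    println!("IO: {io:.2}x, CPU: {cpu:.2}x");
    assert!((io - 100.0 / 60.0).abs() < 1e-9);
    assert!((cpu - 100.0 / 91.0).abs() < 1e-9);
}
```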

  • @m3thid
    @m3thid 2 years ago +2

    I was just about to make the same mistakes as you when I saw this video! I was able to take this and apply it on my project. Took it from 18s to 450ms!

  • @aliancemd
    @aliancemd 2 years ago +3

    The code doesn't look like "spaghetti" any more. That "optimized" method for example, looks quite beautiful. It's clear you've got a lot of experience now, even though we always keep learning how to do some things better.

  • @lassipulkkinen273
    @lassipulkkinen273 2 years ago +1

    Definitely a nitpick, but the "slow" code at 2:40 uses HashMap::iter which is very different from indexing the elements individually in order, which is what you appear to have benchmarked (this is what was actually being done in the simulation code (where HashMap::iter (i.e. one-by-one iteration though cells in arbitrary order) definitely wouldn't have been applicable), which is why I said that this is a nitpick).
    I haven't done benchmarks to confirm this, but I would expect (map: HashMap<K, V>).iter() to be about as fast as iterating through a slice of map.capacity()* elements, mem::size_of::<(K, V)>() in size each. (The std hash map implementation appears to use a flat array of (K, V), so compared to a Vec the need to store the keys makes it a bit bigger and thus less cache-efficient, and that's about it.)
    Of course none of this applies to one-by-one indexing, where hashing and perhaps the chaotic memory access pattern are the main sources of overhead.
    *Actually, the number of buckets. Apparently these are not the same thing...

  • @GRAYgauss
    @GRAYgauss 2 years ago +2

    "What gets measured, gets managed." - Some body building quote I have applied to most aspects of my life.

  • @constantinhirsch7200
    @constantinhirsch7200 2 years ago +1

    The simple way of parallelizing is having two data buffers: read from one, write to the other. No mutexes or atomics needed, just par_iter on the mutable array. For next generation, swap the buffers.

  • @simonfarre4907
    @simonfarre4907 2 years ago +1

    So many great lessons you've learned. I can *definitely* co-sign the last one at 11:00. This should be taught at Uni. Unfortunately their teaching is generally bad.

  • @silversyntax
    @silversyntax 2 years ago

    I don't know what it is about this video, but it rejuvenates my love for programming. It made me remember why I fell in love with programming in the first place. Thanks!

  • @thegame4027
    @thegame4027 2 years ago

    One of the first things my first boss taught me about development was, "Make it work, make it fast, make it beautiful." It's just by far the best approach. Since then, my development workflow has been hacking together something that just does the job, then I optimize it until it does the job well. Then I throw it all away (refactor it) into an architecture that fits the solution.

  • @m4rt_
    @m4rt_ 2 years ago +5

    0:00 Eyyy It's me!

  • @JoeTaber
    @JoeTaber 2 years ago +1

    Speaking of addressing and layout strategies, I wonder if you could encode the data on a Hilbert curve. The "lindel" crate has functions that can efficiently convert points in cartesian space to an index along the curve. This might be nice for cache coherency.

  • @LIRAY
    @LIRAY 2 years ago +1

    I came across you while I was looking for an OpenGL manual.
    You're a very nice guy

  • @SuboptimalEng
    @SuboptimalEng 2 years ago +3

    Good thing I never have to worry about writing the optimal solution 😎

  • @Neph0
    @Neph0 2 years ago +3

    Love your videos Tantan, I laughed at the Vec, that was great

    • @Tantandev
      @Tantandev  2 years ago +2

      Truly beautiful code

  • @bytefu
    @bytefu 2 years ago

    Haven't watched the whole video yet, but I have to address an issue. If you want to create a multidimensional Vec and initialize it with a specific default value, the code at 2:00 and 2:05 is unnecessarily bloated. First, there is the "vec" macro that works exactly like array initialization. Second, you can index Vecs the same way you index arrays. Given these facts, the code can be written like this:
    fn main() {
        const SIZE: usize = 10;
        let states: Vec<Vec<Vec<i32>>> = vec![vec![vec![0; SIZE]; SIZE]; SIZE];
        let cell = states[0][4][3];
        println!("{}", cell);
    }
    which is pretty much the same as with arrays.

  • @johanrojassoderman5590
    @johanrojassoderman5590 6 months ago

    Another optimization (which you kinda already touched on) is to avoid looping through the neighbours to count them and instead do that using constant offsets and adding them all together in one line. Saves branching and writes.

  • @skejeton
    @skejeton 2 years ago +1

    You can optimize it even more if you consider that it's safe to run threads on cells that are 1 cell apart (No colliding neighbors) without any locks

  • @DunceHouse
    @DunceHouse 2 years ago

    I really enjoyed this breakdown. Also nice way of looking at the code with editing.

  • @kennethbeal
    @kennethbeal 2 years ago

    Wisdom at 3:25: "Oh it's empty! I will get some water instead, then." Much better than too many cups!

  • @leddoo
    @leddoo 2 years ago +4

    that was quick, nice vid!

    • @Tantandev
      @Tantandev  2 years ago +1

      My unofficial mentor. Thank you so much for the help!

    • @leddoo
      @leddoo 2 years ago +1

      @@Tantandev any time, my man!

    • @leddoo
      @leddoo 2 years ago +1

      looking forward to the new voxel stuff 👌

  • @jamieforster9986
    @jamieforster9986 2 years ago +8

    i love these kinds of programming videos! they're so informative and entertaining

  • @Nathan00at78Uuiu
    @Nathan00at78Uuiu 2 years ago

    duuuuuuuuude this video is awesome and made so many connections for me from my computer science education. super cool content and quality!!!

  • @qtluna7917
    @qtluna7917 2 years ago +4

    So, funny thing: both the YouTube algorithm and I thought this was about the video game Rust. I thought: weird topic, but okay, I'll bite.
    (It doesn't help that the thing in the thumbnail looks like the heightmap for a video game island)

  • @distrologic2925
    @distrologic2925 2 years ago +1

    A nested vector is no problem if you just put it in a wrapper type with a simplified API... like get_cell(x, y, z).unwrap() or expect_cell(x, y, z) if you want to drop the unwrap

  • @Spiritusp
    @Spiritusp 2 years ago

    Regarding how to store the data: I would use a hashmap of blocks (say 16x16x16).
    It would easily parallelize as well, as each block could be a task. And we could avoid atomics IMO, perhaps by presaving boundary data, which is an order of magnitude smaller than the actual data.

  • @monad_tcp
    @monad_tcp 2 years ago +1

    All I was thinking when you used hashmaps, arrays and double passes was: convolution. Use convolution kernels! Instant speed and threading.

  • @freezerain
    @freezerain 2 years ago

    Wow changing wrapper around data can make code run faster!? Mindblowing

  • @kurt7020
    @kurt7020 2 years ago +4

    Rust has "zero-cost" abstractions so algorithms don't matter, right? Lol. Remember my fellow rustaceans, they're only zero *additional* cost as compared to writing out the *same algorithm* they desugar into. Algorithms and data structures matter - A LOT! Computers are faster, but data is bigger.

  • @mfyoutube5003
    @mfyoutube5003 2 years ago +1

    tantan, 2.05us is not 205000 ns, it's 2050 ns
    1us = 10^-6 s and 1ns = 10^-9 s
    so 2.05us = 2.05·10^-6 s
    = 2.05·10^3·10^-9 s = 2050·10^-9 s
    = 2050 ns
    I never comment on YouTube but this triggered me, sorry lol
    I love your videos

    • @mfyoutube5003
      @mfyoutube5003 2 years ago +1

      ok so this is why you finish the video before commenting friends.

  • @diadetediotedio6918
    @diadetediotedio6918 2 years ago

    Consider using a hashmap not to store the voxels itself, but to store chunks of voxels, this way you can have expansible chunks of area without dealing with all the performance costs of using the hashmap for every single voxel
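
A sketch of that chunked layout (the chunk size, type names, and coordinate handling are illustrative): one hash lookup finds a 16x16x16 block, and indexing inside the block is flat.

```rust
use std::collections::HashMap;

const CHUNK: i64 = 16;

// The map is consulted once per chunk, not once per voxel; inside a
// chunk, access is a plain flat-array index.
struct ChunkedGrid {
    chunks: HashMap<(i64, i64, i64), Box<[u8; (CHUNK * CHUNK * CHUNK) as usize]>>,
}

impl ChunkedGrid {
    fn new() -> Self {
        ChunkedGrid { chunks: HashMap::new() }
    }

    fn set(&mut self, x: i64, y: i64, z: i64, v: u8) {
        // div_euclid/rem_euclid keep negative coordinates correct
        let key = (x.div_euclid(CHUNK), y.div_euclid(CHUNK), z.div_euclid(CHUNK));
        let chunk = self
            .chunks
            .entry(key)
            .or_insert_with(|| Box::new([0; (CHUNK * CHUNK * CHUNK) as usize]));
        let (lx, ly, lz) = (x.rem_euclid(CHUNK), y.rem_euclid(CHUNK), z.rem_euclid(CHUNK));
        chunk[(lx + ly * CHUNK + lz * CHUNK * CHUNK) as usize] = v;
    }

    fn get(&self, x: i64, y: i64, z: i64) -> u8 {
        let key = (x.div_euclid(CHUNK), y.div_euclid(CHUNK), z.div_euclid(CHUNK));
        self.chunks.get(&key).map_or(0, |c| {
            let (lx, ly, lz) = (x.rem_euclid(CHUNK), y.rem_euclid(CHUNK), z.rem_euclid(CHUNK));
            c[(lx + ly * CHUNK + lz * CHUNK * CHUNK) as usize]
        })
    }
}

fn main() {
    let mut g = ChunkedGrid::new();
    g.set(-1, 0, 31, 7); // negative and cross-chunk coordinates both work
    assert_eq!(g.get(-1, 0, 31), 7);
    assert_eq!(g.get(0, 0, 0), 0); // untouched voxels default to empty
}
```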

  • @multiHappyHacker
    @multiHappyHacker 2 years ago

    Waiting for C++ '23 and '26 flat_map and flat_set. Until then we can probably use a map from the key to an index into a vector or flat array, so you still have a flat array to iterate over if you choose, and constant-time access.

  • @wadecodez
    @wadecodez 2 years ago

    lol classic example of pre-optimization going horribly wrong! Good thing you got it and listened to people trying to help

  • @nexovec
    @nexovec 1 year ago

    I can tell you from looking at that that the cached vectors insanity is slower than just returning the values if you don't do something crazy.

  • @phillipneal8194
    @phillipneal8194 2 years ago +2

    Excellent!! Thank you. As Knuth said: "Premature optimization is the root of all evil"

  • @mysticrustacean4065
    @mysticrustacean4065 2 years ago +1

    We can instantiate the Vec as:
    let mut states: Vec<Vec<Vec<u32>>> = vec![vec![vec![0u32; x]; y]; z];

  • @VelocityTheory
    @VelocityTheory 2 years ago

    2:00. Why instantiate an object to 0 in O(n) when there is calloc()?

  • @saxpy
    @saxpy 1 year ago

    you could store cells in a 3d matrix as a chunk, and have a hash map against those chunks

  • @nowave7
    @nowave7 2 years ago +1

    Hold on... 2.05μs is actually only 2050ns, so that would make the bool array not even ten times faster than the rule enum.
    EDIT: Ah, okay you've noticed the mistake yourself. Yeah, this is what you get when you start typing a comment before you've even seen just a couple of seconds of video after the questionable statement. My bad.

  • @giantnoah
    @giantnoah 1 year ago

    Instead of a hashmap you could use an array set to the maximum size you're planning to use, and then make a getter/setter that clamps your values to the bounds set in the program. Should certainly be faster to iterate over than a hashmap, and depending on the hash function could likely be less memory used.
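
A sketch of that clamped fixed-size array idea (MAX, the struct, and the method names are illustrative):

```rust
const MAX: usize = 64; // maximum grid size the program supports

struct Grid {
    cells: Vec<u8>, // MAX*MAX*MAX cells, allocated once up front
}

impl Grid {
    fn new() -> Self {
        Grid { cells: vec![0; MAX * MAX * MAX] }
    }

    // Clamp instead of bounds-checking: out-of-range coordinates
    // are pulled back to the nearest edge cell, so no lookup can miss.
    fn idx(x: i64, y: i64, z: i64) -> usize {
        let c = |v: i64| v.clamp(0, MAX as i64 - 1) as usize;
        c(x) + c(y) * MAX + c(z) * MAX * MAX
    }

    fn get(&self, x: i64, y: i64, z: i64) -> u8 {
        self.cells[Self::idx(x, y, z)]
    }

    fn set(&mut self, x: i64, y: i64, z: i64, v: u8) {
        let i = Self::idx(x, y, z);
        self.cells[i] = v;
    }
}

fn main() {
    let mut g = Grid::new();
    g.set(1, 2, 3, 9);
    assert_eq!(g.get(1, 2, 3), 9);
    assert_eq!(g.get(-5, 2, 3), g.get(0, 2, 3)); // clamped to the edge
}
```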

  • @marksmod
    @marksmod 2 years ago +2

    3:30 the bane of every coffee addict. Get a cup and it basically is empty before you even sit down and start programming

  • @crustydev5561
    @crustydev5561 2 years ago

    Found your channel yesterday and somehow it is perfect for my ADHD

  • @1over137
    @1over137 2 years ago

    You are storing an int FFS. Just make the array as large as your potential maximum, alloc it at start up and move on. A single megabyte gives you 100*100*100 8-bit ints.
    We have done this in the real world, stock exchange gateway risk checker, processing 20,000 messages per second on each of up to 20 bi-directional sessions. Wire to Wire latency of less than 20 microseconds, sometimes less than 1 microsecond. (Non commodity hardware).
    We had a multi-dimensional array of financial instruments containing market data (pointer). We just allocated double the number of possible symbols and allocated a 15Gb Array on start up.
    If you want performance, use static memory allocation or face doom.

    • @1over137
      @1over137 2 years ago

      Also, remember, (while this might not work in rust).... you can allocate a flat single dimensional array of data and then create as many indexes as you want to access those pointers in whichever way is performant. Obviously this works best when you know the bounds of that data first though.

  • @NetherFX
    @NetherFX 2 years ago

    Are you planning on adding examples in the repo? :-)

  • @aliengarden
    @aliengarden 2 years ago

    What is your greenscreen? It looks well made and I'm looking for one!

  • @zitronenfresser7089
    @zitronenfresser7089 2 years ago

    Could you make a video about your vscode setup?

  • @soberhippie
    @soberhippie 2 years ago

    You could have created a 27 element Rule array to get a rule that applies to a particular number of neighbours

  • @az-qf9ht
    @az-qf9ht 2 years ago +1

    Arrays continue to blow my mind

  • @simonfarre4907
    @simonfarre4907 2 years ago +3

    Uh, your math is a bit off. 2.05us is not 205000 ns, it's 2050ns.
    Edit: oh you edited it 😄

  • @officiallyaninja
    @officiallyaninja 1 year ago

    Why not just use a nested vector and create a struct or something wrapping it to deal with getting the values at particular indices?

  • @seneketh
    @seneketh 2 роки тому

    Some awesome conclusions everyone can learn from :D . Cool!

  • @SadSmileGames
    @SadSmileGames Рік тому

    @5:50 In Algorithm engineering, you're usually supposed to compare the worst case scenario of each algorithm.

  • @ОлегБожок-ф6и
    @ОлегБожок-ф6и 2 роки тому

    It would be great if you created a blog with articles about optimisations

  • @MrLeonnius
    @MrLeonnius 2 роки тому

    If the bounds are limited, and the maximum bound a user can select doesn't exceed an absurd amount of memory, then you could still use the array method: just always allocate the maximum, right?

  • @leonardovee
    @leonardovee 2 роки тому

    dude, your videos are crazy good!

  • @ProjectPhysX
    @ProjectPhysX 2 роки тому +1

    Of course a flat array is much faster than some lazy data structure. I remember so many IT people telling me to use these stupid data structures, because a raw pointer scares the shit out of them... I'd have to wait for months to get simulation results then.
    At this point, I only do GPU parallelization, it's ~200x faster than even CPU multithreading on a flat array.
    Also, you should probably not use slow atomics here, as you have a small grid size and plenty of memory. Use 2 copies of the array and read/write back and forth between them.
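
The "2 copies of the array" suggestion is classic double buffering: read the current generation, write the next one, then swap, so no atomics are needed. A minimal sketch with a toy update rule standing in for the real neighbour logic:

```rust
// Sketch of ping-pong buffers instead of atomics: read `current`,
// write `next`, then swap so the next generation becomes current.
fn step(current: &[u8], next: &mut [u8]) {
    // Toy update rule standing in for the real cellular-automaton logic:
    for (dst, &src) in next.iter_mut().zip(current) {
        *dst = src.wrapping_add(1);
    }
}

fn main() {
    let mut a = vec![0u8; 8];
    let mut b = vec![0u8; 8];
    for _ in 0..3 {
        step(&a, &mut b);
        std::mem::swap(&mut a, &mut b); // O(1): swaps the Vec handles, not the data
    }
    assert_eq!(a[0], 3);
    println!("three generations simulated without atomics");
}
```

Because `step` only reads one buffer and only writes the other, the write loop can also be split across threads without any synchronisation on individual cells.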

  • @zoeythebee1779
    @zoeythebee1779 2 роки тому +3

    Look ma! I'm in a niche internet video!

  • @ajnart_
    @ajnart_ 2 роки тому

    Hello simple question but can't you just use one single array allocated with like 1x10^N slots and access the element like elem[203] for elem[2][0][3] ? It would be more memory consuming but not time consuming, right ?
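
Roughly, yes, though the decimal trick (`elem[203]` for `elem[2][0][3]`) only lines up when each dimension is exactly 10; the general mapping is `x + y*N + z*N*N`. And a flat array isn't more memory-consuming than nested ones, it's the same cell count in one allocation. A small sketch (the `flat` helper is hypothetical):

```rust
// The "digits" trick only works because each dimension is 10; in general
// a 3D coordinate maps to a flat index as x + y*N + z*N*N.
const N: usize = 10;

fn flat(x: usize, y: usize, z: usize) -> usize {
    x + y * N + z * N * N
}

fn main() {
    // With N = 10, z*100 + y*10 + x reads like the decimal digits "zyx":
    assert_eq!(flat(3, 0, 2), 203); // elem[2][0][3] -> flat index 203
    println!("flat 3D indexing ok");
}
```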

  • @zhaadd
    @zhaadd 2 роки тому

    I feel at home with these coding videos that I don't understand but know will make sense after some time

  • @Palundrium
    @Palundrium 2 роки тому +1

    That Wintergatan background music... :)

    • @Tantandev
      @Tantandev  2 роки тому +1

      many people comment when I use his music. Happy to see other Wintergatan fans :)

  • @martinbeltrandiaz
    @martinbeltrandiaz 2 роки тому

    I haven't tried Rust yet, but this videos is interesting as hell

  • @mikael808
    @mikael808 2 роки тому +1

    Loved the video, thank you! Also can't wait for the voxel game video, sounds super exciting! :D

  • @recreationalplutonium
    @recreationalplutonium Рік тому

    why a bool array and not just count bits in one u32 or whatever?
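
The u32 idea: since a neighbour count fits in 0..=26, the whole rule can be a bitmask where bit n set means "n neighbours satisfies the rule", checked with a shift and a mask. A hedged sketch (function names are illustrative):

```rust
// Sketch of packing the rule into one u32: bit n set means
// "n alive neighbours satisfies the rule".
fn rule_mask(counts: &[u32]) -> u32 {
    counts.iter().fold(0u32, |m, &c| m | (1 << c))
}

fn rule_allows(mask: u32, neighbours: u32) -> bool {
    (mask >> neighbours) & 1 == 1
}

fn main() {
    let survival = rule_mask(&[4, 5]); // accepts 4 or 5 neighbours
    assert!(rule_allows(survival, 4));
    assert!(!rule_allows(survival, 3));
    println!("whole rule fits in 4 bytes");
}
```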

  • @eboatwright_
    @eboatwright_ 2 роки тому +3

    1:59 Actually no. You can use the same syntax as an array with vec![vec![0; 50]; 50] :D

    • @L4Vo5
      @L4Vo5 2 роки тому

      wouldn't that not work with SIZE instead of 50 though, since it's not a constant?

    • @L4Vo5
      @L4Vo5 2 роки тому

      ok nope it does work. I was completely mistaken

    • @eboatwright_
      @eboatwright_ 2 роки тому

      @@L4Vo5 Yeah, as long as SIZE is a const variable then it works

    • @L4Vo5
      @L4Vo5 2 роки тому

      @@eboatwright_ I tried and it worked even with a non-constant variable

    • @eboatwright_
      @eboatwright_ 2 роки тому

      @@L4Vo5 oh, awesome! :)
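
The thread's conclusion checks out: `vec![elem; n]` accepts a runtime `n` because it expands to a heap allocation, unlike `[T; N]` arrays, where `N` must be a constant. A quick demonstration (the `make_grid` helper is illustrative):

```rust
// vec![elem; n] works with a runtime n; only [T; N] arrays need const N.
fn make_grid(size: usize) -> Vec<Vec<u8>> {
    vec![vec![0u8; size]; size]
}

fn main() {
    // A value the compiler cannot know at compile time:
    let size = std::env::args().count() + 49;
    let grid = make_grid(size);
    assert_eq!(grid.len(), size);
    assert_eq!(grid[0].len(), size);
    println!("runtime-sized grid: {}x{}", size, size);
}
```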

  • @dave_m98
    @dave_m98 2 роки тому

    Am I the only one who is watching this with 0 Rust knowledge and still finds it entertaining and interesting?

  • @zitronenfresser7089
    @zitronenfresser7089 2 роки тому

    Could you maybe do a Rust tutorial? I think it would be interesting

  • @jonxslays
    @jonxslays 2 роки тому

    lmao the music when the multi threading code comes up.

  • @WhiterockFTP
    @WhiterockFTP 2 роки тому +2

    erm 2.05 μs is 2050 ns and not 205000 ns… 6:12

  • @GabrielxC
    @GabrielxC 8 місяців тому +1

    Looking at the Rust syntax at the beginning of the video is the reason why C is better. Why make things harder for nothing?

    • @Wyvernnnn
      @Wyvernnnn 7 місяців тому +1

      Because it's not for nothing and provides invaluable features that C could never have?

  • @teenageoperator7246
    @teenageoperator7246 2 роки тому

    oh man, Wintergatan as BGM? I’m in.

  • @pugo7925
    @pugo7925 2 роки тому

    Come for the code, stay for the Wintergatan background music, then rewatch to actually concentrate on the topic xD

  • @imaginaoYouTubesoquecomarrobas
    @imaginaoYouTubesoquecomarrobas 2 роки тому

    Great video as always!