Andrew Kelley Practical Data Oriented Design (DoD)

  • Published 22 Aug 2024
  • Copyright: Belongs to Handmade Seattle (https://vimeo.com/649009599). I'm not the owner of the video and hold no copyright, and the video is not monetized.
    In this video Andrew Kelley (creator of the Zig programming language) explains various strategies for reducing the memory footprint of a program while also making it cache friendly, which increases throughput.
    At the end he presents a case study where DoD principles are applied in the Zig compiler for faster compilation.
    References:
    - CppCon 2014: Mike Acton, "Data-Oriented Design and C++"
    - Handmade Seattle: handmade-seattle.com/
    - Richard Fabian, 'Data-Oriented Design': dataorienteddesign.com/dodbook/
    - IT Hare, 'Infographics: Operation Costs in CPU Clock Cycles': ithare.com/infographics-operation-costs-in-cpu-clock-cycles/
    - The Brain Dump, 'Handles are the better pointers': floooh.github.io/2018/06/17/handles-vs-pointers.html

COMMENTS • 261

  • @mika2666
    @mika2666 1 month ago +119

    Just a note for anyone who looked at the CPU cache slide: yes, the L1 and L2 numbers are 8 times their actual size, because lscpu adds together the caches from all of your cores.

    • @ron5948
      @ron5948 29 days ago

      @@mika2666 API compatibility at the CLI level was Linux/Unixes' worst mistake

    • @gfasterOS
      @gfasterOS 28 days ago +4

      There's also a -C flag that gives you cache information specifically (including per core)

    • @thewhitefalcon8539
      @thewhitefalcon8539 25 days ago +1

      Only 8 cores in 2024?

    • @johndowson1852
      @johndowson1852 23 days ago +3

      @@thewhitefalcon8539 there are basically no laptop x86 chips with more than 8 cores.

    • @thewhitefalcon8539
      @thewhitefalcon8539 23 days ago

      @@johndowson1852 for a laptop yeah. My tower has 24.

  • @anotherelvis
    @anotherelvis 1 month ago +96

    The original talk is from 2021. Thank you for reuploading.

  • @daxramdac7194
    @daxramdac7194 1 month ago +83

    I love how that talk by Mike Acton was such a famous talk on data-oriented programming that Andrew threw on the red Hawaiian shirt lol. That talk by Mike was also the first time I sat and heard anything about this data-oriented programming stuff. One thing that needs to be pointed out, though, is how GOOD Andrew is at communicating this technical subject matter. This presentation was about as clear an intro to the subject as you can get. 👍👍

  • @TJChallstrom916-512
    @TJChallstrom916-512 1 month ago +44

    This talk is why an applied computer architecture course is so useful at uni. At the time you think "wtf, I'll never look at NASM again", and most are correct, but the lessons it teaches make you better 10 years down the line.

    • @d1cn
      @d1cn 1 month ago

      Can you recommend which one to choose?

    • @mr3745
      @mr3745 29 days ago +5

      A broad CS/CE education doesn't teach you to code, it teaches you to think. Basic computer architecture + complexity theory completely changes the way you look at problems forever.

  • @JonitoFischer
    @JonitoFischer 1 month ago +25

    I loved that he mentioned "C, Zig, Nim"... those languages are awesome

    • @bobweiram6321
      @bobweiram6321 1 month ago +7

      Ada is better! Type sizes are in bits.

    • @kellymoses8566
      @kellymoses8566 1 month ago +5

      @@bobweiram6321 Ada would be more popular if the compilers hadn't been so expensive.

    • @samuraijosh1595
      @samuraijosh1595 1 month ago

      @@kellymoses8566 what do you mean?

    • @JohnDoe-np7do
      @JohnDoe-np7do 7 days ago +1

      Nim sucks tho. But the first 2 are goatz

  • @xcoder1122
    @xcoder1122 1 month ago +76

    When storing out-of-band data in a hashtable, keep in mind that hash access is WAY slower than pointer access.
    First, hashtables are only O(1) in an ideal world; in reality no hashtable implementation is always O(1).
    Second, O(1) says nothing about speed. It only says that the speed does not depend on the number of elements; it can still be terribly slow. O(log2 n) can be much faster than O(1). It just won't stay faster if you keep adding more and more items, because O(log2 n) keeps getting slower while O(1) doesn't, so O(1) will take over at some point. But what if the amount of data you have in the worst case is way below that point? That's often the case. For small amounts of data, sorted arrays often give much better results than hashtables.
    Finally, a hashtable requires at least one memory access per lookup, so whatever cost you have when accessing a pointer, you have the same cost when doing a lookup in a hashtable, just to end up with a pointer (so you have to do another lookup) or with an index (so you have to do two more lookups). And then there's the cost of hashing the key, which you don't have with direct pointers or sorted lists.

    • @tiagocerqueira9459
      @tiagocerqueira9459 1 month ago +19

      The point is storing out-of-band, not which data structure to store it in. Nevertheless, a hash function is just math, and math is fast; you're overstating the performance costs and assuming the dataset is small, which may not be true. As usual, humans are very bad at predicting performance; benchmark first

    • @xcoder1122
      @xcoder1122 1 month ago +35

      @@tiagocerqueira9459 Actually I'm speaking from experience. I once needed a lookup table in a kernel module and I chose a hashtable, because O(1) would ensure that I get pretty consistent lookup times, which is quite important in kernel code. Years later, just out of curiosity, I wanted to know how much slower the code would have been using a sorted list and binary search instead. So I wrote a couple of unit tests that would fill either a hashtable (same implementation as the one used in the kernel) or a simple sorted list (own implementation) with real-world data (data was randomly chosen, but from real-world data sets), perform plenty of lookups, and time the results. The conclusion was that the list was faster every time.
      Knowing that the list is O(log2(n)), I calculated approximately at how many elements the hashtable would win and came up with a value a bit above 500 entries. And indeed, when I increased the sample size above 500, the hashtable became equally fast, and at around 550 entries the hashtable would finally consistently win against the list (only if it stayed collision-free, but that's another issue). There's just one catch: due to technical limitations, it was impossible for the dataset to ever grow beyond 256 entries. So there was no real-world scenario where the hashtable would ever have beaten the sorted list. On top of that, to stay (almost) collision-free under all real-world scenarios and never experience a degradation in performance, the hashtable had to be at least 50% larger than the data kept within it. So not only was it slower, it also wasted way more memory and scattered the data further apart in memory.
      You incorrectly assume hashing is just math, but that is wrong: of course you must also read the data you want to hash from memory, and that data may not be fixed size and may not be well aligned. And a good hash algorithm needs to perform more than just a few operations per byte of key data to achieve a good hash distribution. Sure, you can choose a piss-poor algorithm and it will be very fast, but then you will end up with tons of collisions, and no matter how you handle collisions there's always a slowdown involved (either when searching for hits or when verifying that there is no hit), and often additional memory consumption too.
      If you want some real-world data, just write the same OO code once in C++ and once in Objective-C and benchmark. In C++, methods are called by fetching pointers from vtables, so all calls are indirect, but they are just indexed table lookups at runtime. In Objective-C, methods are called by name, and method tables are hashtables. So every time you call a method, the method name is hashed and a hashtable lookup is performed to fetch the function pointer. The performance difference is huge; it's still huge even though the Obj-C runtime has an internal cache and will not actually perform a hashtable lookup for the last ten (IIRC) methods called (it would be way worse without that cache; at least calling the same methods over and over again in a loop is dramatically sped up that way).
      It's not like anything in this video hasn't been common knowledge for 50 years, and it's not like other programming languages and runtimes haven't tried this in the past. And in the past, machines had way less memory and memory access was even slower than today (as CPUs had little to no cache), yet there is a reason why we ended up with the languages and runtimes that we use today, and the reason was not to waste more memory and have slower code for no benefit. It's not like today's programming languages are modeled according to how CPUs work; in fact it's the other way round: today's CPUs are modeled to the needs of existing code and designed around the question "What can we do to make that code run faster?"

    • @tiagocerqueira9459
      @tiagocerqueira9459 1 month ago +13

      @@xcoder1122 Well, that's a lot of theory. In practice, I benchmarked the HashMap and BTreeMap from 10 to 1 million Monsters in intervals of powers of 10, with 1000 random accesses, and here's the result: at around 100 Monsters the performance is even; from that point onwards the HashMap remains constant and the BTreeMap keeps growing, as expected. The threshold of about 100 isn't as high as you implied. But as I said, both are still very fast and it's almost meaningless in the majority of cases; I'd go with HashMap for general purposes, and it scales effortlessly.

    • @xcoder1122
      @xcoder1122 1 month ago

      @@tiagocerqueira9459 "I benchmarked the HashMap and BTreeMap", and these are two very specific implementations with tons of optimizations, but also assumptions about the data, and not necessarily easy on CPU or memory (e.g. they were probably not optimized with embedded systems in mind). Neither one was available to my kernel code. And performance depends on the data and how likely that data is to cause collisions. But you are missing the actual point, which is: how much slower is a HashMap vs storing the data in band? Because storing data out of band to save 4 KB is ridiculous on systems that have 16 GB of RAM and cannot really make use of that RAM most of the time to begin with. In development you always make trade-offs between speed and memory, and wasting 4 KB, even 4 MB, is fine if it gives way more speed. The argument of the video was that by saving memory you will even gain speed, and in general there is some truth to that statement, but when you replace direct access with hashtables, there are only very limited specific cases where this holds true. It's not like game developers would not try that first before deciding to rather waste more memory and use direct access when that gives consistently better frame rates.

    • @David_Raab
      @David_Raab 1 month ago +3

      I have seen many people argue like you. Thinking a HashMap is slow and key generation would be slow, they stick to searching through an array in O(n) time, or adding stuff to an array with insertion sort in O(n), or sorting in O(n log n) and then doing binary search in O(log n), just to avoid O(1) lookup access, because obviously O(1) must be bad performance. I cannot express the amount of annoyance I get whenever I read crap like this.
      Yes, O(1) doesn't say anything about the actual cost and could also be O(100), but usually we store many things in a data structure, so picking a data structure that scales better for 1-10 items and gets worse beyond 10 elements is just silly. That's also why we use a measure like O(1); before that generalization, people usually calculated the real cost of every operation, and this still didn't make picking algorithms and data structures easier. That's why computer science today just speaks of O(1), and it's good to have it this way.
      Saying that for small N another data structure could be better also has no point. If your N is so small that you have like 10 items in it, then this data structure will probably never be part of any performance-critical code section. A HashMap usually already becomes faster below the N < 100 mark.
      It's very simple: if you need to look something up by a key, use a HashMap; otherwise don't. Most of the slow code I have seen is slow because of this: people want to look something up by a key, and they scan an array and do excessive sorting of the array, thinking they are smart. No, they aren't. Just use a HashMap. Key generation is not slow, and constant lookup time is really, really good most of the time. No crappy array implementation people come up with outperforms this.
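The trade-off argued over in this thread is easy to sketch. A toy Python comparison (illustrative only, not the kernel code described above) of a hash lookup against binary search in a sorted array:

```python
import bisect
import timeit

# Toy sketch of the debate above: dict (hash) lookup vs binary search
# in a sorted array. All names and numbers here are illustrative.
keys = list(range(0, 10_000, 7))
table = {k: k * 2 for k in keys}      # hash table: O(1) expected
sorted_keys = sorted(keys)            # sorted array: O(log n) lookup
values = [k * 2 for k in sorted_keys]

def hash_lookup(k):
    return table[k]

def binsearch_lookup(k):
    i = bisect.bisect_left(sorted_keys, k)
    return values[i]

assert hash_lookup(70) == binsearch_lookup(70) == 140

# Which side wins depends on n, the key type, hash cost, and cache
# behavior, so measure rather than trusting O() alone.
t_hash = timeit.timeit(lambda: hash_lookup(70), number=10_000)
t_bin = timeit.timeit(lambda: binsearch_lookup(70), number=10_000)
```

As several commenters note, which side wins depends on n, the key type, and cache behavior; the only reliable answer is to benchmark on your own data.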

  • @josephlabs
    @josephlabs 1 month ago +26

    Moral of the story is to segregate data in memory. It makes it easier to read and hit the cache, and uniform data wastes fewer bytes on padding.

    • @sacredgeometry
      @sacredgeometry 1 month ago +3

      The moral of the story is it's easy to optimise toy examples.

    • @yokunjon
      @yokunjon 1 month ago +10

      @@sacredgeometry lol, you are the most Dunning-Kruger guy I've ever seen. I recall you from another reply, grow up buddy

    • @sacredgeometry
      @sacredgeometry 1 month ago

      @@yokunjon I have no idea who you are but you have given me absolutely no reason to care what you think or to believe your judgment is any good so ... ok.

    • @user-qw9yf6zs9t
      @user-qw9yf6zs9t 26 days ago +5

      @@sacredgeometry I'd understand the tone of your reply if the examples weren't obviously there to teach the concepts? Do you expect a full-blown project shown here??? Sorry if I'm being rude but I genuinely don't understand

    • @MaddieM4
      @MaddieM4 24 days ago +10

      @@sacredgeometry if the actual Zig compiler is considered a toy example, I'm a little terrified of the upcoming generation of programming interview questions.
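The padding point made above can be checked directly. A small sketch using Python's ctypes (struct and field names are invented; exact sizes assume a typical 64-bit ABI):

```python
import ctypes

# Interleaving small and large fields forces padding: each u8 gets
# 7 bytes of padding so the following u64 stays 8-byte aligned.
class Padded(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint8),   # 1 byte + 7 padding
                ("b", ctypes.c_uint64),  # 8 bytes
                ("c", ctypes.c_uint8),   # 1 byte + 7 padding
                ("d", ctypes.c_uint64)]  # 8 bytes

# Grouping uniform fields together wastes nothing between them;
# only tail padding remains.
class Packed(ctypes.Structure):
    _fields_ = [("b", ctypes.c_uint64),
                ("d", ctypes.c_uint64),
                ("a", ctypes.c_uint8),
                ("c", ctypes.c_uint8)]

print(ctypes.sizeof(Padded), ctypes.sizeof(Packed))  # e.g. 32 vs 24
```

The same reordering trick applies in C, Zig, and Rust (where `repr(C)` preserves declaration order); some compilers can also warn about padding holes.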

  • @dog4ik
    @dog4ik 1 month ago +88

    Good talk, I will use all the strategies in my js project.

    • @ErikBongers
      @ErikBongers 1 month ago +22

      Agreed. Implementing it in COBOL now.

    • @IndellableHatesHandles
      @IndellableHatesHandles 1 month ago +16

      Definitely going to be very useful for my Python project. Andrew is a legend

    • @Michallote
      @Michallote 20 days ago +1

      I don't think you are going to get many benefits in Python, because everything is already a dictionary after all. Python's memory model abstracts exactly this part of the code away. Even if you try to do it like Andrew and add type hints or whatever, you won't get any benefits, because the Python interpreter allocates memory dynamically. That thing you said was a bool can be converted any second to an array, and Python won't mind or crash. So the memory juggling will still happen...

    • @SimonBuchanNz
      @SimonBuchanNz 17 days ago +6

      You're joking, but yes, you can apply this in JS. High performance JavaScript uses a lot of typed arrays to control memory locality, and avoids allocation.

    • @theairaccumulator7144
      @theairaccumulator7144 13 days ago +1

      @@SimonBuchanNz High performance JavaScript? WASM and native binaries in Node are way faster than anything you can do in pure JavaScript.

  • @xcoder1122
    @xcoder1122 1 month ago +20

    In C you often store a bool by reusing one bit of a u64 value where not all 64 bits are ever needed. E.g. you may have a u64 size value, because sizes can be more than 2^32, yet you know in advance that sizes will never get anywhere close to even 2^63, so you can easily use the highest bit of the u64 to store something else, like a boolean value. Very often you can even use the top 4, 8 or even 16 bits of a 64-bit value to store other information.
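That trick can be sketched as follows (the names and the 63-bit size limit are illustrative, not from the talk):

```python
# Pack a boolean into the top bit of a 64-bit size that is known
# never to reach 2^63. Names here are invented for the sketch.
FLAG_BIT = 1 << 63
SIZE_MASK = FLAG_BIT - 1  # mask selecting the low 63 bits

def pack(size, flag):
    assert size < FLAG_BIT, "size must fit in 63 bits"
    return size | (FLAG_BIT if flag else 0)

def unpack(word):
    # recover the size and the boolean from the single 64-bit word
    return word & SIZE_MASK, bool(word & FLAG_BIT)

word = pack(123_456, True)
assert unpack(word) == (123_456, True)
assert unpack(pack(7, False)) == (7, False)
```

In C the same idea is two mask operations on a `uint64_t`; the caveat is that every reader of the field must know about the stolen bit.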

  • @DMWatchesYoutube
    @DMWatchesYoutube 1 month ago +27

    I had the one hour a day rule too, and that's why I'm nocturnal, because my grandparents went to sleep at 9pm

  • @dantesp7557
    @dantesp7557 1 month ago +6

    Absolute GOLD of a talk, thanks for publishing it

  • @soviut303
    @soviut303 1 month ago +9

    What's interesting is that out-of-band booleans are something I use pretty frequently in databases, when building views on top of a table with a boolean column: one view contains all the rows where it's true, the other all the rows where it's false; pre-discarding. I guess that's (maybe?) why it's called data-oriented design.
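The same idea outside a database: instead of a bool per record, keep two collections, so membership itself encodes the flag and iteration never filters. A hedged sketch (the Monster naming is borrowed loosely from the talk's example):

```python
# Out-of-band boolean: rather than an `alive: bool` field on every
# monster, keep live and dead monsters in separate lists.
alive_monsters = [{"hp": 10}, {"hp": 3}]
dead_monsters = []

def kill(i):
    # move monster i to the dead list; swap-remove keeps the live
    # list dense (order is not preserved)
    m = alive_monsters[i]
    alive_monsters[i] = alive_monsters[-1]
    alive_monsters.pop()
    dead_monsters.append(m)

kill(0)
assert [m["hp"] for m in alive_monsters] == [3]
assert dead_monsters[0]["hp"] == 10
```

A pass over live monsters now touches no dead ones and loads no flag bytes, which is exactly what the two-views trick buys in SQL.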

  • @rt1517
    @rt1517 14 days ago +4

    In my opinion, this approach has an impact on code simplicity and maintainability.
    On one side you have a structure that contains what you are interested in.
    On the other side, well, it is more complicated. And I would even say more bug-prone and harder to debug.
    Also, he focuses only on the size of the data, while the way you access the data is also very important.
    For example, it is supposed to be way faster to visit all the elements of an array than to visit all the elements of a linked list.
    Because with the array, the memory accesses are very close to each other, so you stay in the cache, and the processor can even predict what memory address will be accessed next.
    With a linked list, you are accessing all over the place, so most of the time the next item won't be in the cache.
    Don't get me wrong, it is very nice that Andrew is trying to make the compiler fast. And there are a lot of slow compilers out there.
    But I am not sure that many of us mortals need that level of optimization.

    • @Nors2Ka
      @Nors2Ka 9 days ago

      I would say Andrew misunderstood the meaning of DoD, and what he showcased here is straight-up optimization, a step most mortals do not have to take, as their code is 1000x slower for other reasons entirely, i.e. using pointers from a linked list to refer to individually heap-allocated items.
      There is also this funny thing where he optimized the most irrelevant part of the compiler; LLVM will gobble 98% of the compile time regardless (unless Zig can use its own fast backends for debug builds).
      But I do want to say that the way you refer to "readability" sounds like you don't know what you're talking about, and the "readable" code you would spew out would be anything but actually readable.

    • @rt1517
      @rt1517 9 days ago +1

      @@Nors2Ka For me, readability is about how easy it is to understand the code. There is an infinite number of ways to implement a feature in a program, and some of them are better than others according to some criteria, for example readability or performance.
      Regarding readability, I would judge it by the amount of time and effort an average developer needs to understand a piece of code.
      At 22:29, on the left, I instantly know and understand what a Monster is and what it is capable of from its struct definition.
      But on the right, if I look at the Monster struct, I don't know that it can hold items.
      And if I find the held_items hashmap, I am not sure what the key of the hashmap is. Is it the Monster index in the Monster list? Is the hashmap dedicated to the Monsters, or is it shared with the NPCs or other possible item holders? Should I clean something up in the hashmap when a Monster is deleted?
      For me, the left version seems a lot easier to maintain and add features to (for example, looting).

    • @bransonS
      @bransonS 4 days ago +1

      I feel the same way.
      I know in my typical projects (large web and console apps), the code being easy to read and maintain is much more valuable than saving KB of memory or a fraction of the time of a single web request.
      I guess these kinds of optimizations are for specific use cases, but nonetheless it's interesting to hear about and have in the tool belt.

  • @xcoder1122
    @xcoder1122 1 month ago +12

    15:50 Keep in mind that to access an object, you must now perform two memory accesses: first a lookup with the index into the pointer table to get the actual pointer, then another to fetch the object from memory at the pointer address. So you saved memory, but at the price of doubling the number of memory accesses. And while the lookup table may be used frequently and thus likely sit in some CPU cache, it also takes up space in that cache that would otherwise be used for the actual data you want to access.

    • @monad_tcp
      @monad_tcp 1 month ago +12

      The number of memory accesses doesn't matter that much; the amount of data transferred is what matters. The CPU has a lot of parallel hardware to do the transfer while the instructions are being processed and the CPU itself is waiting for data to come from memory.
      This is an old theoretical point of view in computer science, this idea that the number of memory accesses or multiplications matters, or even instruction counting. It might have worked for a PDP-11, but not for modern computers.
      Why are we still so fixated on computing asymptotics in the abstract sense when it bears no relation to actual hardware? Sometimes computer science is not a science; ironically, in the scientific method you start with observation, not with a hypothesis.
      For example, in all of sorting theory you can mostly ignore the number of compares; that literally doesn't matter and has almost constant cost. Assume N=1, because you're going to spend 90% of the time waiting for data. So only memory accesses matter, and not the count of operations but the total in bytes; you can pretty much assume a memory access is also M=1.
      The only thing that really matters is the total number of bytes transferred; instructions don't matter much, except that they have a size and need to be transferred too.
      I think textbooks need to be updated, they really do.

    • @lankin9217
      @lankin9217 1 month ago +10

      He mentions the handle as an index into an array of objects, not an array of pointers. The pointers are only necessary if you heap allocate the objects. The point of trying to make your objects as small as reasonably possible is to make it easier to store arrays of objects and not arrays of pointers. That's also where locality benefits come from.

    • @diadetediotedio6918
      @diadetediotedio6918 1 month ago

      @@monad_tcp
      "in the scientific method you start with observation, not with hypothesis."
      Because computer science is closer to mathematics (which is considered a science, and where you don't [usually] "start with observations") than to particle physics. The foundations you are now dismissing as having "no correlation to reality" are much larger than you and me.

    • @IndellableHatesHandles
      @IndellableHatesHandles 1 month ago +1

      This could also be the case with the approach at 19:41. The three arrays will almost certainly be far away from each other in memory, meaning you'll have a much higher chance of multiple cache misses to reconstruct one monster.

    • @attilatorok5767
      @attilatorok5767 25 days ago +2

      @@diadetediotedio6918 I don't see how mathematics doesn't start with observation. I think a majority of math starts with finding odd connections and finding explanations for them or formalizing them. Take set theory as an example (a good one, because it's the basis for most/all math): it started with the observation that things can be categorized/grouped together. Category theory as another example (also a very fundamental theory) started with the observation that a lot of mathematical systems have similar properties.
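A minimal sketch of the handle-as-index idea discussed in this thread (names are invented; see the "Handles are the better pointers" reference in the description):

```python
# Handles instead of pointers: objects live in one dense array and
# everything else refers to them by a small integer index. Names
# here are invented for the sketch.
monsters = []  # the one array that owns all monster data

def spawn(hp):
    """Return a handle (an index), not a reference."""
    monsters.append({"hp": hp})
    return len(monsters) - 1  # would be a u32 in a real implementation

def damage(handle, amount):
    # one indexed load into the dense array, no pointer chase
    monsters[handle]["hp"] -= amount

boss = spawn(100)
damage(boss, 30)
assert monsters[boss]["hp"] == 70
```

The indirection cost raised above is real; what a dense array buys back is locality, smaller 32-bit references, and the ability to relocate or serialize the whole pool without patching pointers.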

  • @TheOnlyFeanor
    @TheOnlyFeanor 17 days ago +3

    The example at 20:00 doubles the number of memory accesses for each struct (one for the animation id, another for the monster type), unless the reduction in footprint makes it all fit into a single cache line.

    • @jaredcramsie182
      @jaredcramsie182 16 days ago +1

      I was going to comment this as well. Splitting the data like this will always require two cache reads on the initial load *. If the out-of-band technique is applied to more data, then each one will incur an additional cache read. Looking at an individual access, it is as if it comes with a guaranteed cache miss for each time you apply the technique.
      Array data is stored contiguously in memory. In this example, if you expect to access several elements in a row, then much of the data will already be in cache, because each struct is smaller than one cache line.
      If you expect to go through all the data in any order (and it is small enough to fit in cache), then this results in fewer cache misses in total, because there are simply fewer cache lines to access.
      These benefits can artificially inflate the performance savings of the technique when people benchmark it by iterating over an entire array of data, or test it in a limited context that doesn't factor in the accumulation of external data in cache from other processes, if the data / procedure is not used often.
      * For completeness, there are some special cases where you don't have two cache reads:
      One cache read: I assume this is what you were referring to in your comment. If you happen to be storing less than 64 bytes of total monster data, 7 or fewer monsters in this example (assuming the 'animation' data is cache-aligned to start with and the 'kind' data is not), then you can fit 3 extra monsters into the cache line, where it could only store 4 monsters per cache line previously.
      Three or four cache reads: In this example his data size is a divisor of the cache line length; if it weren't, there might be a chance of having to perform another cache read for each access.

    • @Norfirio
      @Norfirio 6 days ago

      This technique is very useful for instances where you do multiple operations on the Monster and you only need one (or a few) fields at a time. For example, in a game you will have an update function and a render function. If you iterate with the padded struct, you waste time going over the anim field in the update function and then waste time going over the kind field in the render function. You just have to be careful of your use case, as with any optimization methods.
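The access pattern described above (each pass touches only the field it needs) is the struct-of-arrays layout. A rough sketch with invented fields:

```python
# Struct-of-arrays: each Monster field lives in its own dense array,
# so a pass over one field streams only that array through cache
# instead of whole padded structs. Field names are invented.
anim = [0, 1, 2, 3]   # animation id per monster
kind = [0, 0, 1, 2]   # monster kind, parallel array (same index)

def update():
    # the update pass touches only `anim`
    for i in range(len(anim)):
        anim[i] += 1

def render():
    # the render pass touches only `kind`
    return [("draw", k) for k in kind]

update()
assert anim == [1, 2, 3, 4]
assert render()[2] == ("draw", 1)
```

If a pass needs both fields at once, the split can hurt (the extra cache read discussed above), so the layout should follow the dominant access pattern.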

  • @cariyaputta
    @cariyaputta 1 month ago +20

    Learned something from this talk. So thanks.

    • @StarContract
      @StarContract 29 days ago

      Learned something from this comment. So thanks (I guess).

    • @Hitler_Hunter_007
      @Hitler_Hunter_007 27 days ago

      @@StarContract learn something to comments thanks gay or guy whatever.

  • @Fedor_Dokuchaev_Color
    @Fedor_Dokuchaev_Color 11 days ago +2

    Bit fields? You don't have to use multiple arrays and enums, which make the code less readable.
    struct Monster {
        uint64_t life_points : 63;
        uint64_t is_alive : 1;
    };
    That's it. You can pack a lot of data if you know what range of values is the maximum/minimum in each case.

  • @nguyen_tim
    @nguyen_tim 29 days ago

    Thank goodness you posted this on YT because now I don't have to go to Vimeo to watch this talk for the Nth time....

  • @rallokkcaz
    @rallokkcaz 1 month ago +3

    Thank you for uploading this again, this is a great talk.

  • @Lircking
    @Lircking 1 month ago +3

    watched this on vimeo previously, great talk!

    • @pajeetsingh
      @pajeetsingh 1 month ago

      Rare. Last time I opened vimeo was in 2012.

    • @MatthisDayer
      @MatthisDayer 1 month ago

      It was only available on Vimeo for many months before

  • @murrayKorir
    @murrayKorir 1 month ago +32

    Just added 100 more fps to my renderer after watching this.

    • @FlanPoirot
      @FlanPoirot 1 month ago +19

      my toast toasted 1000x faster after watching this video
      data oriented sandwich 👍

    • @prezadent1
      @prezadent1 1 month ago +5

      @@FlanPoirot You toast bread, not toast.

    • @FlanPoirot
      @FlanPoirot 1 month ago +6

      @@prezadent1 I apologize for my grave mistake, milord

    • @i-am-the-slime
      @i-am-the-slime 1 month ago +1

      @@prezadent1 No. Americans just don't know what actual bread is

    • @prezadent1
      @prezadent1 1 month ago +1

      @@i-am-the-slime You just made two logical fallacies in 1 sentence. Congratulations.

  • @notapplicable7292
    @notapplicable7292 29 days ago +3

    Ngl I assumed for a moment this was a presentation to the Department of Defense

  • @maximenadeau9453
    @maximenadeau9453 1 month ago +2

    I think I'm on a new curve too after this video.

  • @Turalcar
    @Turalcar 1 month ago +4

    6:38 Just this week I did an optimization that replaced a few cached values with recalculation.

  • @olhoTron
    @olhoTron 1 month ago +16

    Meanwhile, web developers send 30 MB over the network just to show a simple text blog

  • @yash1152
    @yash1152 5 days ago

    30:10 the plug I was looking for

  • @SumoCumLoudly
    @SumoCumLoudly 29 days ago +2

    What an advantage in life, being put onto coding at such a young age

  • @sporefergieboy10
    @sporefergieboy10 1 month ago +43

    ZIGGERS RISE UP

  • @Oler-yx7xj
    @Oler-yx7xj 17 days ago +2

    6:50 Using light speed as a measure of how fast an operation executes is funny. It also makes you appreciate how fast modern computers are: having memory as fast as L1 cache 1 meter away from the CPU is literally impossible
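The arithmetic behind that claim, using round numbers (the ~1 ns L1 hit latency is an assumption for the sketch, not a figure from the talk):

```python
# Light travels about 0.3 m per nanosecond. A signal to memory 1 m
# away must make a 2 m round trip, which alone takes ~6.7 ns, well
# above a typical ~1 ns L1 hit. So L1-speed memory 1 m from the CPU
# is physically impossible, before counting any actual circuitry.
c = 3e8                        # m/s, speed of light (rounded)
round_trip_ns = 2 / c * 1e9    # 2 m round trip, in nanoseconds
assert round_trip_ns > 1.0     # exceeds the ~1 ns L1 budget
print(round(round_trip_ns, 1))
```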

  • @ThePOVKitchen
    @ThePOVKitchen 1 month ago +104

    Ron, this is not a notes app, I'm begging you

    • @Plorpoise
      @Plorpoise 1 month ago +63

      I don't know, I think Ron has a point. Maybe it is a notes app:
      1. Stored in the cloud and syncs across devices (for free!)
      2. You can make community posts (notes not tied to a topic)
      3. You can post comments on a video (notes on a specific topic)
      4. You can even include timestamps for specific citations to refer to
      5. Separate comments let you break up your thoughts across a video.
      I think Ron has convinced me that YouTube is a notes app. And a pretty good one. You just happen to be able to see others' notes as well (did someone say "collaborative"?)

    • @adamhenriksson6007
      @adamhenriksson6007 1 month ago +1

      😂😂😂😂

    • @greyshopleskin2315
      @greyshopleskin2315 1 month ago +6

      @@Plorpoise You have just convinced me. :o
      No, seriously, this makes perfect sense.

    • @Zaniahiononzenbei
      @Zaniahiononzenbei 1 month ago

      @@Plorpoise Have you seen my favorite social media network, Google Maps? It has posts, you can follow your friends and share your location. It has reviews, likes, and you can even use it for "content" discovery.

    • @ClayMurray8
      @ClayMurray8 1 month ago +2

      @@Plorpoise If YouTube had an API to get all your comments, you'd have a _really_ great point.

  • @torgnyandersson403
    @torgnyandersson403 8 days ago

    About the lexer and retokenizing: that's unnecessary work for strings and other tokens whose length isn't the same for all tokens of that type. Instead, just have one token for "begin string literal" and one for "end string literal". That's what I've done in my own projects. You get a tiny token struct and no retokenizing.
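That scheme can be sketched as a toy lexer (the token tags and the mini-grammar are invented; only the begin/end idea is from the comment):

```python
# Tiny fixed-size tokens: (tag, start_offset) only. Instead of a
# length field (or re-tokenizing later), string literals emit a
# begin token and an end token; the text between the two offsets is
# the literal's contents.
def lex(src):
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == '"':
            tokens.append(("str_begin", i))  # offset of opening quote
            i += 1
            while i < len(src) and src[i] != '"':
                i += 1
            tokens.append(("str_end", i))    # offset of closing quote
            i += 1
        elif ch.isspace():
            i += 1
        else:
            tokens.append(("char", i))
            i += 1
    return tokens

src = 'x "hi"'
toks = lex(src)
b = next(t for t in toks if t[0] == "str_begin")
e = next(t for t in toks if t[0] == "str_end")
assert src[b[1] + 1 : e[1]] == "hi"
```

Every token stays the same small size regardless of literal length, which is the property the comment is after.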

  • @_mrgrak
    @_mrgrak 29 days ago +3

    Great talk, but at 20:00 Andrew is talking about memory layout and how 7 bytes are wasted, assuming the enum is a byte! In many languages an enum defaults to an int, which would be 4 bytes. IMO the enum backing type should be shown, just for clarity.

    • @greasedweasel8087
@greasedweasel8087 21 days ago

      But even if the tag is a u64 or such, those 7 bytes are still “wasted” in the sense that they don’t provide any information. They will always be 0, since the tag will only take values from 0 to 4 it never touches those more significant bits. In practice the exact type of the tag doesn’t matter because it will never take values high enough to get use out of those bytes.

    • @jotch_7627
@jotch_7627 19 days ago

      also what languages are you using that automatically and truly use 32 bits for an enum tag lol

    • @_mrgrak
@_mrgrak 19 days ago

      @@jotch_7627 well theres this isa called x86

    • @_mrgrak
@_mrgrak 19 days ago

      @@jotch_7627 the isa would determine word size, not the language. languages can target many isas.
      thanks for playing

    • @greasedweasel8087
@greasedweasel8087 19 days ago

      @@jotch_7627 I mean rust uses an isize by default which is ssize_t in C aka a pointer sized integer so it’s not that far out of the realm of possibility, although 1. You can easily change this and 2. That’s only for debug builds, when optimisations are turned on it immediately collapses to the smallest size possible

  • @tigranrostomyan9231
@tigranrostomyan9231 1 month ago +82

    bro realised that procedural is better than OOP at 14 years old age

    • @Krbydav328
@Krbydav328 27 days ago +6

      bro really actually realized frfr

    • @diadetediotedio6918
@diadetediotedio6918 18 days ago +7

      It is not necessarily better, only generally faster.

    • @yash1152
@yash1152 5 days ago

      shut up

    • @OdinRu1es
@OdinRu1es 4 days ago +1

      Each paradigm has its own use case. Use the right tool for the right job and mix them

  • @El-Burrito
@El-Burrito 1 month ago +3

    This is so over my head but Data Oriented design seems super important for performance

  • @tordjarv3802
@tordjarv3802 2 days ago

    Damn, I should learn Zig

  • @monad_tcp
@monad_tcp 1 month ago +1

2:23 that's not the wrong order, it's the perfect order: basic -> perl -> python -> c++

  • @Tomyb15
@Tomyb15 26 days ago +1

    Before even watching, I'm gonna predict that the solution is basically using an ECS or similar techniques to make objects of arrays of data instead of arrays of objects.

  • @qy9MC
@qy9MC 24 days ago +1

So DoD not only saves memory but also makes programs faster!? I'm in!

  • @policyprogrammer
@policyprogrammer 19 days ago +2

Avail yourself of bitfields. They're part of the standard and they can really save your bacon space-wise.
In particular, I think they're better than encoding multiple boolean values into an enum because of maintainability. Imagine you have these encoded enums and a function like is_human that works just fine, but then someone adds a new variable and re-encodes all the enums so that more values qualify, and you forget to update the function.
Another advantage of bitfields is that you can exploit what you know about the actual range of your types. If you know you only need 24 bits, you can allocate exactly 24 bits and use the next eight for something else.

  • @zxuiji
@zxuiji 2 days ago

    3:09 nice :)

  • @ENCRYPTaBIT
@ENCRYPTaBIT 1 month ago +2

    I just learned about memoizing damn it 😅

    • @MaddieM4
@MaddieM4 24 days ago +1

      It's still useful in many situations! The rule of thumb is that the bigger the memoized object is, or more expensive it is to create, the more you'd benefit from keeping it cached somehow. Something accessed over the network, for example, you probably want to have some degree of local caching. Just be thoughtful that any time there's caching, you'll need to pay attention to expiration and invalidation.

  • @ktappdev
@ktappdev 24 days ago +2

    When he made alive_monsters and dead_monsters I was smiling so big my gf started investigating me

  • @whywhatwherein
@whywhatwherein 28 days ago

    ahah storing tokens without the end position... that's a new one for me! interesting

  • @sadhlife
@sadhlife 10 days ago

    16:03 zig has them now

  • @styleisaweapon
@styleisaweapon 1 month ago +7

    22:00 that hash is going to cause a near-universal cache miss just to see if the monster holds anything

    • @Jan-h7q
@Jan-h7q 1 month ago +6

      Not necessarily: if only a handful of monsters have inventory, the hashtable will be small and it will be in the L1 cache.

    • @styleisaweapon
@styleisaweapon 1 month ago

      @@Jan-h7q thats not how that works

    • @lassipulkkinen273
@lassipulkkinen273 1 month ago +3

      To be fair, an out-of-order cpu should be able to do the next hash calculation and subsequently queue the next read from the hash table while waiting for one read to complete, assuming no branch misses, anyway. So I wouldn't expect the cache misses to be too frequent, even though I find the supposed performance benefit of the hash table here rather dubious.

    • @styleisaweapon
@styleisaweapon 1 month ago

      ​@@lassipulkkinen273 you'd be right if the difference between missing and hitting was only one order of magnitude or so .. but its not .. missing doesnt get hidden by other latency on anything made in the past 10 years .. instead the latency of everything else is hidden by the miss

  • @xmorse
@xmorse 1 month ago +33

    @ron5948 is going crazy with this video

    • @100timezcooler
@100timezcooler 1 month ago +4

      slurring his words n shit

    • @robrick9361
@robrick9361 1 month ago

@@100timezcooler Is he using voice-to-text software? Is that why his comments have random misspelled words?

  • @yash1152
@yash1152 5 days ago

    20:27 the speaker image covered the content.
    and i dont understand this example too either.

    • @yash1152
@yash1152 5 days ago

      20:52 ohhhw, its about language feature where the lang allows to create array of each member ef struct.

  • @yairlevi8469
@yairlevi8469 1 month ago +4

    21:06 doesn't this make more memory accesses a requirement? if you want the full struct of data, say of element 0, you'll need to access not a continuous sequence of bytes in memory, however padded, but 2 (kind and monster pointer) from different places in memory, since theyre stored not side by side.

    • @NeunEinser
@NeunEinser 1 month ago

Wouldn't this be true regardless? If you look up animation, you'd need to do base + 0, and base + 8 for kind; in the changed version you do animation_base + i and kind_base + i.
Now, the struct base address in the first example is a relative lookup based on the struct array pointer, and in the second example the lookups into both animation and kind are also relative offsets based on their respective base pointers.
The struct of arrays should also just be part of the stack, just like the array pointer in the original.
So I don't see any difference here?

    • @yairlevi8469
@yairlevi8469 1 month ago

      @@NeunEinser in the first example the struct is contiguous in memory and in the second example its spread out, since the two arrays are most likely spread out in memory, so you are required to access twice in different places rather than a single point and from there read the entire contiguous block. Unless I'm missing something

    • @thewhitefalcon8539
@thewhitefalcon8539 25 days ago

      but normally you don't want the whole struct

    • @NeunEinser
@NeunEinser 19 days ago

      @@yairlevi8469 Yeah, that is true. If you are iterating over all structs, and you have enough cache lines for all fields, this is probably better though. You will have more of the data in the cache at the same time than if all was a single array.
      If you access more fields than you have cache lines, this would perform a lot worse though.
      That's why you shouldn't blindly adapt any kind of concept anywhere, and always run benchmarks to see if your changes actually help.

  • @pointer333
@pointer333 1 month ago +1

    This is gold!

  • @ron5948
@ron5948 1 month ago +15

    You kinda want to go all the way and not store heap allocated things at all. Put everything in a hugepage sized chunk of memory 2MB+ makes swapping and ssd flsuhing easier and muuch faster (bypasses most if the ssd firmeare) and then use very small array indexing cache lines not single data elements. This eay u can use a circular buffer of vm mapped hugepages where mmu allocates physical memory for you. Ye then do arena based eviction and copying to clear these pages and reuse them for new data. Containers then become strategies to align data contained into pages and cachelines, direct memory pointers are not data bc cannot be serialized anyway so ideally you avoid them completely. Cpus can use the cache line u ron on as a base pointer and then do relative lookups memload

    • @ron5948
@ron5948 1 month ago +2

      The boolran thing is maor about set membership the aliveness of an object is depending on your dokain model not relevant peoperty of the object itself and u want instead have a set for alive monsters or dead one full of integer offsets. Integer sets are hugely compressible, so this tricks are highly dependent on the data usage, if you know the queries beforehand you can make a query optimizer decide the structs in q , in other words its too low level for the programmer to choose dsta chunking by having static (read type ssystem level) structs, instead these emergr and change depending on the querirs and functions you ron on dats haiajaajgaikjsiik😂❤

    • @ron5948
@ron5948 1 month ago

      Categories are particularly painful with static set of attributes named like in a struct. Human may have very different attributes form animal but share libe ess doing inheritence or even strucuteal sharuing unuseful and slower pergormance anyway, refactorings immense painful anyway fsr

    • @Maurdekye
@Maurdekye 1 month ago

      what

    • @JimTheScientist
@JimTheScientist 1 month ago +2

      fortunately for me i’ve been writing my kernel and didn’t want to write a kmalloc so I pretend silly high level concepts like the heap don’t even exist. also why even bother aligning anything if you can just attribute packed it all (i joke)

    • @ron5948
@ron5948 29 days ago

      @@JimTheScientist itd not high level you [forced self censorship], thsts the problem try to be less abrassive ron
      The underlying principle I am trying to convey here is thst thr primitives we use map to 1980s hardeare in a bad e way bc primitive algos and heuristics, DO NOT MAKE SENSE ON MODERN HARDWARE, so imprdancr mismatch horrivle performacne, complexity explosioon
      99‰ morons with h1b visas genrrsting design patterns gsnf od for abusing shitr java code nobody thinking csn tolerate, ye so that kind, you misunderstood, i provoked misunderstanding, but correct response would be to ask for clarification

  • @sachahjkl
@sachahjkl 1 month ago +1

    Exceptionally good talk, congrats

  • @aetherclouds1181
@aetherclouds1181 1 month ago +3

    28:30 I thought we were padding structs to have the size be a multiple of the natural alignment? if the tag enum is 1 byte, wouldn't a bee or a human be 4 + 4 + 4 + 1 (enum) + 3 (pad) = 16 bytes (excluding the HumanClothed data)?

    • @srnissen_
@srnissen_ 1 month ago +1

      huh it appears to have eaten my reply.
      shorter answer then: no, because there's no padding, because the struct is split up into parts that each get their own array - one is 12 bytes (no padding needed) and the other is 1 byte (no padding needed) for 13 bytes per bee. If you copy a bee out of the multiarraylist, it gets reassembled into a single struct with padding, size 16.

    • @El-Burrito
@El-Burrito 1 month ago

      Yeah that part lost me too

    • @taimenwong9765
@taimenwong9765 1 month ago

      rewatch 19:05 and you'll get the answer

    • @aetherclouds1181
@aetherclouds1181 1 month ago

      @@srnissen_ you're right, I just noticed he uses a MultiArrayList, and that is a SoA, I was just a bit confused by the zig syntax (like how you can define struct members that aren't really in memory like Monster.Common or Monster.HumanClothed). thanks!

  • @JimTheScientist
@JimTheScientist 1 month ago +1

    why think when i can just write attribute packed on everything

  • @lunarmagpie4305
@lunarmagpie4305 1 month ago +1

    This talk was awesome!

  • @avwie132
@avwie132 29 days ago

    His backstory is 1:1 my backstory… I feel very unique

  • @afterschool2594
@afterschool2594 1 month ago

    I am not a really low level programmer. But I am beginning to take interest recently. One of my biggest question is what is alignment and padding? Is it the same thing? And why do we need them?

    • @maximknyazkin662
@maximknyazkin662 1 month ago +3

      I am not a specialist, but as I understand it. Alignment is related to the fact that it is easier for the CPU to read and operate on data if it is positioned on the multiples of 4 bytes and each piece of our data is spaced equally from each other. So this is alignment. Padding means specially added spacing between pieces of data if they do not align properly by themselves. Like for boolean. If it is 1 byte and we need it to go with something that is 4 bytes, we need to add 3 bytes of padding to make them equally sized. This way the CPU can read it faster as it does not need to take varying sizes into the account

    • @mss664
@mss664 1 month ago

      @@maximknyazkin662 Not a multiple of 4 bytes. It's faster to read when the value is naturally aligned. So a byte-sized value can be stored at any address, two bytes (u16, short) should be placed only at even addresses, 4 bytes (u32, int) at addresses divisible by four, etc. Cache lines are aligned too, usually to 64 bytes, and if the values were not aligned like this, it's likely that the value lies on the cache line boundary, and the CPU would have to fetch two cache lines to load one value. I haven't tested it on smaller types, but on my machine loading memory aligned to 16 bytes into SSE registers was about twice as fast as when it was unaligned.
      Also reading data from unaligned pointers isn't even valid on all architectures. In C and C++ dereferencing an unaligned pointer is illegal. Some CPU architectures might do the expected thing, some might throw an exception, some might just ignore the unaligned bits and read memory to the left of the actual pointer value.

  • @salim444
@salim444 1 month ago +7

    wasn't this uploaded on vimeo?

  • @minastaros
@minastaros 13 days ago +1

    Hello, bitfields?

  • @pajeetsingh
@pajeetsingh 1 month ago +1

    Any burnout here just checking the video?

  • @salim444
@salim444 1 month ago +2

    can someone explain what he meant at 36:32 about storing one token index? I am kinda lost at this point 😅 how this comes into practice

•  1 month ago +1

      He was storing the start and end indices of the token, ie: where the token starts and ends in the file. You only need to store the start index, because it's trivial to calculate the end index (end_index = start_index + token_length)

    • @salim444
@salim444 1 month ago

      I understood that, I actually meant the second bullet point where he is talking about multiple tokens and to find the index of other token we only need to "poke around" the tokens array

    • @curtis6919
@curtis6919 1 month ago

      @@salim444 He's just saying that instead of storing the line and column you store the token index, which is an index into the outputted array from the tokeniser (the previous stage in the compiler). The token immediately after any token n will always be n+1, the one before n-1, and so on--and so you can simply recalculate the token's context without necessarily storing them in the AstNode struct.

    • @MrSentientSimian
@MrSentientSimian 1 month ago

      What he means is that if your code is `if(a==1)`, then all you need is to store the location of `if` because you can assume the next token is `(`. So you don't need to store the token index for `(`, you already know it's the next one in the token array after `if`.
      Hope that helps

    • @salim444
@salim444 1 month ago

      @@MrSentientSimian thanks!

  • @drsensor
@drsensor 1 month ago +3

    you should put a link where original video located so the traffic is also directed to their website. just mentioning it's copyright by Handmade Seattle is not enough

  • @100timezcooler
@100timezcooler 1 month ago +1

    Sick nice cool. (dont ask me to write a poem)

  • @YmanYoutube
@YmanYoutube 14 days ago

    Why the fuck is a talk about an abstract concept in CS have 67k views in 3 weeks, this field is GIGA oversaturated.

  • @titaniumtomato7247
@titaniumtomato7247 1 month ago +2

    Me I'm a mere mortal and very unexperienced. I'm at 20:09 now and I'm wondering something.
    Let's say you do the thing where you split your monsters into the two arraylists alive/dead. What if you add a third state like deactivated? Later you might add more.
    At that point you could add a third or fourth list. But maybe they are very sparse some of them. Is that a problem? And what if the states get more complicated where your game logic needs to check if some conditions are true but not others. In the end you make a bitfield and the whole state becomes an int. That might happen right. It feels like separating into separate lists might be a premature optimization that you regret later. Is that true?
    (EDIT: I guess it was more of a stepping stone to the MultiArrayList idea whoopsie, more like premature raising of concerns hehe)
    Also how can you efficiently move monsters between the living and dead lists. I remember when iterating through a list in java you could not remove elements. Is that just a problem that turns out to be worth it since you read/write to a list element many more times than you are likely to remove it?

    • @soniablanche5672
@soniablanche5672 1 month ago +2

      In a real game, don't bother with this optimization. saving memory in the kilobytes is meaningless in current year.

    • @Mark-wz9uh
@Mark-wz9uh 1 month ago +8

      @@soniablanche5672 It's not about space/memory savings, but performance.
      I would agree usually don't bother, but in a hot loop this is one way to make it fit better in cache (which depending on the problem and size, can give you multiple time speedups).

    • @curtis6919
@curtis6919 1 month ago +5

      @@soniablanche5672 Did you even watch the video?

    • @srnissen_
@srnissen_ 1 month ago +5

      I'm not quite sure how Java works but if it's like other languages I'm familiar with, you cannot move elements during foreach due to underlying implementation issues with that idea, but you can absolutely move them if you're not doing foreach.
      If I was doing this in C++, I might simply have a single array with all monsters, dead or alive, and an index for "first dead monster" - every time a monster dies, I swap it with the last living monster in the array and decrement the index of "first dead" (for idiomatic reasons, I'd rather store "first dead" than "last alive" but that's a detail)
      Capital letters are alive, lowercase are dead.
      ABCDE 5 - start. All are alive and index of "first dead" is actually past the end of the array
      AECDb 4 - b died
ADCeb 3 - 'e' died (e swaps with D, the last living monster)
Now, when I want to iterate over my living monsters, I loop for (i = 0; i < first_dead; i++).

    • @jhacklack
@jhacklack 19 days ago

      These are great questions to ask.
      "What if you add a third state like deactivated? Later you might add more."
You profile the distribution of monsters in each state and adjust your arrays if needed; the more common status type gets a smaller memory footprint by being the default branch. Less common types are accessed through more logic, but that's not a problem because logic is fast and by definition it won't be used much in comparison.
"And what if the states get more complicated where your game logic needs to check if some conditions are true but not others. [...] Is that true?"
Pretty much, the exact optimization depends on taking measurements of your actual code. The underlying philosophy of DoD is that you have your data and can use all the assumptions about your data that you know to be true, instead of pretending that all the data is arbitrary and unknowable and defensively programming around that. You just look at the data and the profiling and match the code to it.
"Also how can you efficiently move monsters between the living and dead lists."
Depends how much time the CPU is waiting between frames. Generally the CPU finishes its job fast and most of the frame is spent waiting for the GPU to draw everything (obviously depending on the graphical fidelity of the game), so you should always have time to set a flag that a certain monster needs to change state. Whether you change its state immediately, or wait until more monsters need a state change so you can do it all in one batch, depends on your profiling from before. If monsters don't die often it might be better to have the CPU clean up the array at the end of the frame by writing the monster's data to a buffer, nulling it in the array, and then copying from the buffer to the dead array. Or it might be better to batch-move lots of monsters at once. It really depends on the type of game, which the developer will obviously know about.

  • @msmeraglia
@msmeraglia 1 month ago

    this is my exact journey haha

  • @bobweiram6321
@bobweiram6321 1 month ago

    Ada solved these issues back in the 80s. Types sizes are specified in bits instead of bytes.

  • @amjadaltaleb7559
@amjadaltaleb7559 1 month ago

    There is a despicable amount of ads in this video

    • @ChimiChanga1337
@ChimiChanga1337  1 month ago +6

      Channel is not monetized therefore the video is also not monetized. I think youtube just runs ads on all videos irrespective of their monetization status.

  • @Kenbomp
@Kenbomp 22 days ago +1

    Nice but lisp coders knew this for decades McCarthy

  • @Ic3q4
@Ic3q4 1 month ago +1

    Noice

  • @diadetediotedio6918
@diadetediotedio6918 1 month ago +2

Yet, this does not appear to me to be the most problematic performance issue in modern software. The biggest problems seem to be the extensive reliance on IO (which is orders of magnitude slower than RAM access) and what I call "browser-bloating" (that is, building ever-bigger applications on browsers and dynamic languages with excessively slow performance). But DoD is certainly interesting in itself, good talk.

    • @BoletusDeluxe
@BoletusDeluxe 19 days ago

      DOD is used extensively in game development and is still being expanded on

    • @diadetediotedio6918
@diadetediotedio6918 18 days ago

      ​@@BoletusDeluxe
I did not deny that, this is not part of my point. I'm talking about "modern software", and most existing software is some web application, server or GUI, or something in between. Games are absolutely an important part of it, but this here is arguing for DoD for general software development, and my questioning is about whether OOP and other not-so-efficient programming paradigms are really to blame, or if these people are making an unfair comparison where it matters most for them. My point is that, at the margin, no matter if you are using the most performant language or paradigm, the significance of your saved milliseconds will already be munched by that database call, that file read/write, that network request, etc (not saying there could be no improvements, but that you will only get heavily diminishing returns from them).
Games are also a very special and specific category of software; they run a bunch of logic every single frame, while UI software, web software and even servers "profit" from being lazy and reactive: they only process data when some request is made.

    • @BoletusDeluxe
@BoletusDeluxe 18 days ago

      @@diadetediotedio6918 true, i suppose the issue of modern software extends far further than just architecture, but there is a lot that can happen culturally from embracing efficiency and performance.

  • @styleisaweapon
@styleisaweapon 1 month ago +3

    No modern cpu has any single bit registers. bool is a wholly imaginary datatype, and as such its up to the compiler to decide how to store it, how to work with it, etc. Contrast that with things like int and long, which are defined (or intentionally left undefined) so that they behave exactly the same as the registers of all modern processors... and no, the flags register isnt a single bit, although a few instructions behave differently depending on the state of single bits within it. The languages that define true to !false leading to true = -1 makes a lot of sense given this.

  • @LNTutorialsNL
@LNTutorialsNL 25 days ago

    Can someone link the video he mentioned by the guy in the red shirt?

    • @ChimiChanga1337
@ChimiChanga1337  25 days ago +1

      ua-cam.com/video/rX0ItVEVjHc/v-deo.html - Mike Acton DoD talk

  • @michaelotu9723
@michaelotu9723 1 month ago

    why is the size 24 and later 16 when the struct variables were rearranged?

    • @johnnm3207
@johnnm3207 1 month ago +1

      because of how computers align things in memory to prevent gaps in memory. The size decrease is a result of removing the extra padding needed to keep the struct aligned

    • @Clowndoe
@Clowndoe 1 month ago +1

A struct needs to begin and end on a multiple of its largest member's alignment. The first u32 takes 4 bytes. The u64 starts 4 bytes after that, because it needs to start on a multiple of 8, leaving a 4-byte gap. Then the second u32 takes a further 4 bytes, and the compiler pads the whole struct with another 4 empties to make an even 8 useless bytes.
The second version puts the last u32 in that gap, making it all nice and neat.

  • @i-am-the-slime
@i-am-the-slime 1 month ago

    Can somebody tell me what memory alignment is?

    • @maximknyazkin662
@maximknyazkin662 1 month ago

In short, as I understand it: it is faster for the CPU to read and operate on data if it is equally spaced (aligned), because it doesn't need to take varying sizes into account. So if one thing is 4 bytes in size and another is 8, you add 4 bytes of extra spacing (padding) to the 4-byte thing so that the distance between the beginnings is equal. Then the CPU just knows it has to read 2 pieces of 8 bytes and goes on reading, instead of calculating that the first is 4 and then 8.

    • @JimTheScientist
@JimTheScientist 1 month ago

      your registers are fixed sizes (8 bits, 16 bits, 32 bits, 64 bits, and more with xmm avx), and you want to do one operation to move all the bits from memory into your register.
      Think about how you would connect memory and a physical register, you probably wouldn’t want to move 4 bytes of data one by one by incrementing your addresses (which selects the cells in ram containing your data), you could just wire it so that selecting one address hardwires all the bytes starting at the selected memory address to your register. If you align with binary, like say you want 4 addresses, then starting at address 0100 will let you get 4 addresses 01XX. If you started at 0101 then you have to increment to 1000.

  • @dj10schannel
@dj10schannel 1 month ago

    🤔

  • @ron5948
@ron5948 1 month ago +1

    Tolookinto multiArraylist in zig ron, looks rly cool

  • @plaidpunk
@plaidpunk 1 month ago

    Shit this is too high level for me right now but I can tell it's going to be useful so Im just gonna save it.
    Really really wish I could just find a tutorial on programming C/Assembly instead of like how to turn on a fucking IDE.

    • @coreygossman6243
@coreygossman6243 1 month ago

      You'll either have to learn command line or an IDE, and if you learn one you'll have to learn the other lmao

  • @sporefergieboy10
@sporefergieboy10 1 month ago +1

    43:20 language!

  • @ron5948
@ron5948 1 month ago +1

    If you have lotsa data like the humans example you want to use a EATV approach then get the atts in a set as abvoided ron ahjaiaha

  • @FerPerez-mc3wr
@FerPerez-mc3wr 1 month ago +1

    I'm asking for your prayers and support. Please send healing vibes my way.

  • @AbnerCoimbre
@AbnerCoimbre 1 month ago +37

    Hello I'm the organizer behind this conference. You cannot re-upload our presentations without properly crediting Handmade Seattle somewhere, and they must be demonetized. Please reach out to me with evidence that you've taken these steps, otherwise we'll have to take action against your channel.

    • @ChimiChanga1337
@ChimiChanga1337  1 month ago +27

      I have added copyright info pointing to Handmade Seattle in the description. Regarding monetizaton, the channel itself is not monetized.

    • @alquemir
@alquemir 1 month ago +35

      I have reported you for impersonating the organizer behind Handmade Seattle conference.

    • @user-nl2ps6mh6l
@user-nl2ps6mh6l 1 month ago +22

      lmao shut up mate

    • @grim5i
@grim5i 1 month ago +31

      I would never have found this talk without this video. You really need to put all the media on youtube.

    • @TheRealZitroX
@TheRealZitroX 1 month ago

      Yeah. Shit up

  • @nangld
@nangld 1 month ago +1

    So Zig is about writing "var x: int," instead of "int x;"?

    • @crimfi
@crimfi 1 month ago +2

      Rust uses such notation as well. In a lot of cases type can be derived by compiler: *let x* _:int_ = 4. Here ":int" would be typed by lsp in ide. In my opinion this is an improvement over C/C++'s "auto".

    • @FlanPoirot
@FlanPoirot 1 month ago +5

      ​@@crimfithe reason this is done is because it makes writing parsers for the language much easier
      if you have `int x;` and `int x() {}` you can't know whether x is a function or a variable declaration until you run into the `()` whereas if you have `let x;` and `fn x() {}` you would know which is which right on the first token.
      this means languages with these qualifier/postfix syntaxes are much more uniform and easier to parse

    • @thewhitefalcon8539
@thewhitefalcon8539 25 days ago +1

      It's objectively way easier to parse. Just a pain that it's different from C.

  • @tbird81
@tbird81 24 days ago

    So awkward seeing a nerd swear when doing a speech.

  • @theinsane102
@theinsane102 29 days ago +1

    In twenty years there will be videos called "the problems with Data oriented Design", "reasons to avoid Data oriented design". The programming world moves in hype cycles around new methods. One time OOP was all the rave.

    • @Galakyllz
@Galakyllz 26 days ago

      Yeah, any time there's a trade-off, you'll find a constant tug-of-war over the years. The strategies presented in the talk are great if you're ready to optimize, but it's now harder to maintain the code. As people migrate their code to optimize the bytes over the next decade there will be a resurgence of people arguing that maintainability is paramount.

    • @thewhitefalcon8539
@thewhitefalcon8539 25 days ago

      And each time we hopefully get better

    • @BoletusDeluxe
@BoletusDeluxe 19 days ago

      ​@@Galakyllzdata oriented design is not harder to maintain unless you only know how to maintain OOP code.

    • @Galakyllz
      @Galakyllz 18 days ago

      @BoletusDeluxe That's what I mean: you will find coworkers who are not as capable as you, so changing the data structures in ways that reduce the memory required will just result in your coworkers making mistakes when adding new fields; then they become frustrated, then they avoid tickets involving that part of the project altogether. Now you are the only person maintaining that part of the project. Eventually your coworkers will say "Well, BoletusDeluxe's changes a few months ago made that process unmaintainable. It's more efficient but it's harder to maintain, so that's why we haven't gotten around to adding all of these other features."
      You might think "well, the less-experienced coworkers will just get fired/replaced then", but in my experience that is painfully unlikely. You become the anomaly, the senior developer no one else is as capable as, so they don't try to hold everyone to your standard. If they wanted that, they'd only ever hire senior engineers. So, to allow the entire team to keep working on that project, they revert your changes.

  • @YellowCable
    @YellowCable 29 days ago

    There are a few good intuitions, but there are a lot of mistakes in this talk: things left undefined (the "natural alignment" of a struct? what is the definition of that?), claims that are very architecture-dependent, and confusion between the language and the CPU...

  • @ron5948
    @ron5948 a month ago +3

    Why not have this buoiltin pssht dont trll Them inn the crypto compiler ye ron and thr Zig nonthst would br crazy shtupdinton ahakiskei❤😂🎉 42:31

    • @LtdJorge
      @LtdJorge a month ago +17

      Ron, do you type by banging your head into the keyboard, or are you just drunk? 😂

    • @ron5948
      @ron5948 29 days ago

      @LtdJorge My spellchecking algo is not implemented yet; it's based on insights from DNA/RNA pattern matching and searching. I first have to write the database that lets me implement that algo, then my own social-messaging-layer app of sorts. There these logs (I call these comments that, because they're mainly for me) will make more sense, but still be written in a very Ron-idiomatic way ahahaha

  • @ron5948
    @ron5948 a month ago +1

    36:57 ToLookinto how to feed it generated AST wihtought file roundtrip not necessary ron bc youbin amonth will have the besthest filesyrtrm psdbrl veruy fast and genrrsting files based on ast for example itself ye hmm mb no ron hsjs yeyeyyeyeeyey

  • @ron5948
    @ron5948 a month ago +3

    MLIR as Zig compilrntime target tolookinto RON?! Mb They alsonibterested toskin review the issue terackker rob