Andrew Kelley Practical Data Oriented Design (DoD)

  • Published 25 Nov 2024

COMMENTS • 354

  • @EvilTim1911
    @EvilTim1911 1 month ago +77

    Man, this guy is pretty good at Zig, he should join the team developing the language

  • @mika2666
    @mika2666 4 months ago +185

    Just a note for anyone who looked at the CPU cache slide: yes, the L1 and L2 numbers are 8 times their actual size, because lscpu adds up the caches from all your cores.

    • @ron5948
      @ron5948 4 months ago

      @@mika2666 API compatibility at the CLI level was Linux/Unix's worst mistake

    • @gfasterOS
      @gfasterOS 4 months ago +9

      There's also a -C flag that gives you cache information specifically (including per core)

    • @thewhitefalcon8539
      @thewhitefalcon8539 4 months ago +2

      Only 8 cores in 2024?

    • @johndowson1852
      @johndowson1852 3 months ago +8

      @@thewhitefalcon8539 there are basically no laptop x86 chips with more than 8 cores.

    • @thewhitefalcon8539
      @thewhitefalcon8539 3 months ago +1

      @@johndowson1852 for a laptop yeah. My tower has 24.

  • @anotherelvis
    @anotherelvis 4 months ago +152

    The original talk is from 2021. Thank you for reuploading.

  • @daxramdac7194
    @daxramdac7194 4 months ago +121

    I love how that talk by Mike Acton was such a famous talk on data-oriented programming that Andrew threw on the red Hawaiian shirt lol. That talk by Mike was also the first time I sat and heard anything about this data-oriented programming stuff. One thing that needs to be pointed out, though, is how GOOD Andrew is at communicating this technical subject matter. This presentation was about as clear an intro to the subject as you can get 👍👍

    • @godDIEmanLIVE
      @godDIEmanLIVE 25 days ago

      Both Mike and Andrew are absolute GOATs.

  • @TJChallstrom916-512
    @TJChallstrom916-512 4 months ago +75

    This talk is why an applied computer architecture course is so useful at uni. At the time you think "wtf, I'll never look at nasm again", and most are correct, but the lessons it teaches make you better 10 years down the line.

    • @d1cn
      @d1cn 4 months ago

      Can you recommend which one to choose?

    • @mr3745
      @mr3745 4 months ago +12

      A broad CS/CE education doesn't teach you to code, it teaches you to think. Basic computer architecture + complexity theory completely changes the way you look at problems forever.

  • @yapayzeka
    @yapayzeka 3 months ago +25

    6:16 very important gotcha! "Math is faster than every memory operation. Maybe you should repeat the math instead of memoizing the result."

  • @Ariccio123
    @Ariccio123 2 months ago +3

    I'm not a Zig programmer, but I was very familiar with LLVM for a while... This is gold!!

  • @xcoder1122
    @xcoder1122 4 months ago +45

    In C you often store a bool by re-using one bit of a u64 value where not all 64 bits are ever needed. E.g. you may have a u64 size value, because sizes can be more than 2^32, yet you know in advance that sizes will never get anywhere close to even 2^63, so you can easily use the highest bit of the u64 to store something else, like a boolean value. Very often you can even use the top 4, 8 or even 16 bits of a 64-bit value to store other information.

    • @explosionspin3422
      @explosionspin3422 4 months ago +4

      This is the way!

    • @cyberpunkalphamale
      @cyberpunkalphamale 3 months ago +5

      old school

    • @That_Awesome_Guy1
      @That_Awesome_Guy1 3 months ago +5

      That's called bit packing. It's used all the time in graphics programming, as memory bandwidth is a big factor in GPU performance.

    • @Ashalmawia
      @Ashalmawia 8 days ago

      I thought he was going to do something like that too, but he sort of made the flag semantically unnecessary instead, by using separate collections
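
The bit-packing trick described at the top of this thread can be sketched in Python; the 63-bit size limit and the helper names are illustrative, not from the talk:

```python
# Pack a boolean flag into the top bit of a 64-bit size value,
# as the comment above describes. Names are invented for illustration.
TOP_BIT = 1 << 63
SIZE_MASK = TOP_BIT - 1  # low 63 bits hold the size

def pack(size: int, flag: bool) -> int:
    assert 0 <= size < TOP_BIT, "size must fit in 63 bits"
    return size | (TOP_BIT if flag else 0)

def unpack(packed: int) -> tuple[int, bool]:
    return packed & SIZE_MASK, bool(packed & TOP_BIT)
```

Values round-trip, so a single u64 carries both the size and the flag.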

  • @nguyen_tim
    @nguyen_tim 4 months ago +7

    Thank goodness you posted this on YT because now I don't have to go to Vimeo to watch this talk for the Nth time....

  • @dog4ik
    @dog4ik 4 months ago +139

    Good talk, I will use all the strategies in my js project.

    • @ErikBongers
      @ErikBongers 4 months ago +27

      Agreed. Implementing it in COBOL now.

    • @IndellableHatesHandles
      @IndellableHatesHandles 4 months ago +25

      Definitely going to be very useful for my Python project. Andrew is a legend

    • @Michallote
      @Michallote 3 months ago +3

      I don't think you are going to get many benefits in Python, because everything already is a dictionary after all. Python's memory model abstracts exactly this part away. Even if you try to do it like Andrew and add type hints or whatever, you won't get any benefits, because the Python interpreter allocates memory dynamically. That thing you said was a bool can be converted any second to an array and Python won't mind or crash. So the memory juggling will still happen...

    • @SimonBuchanNz
      @SimonBuchanNz 3 months ago +7

      You're joking, but yes, you can apply this in JS. High performance JavaScript uses a lot of typed arrays to control memory locality, and avoids allocation.

    • @theairaccumulator7144
      @theairaccumulator7144 3 months ago +2

      @@SimonBuchanNz high performance JavaScript? WASM and native binaries in Node are way faster than anything you can do in pure JavaScript.

  • @JonitoFischer
    @JonitoFischer 4 months ago +35

    I loved that he mentioned "C, Zig, Nim"... Those languages are awesome

    • @bobweiram6321
      @bobweiram6321 4 months ago +6

      Ada is better! Type sizes are in bits.

    • @kellymoses8566
      @kellymoses8566 4 months ago +5

      @@bobweiram6321 Ada would be more popular if the compilers hadn't been so expensive.

    • @samuraijosh1595
      @samuraijosh1595 4 months ago

      @@kellymoses8566 what do you mean?

    • @JohnDoe-np7do
      @JohnDoe-np7do 3 months ago +1

      Nim sucks tho. But the first 2 are goatz

    • @JannisAdmek
      @JannisAdmek 2 months ago +2

      Don't forget he mentioned Odin :D

  • @onsearchfocus
    @onsearchfocus 2 months ago +2

    Great talk. No fluff. Just gold!

  • @DMWatchesYoutube
    @DMWatchesYoutube 4 months ago +46

    I had the one hour a day rule too, and that's why I'm nocturnal, because my grandparents went to sleep at 9pm

  • @dantesp7557
    @dantesp7557 4 months ago +8

    Absolute GOLD of a talk, thanks for publishing it

  • @josephlabs
    @josephlabs 4 months ago +40

    Moral of the story is to segregate data in memory. It makes it easier to read and hit the cache, and uniform data wastes fewer bytes on padding.

    • @sacredgeometry
      @sacredgeometry 4 months ago +6

      The moral of the story is it's easy to optimise toy examples.

    • @kulkalkul
      @kulkalkul 4 months ago +21

      @@sacredgeometry lol, you are the most Dunning-Kruger guy I've ever seen. I recall you from another reply, grow up buddy

    • @sacredgeometry
      @sacredgeometry 4 months ago +1

      @@kulkalkul I have no idea who you are but you have given me absolutely no reason to care what you think or to believe your judgment is any good so ... ok.

    • @user-qw9yf6zs9t
      @user-qw9yf6zs9t 4 months ago +11

      @@sacredgeometry I'd understand the tone of your reply if the examples weren't obviously there to teach the concepts? Do you expect a full-blown project shown here??? Sorry if I'm being rude but I genuinely don't understand

    • @MaddieM4
      @MaddieM4 3 months ago +27

      ​@@sacredgeometry if the actual Zig compiler is considered a toy example, I'm a little terrified of the upcoming generation of programming interview questions.

  • @EmberMage8192
    @EmberMage8192 1 month ago +3

    One thing I noticed about booleans is that in some cases people duplicate implicit information with them, which means such booleans can be replaced with a function.
    For example, with the monster struct, you can actually detect whether a monster is alive by simply checking if HP > 0 (unless your game logic allows a monster to be alive otherwise, of course).
    Sometimes booleans duplicate the implicit state of a nullable field. For instance, I've seen a user account record with an email and a flag called activated. But the account was only considered activated if it had an email, so you can just delete the boolean and use an email != null check.

    • @masterchief1520
      @masterchief1520 18 days ago

      My previous company had that lmao. Mobile verification and email. It was a fucking integer field 🥸
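
The pattern in the parent comment, deleting a boolean that duplicates derivable state, might look like this in Python (the classes and field names follow the comment's examples and are invented for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Monster:
    hp: int  # no `alive` bool: derive it from hp instead

    def is_alive(self) -> bool:
        return self.hp > 0

@dataclass
class Account:
    email: Optional[str]  # no `activated` bool: a non-null email means activated

    def is_activated(self) -> bool:
        return self.email is not None
```

One less field means one less way for the record to contradict itself.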

  • @soviut303
    @soviut303 4 months ago +13

    What's interesting is that out-of-band booleans are something I use pretty frequently in databases when building views on top of a table with a boolean. One view contains all the rows where it's true, the other all the rows where it's false; pre-discarding. I guess that's (maybe?) why it's called data-oriented design.

  • @xcoder1122
    @xcoder1122 4 months ago +118

    When storing out-of-band data in a hashtable, keep in mind that hash access is WAY slower than pointer access.
    First, hashtables are only O(1) in an ideal world; in reality no hashtable implementation is always O(1).
    Second, O(1) does not say anything about speed. It only says that the speed does not depend on the number of elements, but it can still be terribly slow. O(log2 n) can be much faster than O(1); it just won't stay faster if you keep adding more and more items, because O(log2 n) keeps getting slower while O(1) doesn't, so O(1) will take over at some point. But what if the amount of data you have in the worst case is way below that point? That's often the case. For small amounts of data, sorted arrays often give much better results than hashtables.
    Finally, a hashtable requires at least one memory access per lookup, so whatever cost you pay when accessing a pointer, you pay the same cost for a lookup in a hashtable, just to end up with a pointer (so you have to do another lookup) or with an index (so you have to do two more lookups). And then there's the cost of hashing the key, which you don't have with direct pointers or sorted lists.

    • @NotherPleb
      @NotherPleb 4 months ago +28

      The point is storing out-of-band, not which data structure to store it in. Nevertheless, a hash function is just math and math is fast; you're overstating the performance costs and assuming the dataset is small, which may not be true. As usual, humans are very bad at predicting performance, so benchmark first

    • @xcoder1122
      @xcoder1122 4 months ago +50

      @@NotherPleb Actually I'm speaking from experience. I once needed a lookup table in a kernel module and I chose a hashtable, because O(1) would ensure that I get pretty consistent lookup times, which is quite important in kernel code. Years later, just out of curiosity, I wanted to know how much slower the code would have been using a sorted list and a binary search instead. So I wrote a couple of unit tests that would fill either a hashtable (same implementation as the one used in the kernel) or a simple sorted list (own implementation) with real-world data (randomly chosen, but from real-world data sets), perform plenty of lookups and time the results. The conclusion was that the list was faster every time.
      Knowing that the list is O(log2(n)), I calculated at approximately how many elements the hashtable would win and came up with a value a bit above 500 entries. And indeed, when I increased the sample size above 500, the hashtable became equally fast, and at around 550 entries it would finally consistently win against the list (only if it stayed collision free, but that's another issue). There's just one catch: due to technical limitations, it was impossible for the dataset to ever grow beyond 256 entries. So there was no real-world scenario where the hashtable would have ever beaten the sorted list. On top of that, to stay (almost) collision free under all real-world scenarios and never experience a degradation in performance, the hashtable had to be at least 50% larger than the data kept within, so not only was it slower, it also wasted more memory and scattered the data further apart in memory.
      You incorrectly assume hashing is just math. Of course you also must read the data you want to hash from memory, and that data may not be fixed size and may not be well aligned. And a good hash algorithm needs to perform more than just a few operations per byte of key data to achieve a good hash distribution. Sure, you can choose a piss-poor algorithm and it will be very fast, but then you will end up with tons of collisions, and no matter how you handle collisions, there's always a slowdown involved (either when searching for hits or when verifying that there's no hit) and often also additional memory consumption.
      If you want some real world data, just write the same OO code once in C++ and once in Objective-C and benchmark. In C++ methods are called by fetching pointers from vtables, so all calls are indirect but they are just indexed table lookups at runtime. In Objective-C functions are called by name and method tables are hashtables. So every time you call a method, the method name is hashed and a hashtable lookup is performed to fetch the function pointer. The performance difference is huge; it's still huge, even though the Obj-C runtime has an internal cache and will not really perform a hashtable lookup for the last ten (IIRC) methods called (it would be way worse without that cache; at least calling the same methods over and over again in a loop is dramatically sped up that way).
      It's not like anything in this video hasn't been common knowledge for 50 years, and it's not like other programming languages and runtimes haven't tried this in the past. And in the past machines had way less memory and memory access was even slower than today (as CPUs had little to no cache), yet there is a reason why we ended up with the languages and runtimes that we use today, and the reason was not to waste more memory and have slower code for no benefit. It's not like today's programming languages are modeled on how CPUs work; in fact it's the other way round: today's CPUs are modeled on the needs of existing code and designed around the question "What can we do to make that code run faster?"

    • @NotherPleb
      @NotherPleb 4 months ago +17

      @@xcoder1122 Well, that's a lot of theory. In practice, I benchmarked the HashMap and BTreeMap from 10 to 1 million Monsters in intervals of powers of 10, with 1000 random accesses, and here's the result: by a size of 100 Monsters the performance is even; from that point onwards the HashMap remains constant and the BTreeMap keeps growing, as expected. The threshold size of 100 doesn't seem as high as you implied. But as I said, both are still very fast and it's almost meaningless in the majority of cases; I'd go with HashMap for general purpose, and it scales effortlessly.

    • @xcoder1122
      @xcoder1122 4 months ago

      @@NotherPleb "I benchmarked the HashMap and BTreeMap" - these are two very specific implementations with tons of optimizations but also assumptions about the data, and not necessarily easy on CPU or memory (e.g. they were probably not optimized with embedded systems in mind). Neither was available to my kernel code. And performance depends on the data and how likely that data is to cause collisions. But you are missing the actual point: how much slower is a HashMap vs storing the data in band? Because storing data out of band to save 4 KB is ridiculous on systems that have 16 GB of RAM and cannot really make use of that RAM most of the time to begin with. In development you always make trade-offs between speed and memory, and wasting 4 KB, even 4 MB, is fine if it gives way more speed. The argument of the video was that by saving memory you will even gain speed, and in general there is some truth to that statement. But when you replace direct access with hashtables, there are only very limited specific cases where this holds true. It's not like game developers don't try that first before deciding to rather waste more memory and use direct access when that gives consistently better frame rates.

    • @David_Raab
      @David_Raab 4 months ago +4

      I have seen many people argue like you: thinking a HashMap is slow and key generation would be slow, so they stick to searching through an array in O(n) time, or adding stuff to an array with insertion sort in O(n) per insert, or sorting in O(n log n) and then doing binary search in O(log n), just to avoid O(1) lookup access, because obviously O(1) must be bad performance. I cannot express the amount of annoyance I get whenever I read crap like this.
      Yes, O(1) doesn't say anything about the actual cost and could also be O(100), but usually we store many things in a data structure, so picking a data structure that scales better for 1-10 items and gets worse beyond that is just silly. That's also why we use measures like O(1), btw; before that generalization people usually calculated the real cost of every operation, but this still didn't make picking algorithms and data structures easier. That's why computer science today just speaks of O(1), and it's good to have it this way.
      Saying that for small N another data structure could be better is also beside the point. If your N is so small that you have like 10 items, then this data structure will probably never be part of any performance-critical code section. A HashMap usually already becomes faster below the N < 100 mark.
      It's very simple: if you need to look up something by a key, use a HashMap, otherwise don't. Most of the slow code I have seen is because of this: people want to look up something by a key and they scan an array and do excessive sorting, thinking they are smart. No they aren't. Just use a HashMap; key generation is not slow, and constant lookup time is really, really good most of the time. No crappy array implementation people come up with outperforms this.
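
For anyone wanting to probe the dispute in this thread, here is a minimal Python sketch of the two lookup structures being compared: a dict standing in for a hashtable, and `bisect` for binary search over a sorted array. The sample data is invented, and the 256-entry cap mirrors the kernel anecdote; actual timing should be done with `timeit` on your own workload.

```python
import bisect

def dict_lookup(table, keys):
    # hash-table path: one hash plus (ideally) one probe per key
    return [table[k] for k in keys]

def sorted_lookup(sorted_keys, values, keys):
    # sorted-array path: O(log2 n) probes over contiguous memory
    out = []
    for k in keys:
        i = bisect.bisect_left(sorted_keys, k)
        out.append(values[i])
    return out

table = {i * 7: i for i in range(256)}  # invented sample data
sorted_keys = sorted(table)
values = [table[k] for k in sorted_keys]
probe = list(table)[:64]

# both structures must agree before timing them means anything
assert dict_lookup(table, probe) == sorted_lookup(sorted_keys, values, probe)
```

Which one wins below a few hundred entries is exactly what the commenters above disagree on, so measure.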

  • @olhoTron
    @olhoTron 4 months ago +66

    Meanwhile, web developers send 30 MB over the network just to show a simple text blog

    • @SamualN
      @SamualN 2 months ago +11

      Not all of us. The day I'm sending more than 300 KiB of JS for a simple web app is the day I need to be taken out back

    • @Ariccio123
      @Ariccio123 2 months ago +7

      @@SamualN I'm glad you exist. As an occasional web developer, I'm constantly disgusted by what I work with.

    • @rusi6219
      @rusi6219 1 month ago

      @@SamualN based

    • @masterchief1520
      @masterchief1520 18 days ago

      Not funny or original or true. You must be a webdev.

  • @wegvS
    @wegvS 1 month ago

    I'm a web developer and will probably never use any of this stuff, but it's super interesting!

  • @TheOnlyFeanor
    @TheOnlyFeanor 3 months ago +4

    The example at 20:00 doubles the number of memory accesses for each struct (one for the animation id, another for the monster type), unless the reduction in footprint makes it all fit into a single cache line.

    • @jaredcramsie182
      @jaredcramsie182 3 months ago +2

      I was going to comment this as well. Splitting the data like this will always require two cache-reads on the initial load *. If the out-of-band technique is applied to more data, then each one will incur an additional cache-read. When looking at an individual access, it is like it comes with a guaranteed cache-miss for each time you apply the technique.
      Data for arrays are stored contiguously in memory. In this example, if you expect to access several elements in a row, then much of the data will already be in cache - because the size of each struct independently is less than one cache line in size.
      If you expect to go through all the data in any order (and it is small enough to fit in cache), then this results in fewer cache misses in total, because there are simply fewer cache lines to access.
      These benefits can artificially inflate the performance savings of the technique when people benchmark it by iterating over an entire array of data, or test it in a limited context that doesn't factor in the accumulation of external data in cache from other processes, if the data / procedure is not used often.
      * For completeness, there are some special cases where you don't have two cache-reads:
      One cache read: I assume this is what you were referring to in your comment. If you happen to be storing less than 64 bytes of total monster data, 7 or fewer monsters in this example (assuming the 'animation' data is cache-aligned to start with and the 'kind' data is not). Then you can fit 3 extra monsters into the cache-line, where it could only store 4 monsters in a cache-line previously.
      Three or four cache-reads: In this example his data size is a divisor of the cache-line length; if it weren't, there may be a chance of having to perform another cache-read for each access.

    • @Norfirio
      @Norfirio 3 months ago +3

      This technique is very useful for instances where you do multiple operations on the Monster and you only need one (or a few) fields at a time. For example, in a game you will have an update function and a render function. If you iterate with the padded struct, you waste time going over the anim field in the update function and then waste time going over the kind field in the render function. You just have to be careful of your use case, as with any optimization methods.
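
The struct split at 20:00 that this thread debates can be caricatured in Python with parallel arrays. The field names follow the talk's Monster example; the update logic is invented:

```python
# Out-of-band / parallel-array layout: each field lives in its own array,
# so a pass that needs only one field streams over contiguous memory.
monsters_anim = []  # animation id per monster
monsters_kind = []  # kind tag per monster

def add_monster(anim: int, kind: int) -> int:
    monsters_anim.append(anim)
    monsters_kind.append(kind)
    return len(monsters_anim) - 1  # the index doubles as the handle

def advance_animations():
    # touches only the anim array; the kind data never enters the cache
    for i in range(len(monsters_anim)):
        monsters_anim[i] += 1
```

This is exactly the trade-off discussed above: field-at-a-time passes get cheaper, whole-monster accesses pay one extra cache read per split.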

  • @Oler-yx7xj
    @Oler-yx7xj 3 months ago +15

    6:50 Using light speed as a measure of how fast an operation executes is funny. It also makes you appreciate how fast modern computers are: having memory as fast as L1 cache 1 meter away from the CPU is literally impossible
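
The remark checks out with back-of-the-envelope arithmetic: a 1 m round trip at light speed already exceeds a typical L1 hit latency of roughly 1 ns.

```python
# Round-trip time to memory 1 meter away, at the speed of light.
C = 299_792_458  # speed of light, m/s
round_trip_ns = 2 * 1.0 / C * 1e9
# ~6.7 ns round trip vs roughly 1 ns for an L1 hit (typical figure),
# so such memory could never match L1 regardless of the memory tech.
assert 6.0 < round_trip_ns < 7.0
```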

  • @rallokkcaz
    @rallokkcaz 4 months ago +3

    Thank you for uploading this again, this is a great talk.

  • @Lircking
    @Lircking 4 months ago +3

    watched this on vimeo previously, great talk!

    • @pajeetsingh
      @pajeetsingh 4 months ago

      Rare. The last time I opened Vimeo was in 2012.

    • @MatthisDayer
      @MatthisDayer 4 months ago

      It was only available on Vimeo for many months before

  • @cariyaputta
    @cariyaputta 4 months ago +23

    Learned something from this talk. So thanks.

    • @StarContract
      @StarContract 4 months ago

      Learned something from this comment. So thanks (I guess).

    • @Godson7278
      @Godson7278 4 months ago

      @@StarContract learn something to comments thanks gay or guy whatever.

  • @RobertGardnerEngineer
    @RobertGardnerEngineer 2 months ago +2

    "I programmed my compiler like an ECS based video game and it dramatically improved performance" was not the talk I expected to listen to today. That is a cool take.
    It makes me wonder if there is value in a language where this is how it stores things and processes under the hood. Like, can you write a language where ECS is not *a* paradigm, it's *the* paradigm.
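
The "ECS as *the* paradigm" idea above can be sketched in a few lines of Python: components live in per-type stores keyed by entity id, and systems iterate stores rather than objects. All names here are invented for illustration.

```python
positions = {}   # entity id -> (x, y)
velocities = {}  # entity id -> (dx, dy)

def movement_system():
    # a "system" walks one component store and joins on entity id
    for eid, (dx, dy) in velocities.items():
        if eid in positions:
            x, y = positions[eid]
            positions[eid] = (x + dx, y + dy)

positions[1] = (0.0, 0.0)
velocities[1] = (2.0, 3.0)
movement_system()
assert positions[1] == (2.0, 3.0)
```

Real ECS implementations use dense arrays rather than dicts precisely for the cache behavior the talk is about; the dicts here only show the shape of the paradigm.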

  • @murrayKorir
    @murrayKorir 4 months ago +43

    Just added 100 more fps to my renderer after watching this.

    • @FlanPoirot
      @FlanPoirot 4 months ago +33

      my toast toasted 1000x faster after watching this video
      data oriented sandwich 👍

    • @prezadent1
      @prezadent1 4 months ago +9

      @@FlanPoirot You toast bread, not toast.

    • @FlanPoirot
      @FlanPoirot 4 months ago +16

      @@prezadent1 I apologize for my grave mistake, milord

    • @i-am-the-slime
      @i-am-the-slime 4 months ago +2

      @@prezadent1 no. Americans just don't know what actual bread is

    • @prezadent1
      @prezadent1 4 months ago +2

      @@i-am-the-slime You just made two logical fallacies in 1 sentence. Congratulations.

  • @SumoCumLoudly
    @SumoCumLoudly 4 months ago +4

    what an advantage in life being put onto coding at such a young age

    • @thewhitefalcon8539
      @thewhitefalcon8539 4 months ago

      good for your career, very bad for your s-x life

    • @Eng_Simoes
      @Eng_Simoes 3 months ago +1

      An advantage in coding

    • @GabrielxC
      @GabrielxC 2 months ago +1

      Yeah, I started at 65 years old, too late 😢

  • @rt1517
    @rt1517 3 months ago +13

    In my opinion, this approach has an impact on code simplicity and maintainability.
    On one side you have a structure that contains what you are interested in.
    On the other side, well, it is more complicated. And I would even say more bug-prone and harder to debug.
    Also, he focuses only on the size of the data, while the way you access the data is also very important.
    For example, it is supposed to be way faster to visit all the elements of an array than all the elements of a linked list.
    Because with the array, the memory accesses are very close to each other, so you stay in the cache and the processor can even predict which memory address will be accessed next.
    With a linked list, you are jumping all over the place, so most of the time the next item won't be in the cache.
    Don't get me wrong, it is very nice that Andrew is trying to make the compiler fast. And there are a lot of slow compilers out there.
    But I am not sure that many of us mortals need that level of optimization.

    • @Nors2Ka
      @Nors2Ka 3 months ago +6

      I would say Andrew misunderstood the meaning of DoD, and what he showcased here is straight-up optimization - a step most mortals do not have to take, as their code is 1000x slower for other reasons entirely, i.e. using pointers from a linked list to refer to individually heap-allocated items.
      There is also this funny thing where he optimized the most irrelevant part of the compiler; LLVM will gobble 98% of the compile time regardless (unless Zig can use its own fast backends for debug builds).
      But I do want to say that the way you refer to "readability" sounds like you don't know what you're talking about and the "readable" code you would spew out would be anything but actually readable.

    • @rt1517
      @rt1517 3 months ago +4

      @@Nors2Ka For me, readability is about how easy it is to understand the code. There is an infinite way of implementing a feature in a program, and some of them are better than others according to some criteria, for example readability or performances.
      Regarding readability, I would judge it by the amount of time and effort an average developer needs to understand a piece of code.
      At 22:29, on the left, I instantly know and understand what is a Monster and what it is capable of from its struct definition.
      But on the right, if I look at the Monster struct, I don't know that it can hold items.
      And if I find the held_items hashmap, I am not sure what its key is. Is it the Monster's index in the Monster list? Is the hashmap dedicated to Monsters or is it shared with NPCs or other possible item holders? Should I clean something up in the hashmap when a Monster is deleted?
      For me, the left version seems a lot easier to maintain and add features to (for example, looting).

    • @Average-Lizard
      @Average-Lizard 3 months ago +4

      I feel the same way.
      I know in my typical projects (large web and console apps), the code being easily read and maintained is much more valuable than saving KB of memory or a fraction of the time of a single web request.
      I guess these kinds of optimizations are for specific use cases, but nonetheless it's interesting to hear about and have in the tool belt.

    • @toby9999
      @toby9999 23 days ago

      I think so. I've spent the last 20 years working on high-performance software where these techniques make the difference between being the best or lagging behind. For those developing web front ends or bloatware, or coding in Java or Python, etc., not so much, if at all, in my opinion.

  • @sporefergieboy10
    @sporefergieboy10 4 months ago +50

    ZIGGERS RISE UP

  • @xcoder1122
    @xcoder1122 4 months ago +13

    15:50 Keep in mind that to access an object, you must now perform two memory accesses: first a lookup with the index into the pointer table to get the actual pointer, then another to fetch the object from memory at that address. So you saved memory, but at the price of doubling the number of memory accesses. And while the lookup table may be used frequently and thus likely sit in some CPU cache, it also takes up space in that cache that would otherwise hold the actual data you want to access.

    • @monad_tcp
      @monad_tcp 4 months ago +13

      The number of memory accesses doesn't matter that much; the amount of data transferred is what matters. The CPU has a lot of parallel hardware to do the transfer while the instructions are being processed and the CPU itself is waiting for data to come from memory.
      This is an old theoretical point of view in computing science, this idea that the number of memory accesses or multiplications matters, or even instruction counting. It might have worked for a PDP-11, but not for modern computers.
      Why are we still so fixated on computing asymptotics in the abstract when they bear no relation to actual hardware? Sometimes computer science is not a science; ironically, in the scientific method you start with observation, not with a hypothesis.
      For example, in all of sorting theory you can mostly ignore the number of compares - that has almost constant cost, assume N=1 - because you're going to spend 90% of the time waiting for data. Only memory accesses matter, and not the count of operations but the total in bytes; you can pretty much assume a memory access is also M=1.
      The only thing that really matters is total amount of bytes transferred, instructions don't matter much, except they have a size and need to be also transferred.
      I think textbooks need to be updated, they really do.

    • @lankin9217
      @lankin9217 4 months ago +16

      He mentions the handle as an index into an array of objects, not an array of pointers. The pointers are only necessary if you heap allocate the objects. The point of trying to make your objects as small as reasonably possible is to make it easier to store arrays of objects and not arrays of pointers. That's also where locality benefits come from.

    • @diadetediotedio6918
      @diadetediotedio6918 4 months ago +1

      @@monad_tcp
      "in the scientific method you start with observation, not with hypothesis."
      Because computer science is closer to mathematics (which is considered a science, and where you don't "start with observations" [usually]) than to particle physics. The foundations you are now dismissing as having "no correlation to reality" are much larger than you and me.

    • @IndellableHatesHandles
      @IndellableHatesHandles 4 months ago +1

      This could also be the case with the approach at 19:41. The three arrays will almost certainly be far away from each other in memory, meaning you have a much higher chance of multiple cache misses to reconstruct one monster.

    • @attilatorok5767
      @attilatorok5767 4 months ago +3

      @@diadetediotedio6918 I don't see how mathematics doesn't start with observation. I think a majority of math starts with finding odd connections and finding explanations for them or formalizing them. If you take set theory as an example (this is a good example, because it's the basis for most/all math), it started with the observation of "things can be categorized/grouped together". Category theory as another example (also a very fundamental theory), started with the observation of "a lot of mathematical systems have similar properties".
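
The index-handle scheme from 15:50 that opens this thread can be sketched as follows; the storage layout and names are assumed, not from the talk:

```python
monsters = []  # dense backing array; handles are indices into it

def spawn(name: str) -> int:
    monsters.append(name)
    return len(monsters) - 1  # a small integer handle instead of a pointer

def deref(handle: int) -> str:
    # the extra indirection the top comment describes: handle -> slot
    return monsters[handle]
```

Unlike a raw pointer, the handle stays valid if the backing array is reallocated or relocated, and it can be 32 bits instead of 64, which is where the footprint saving comes from.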

  • @avwie132
    @avwie132 4 місяці тому +4

    His backstory is 1:1 my backstory… I feel very unique

  • @notapplicable7292
    @notapplicable7292 4 місяці тому +3

Ngl I assumed for a moment this was a presentation to the Department of Defense

  • @ThePOVKitchen
    @ThePOVKitchen 4 місяці тому +109

    ron this is not notes app, im begging you

    • @Plorpoise
      @Plorpoise 4 місяці тому +68

      I don't know, I think Ron has a point. Maybe it is a notes app
      1. Stored in the cloud and syncs across devices (for free!)
      2. You can make community posts (notes not tied to a topic)
      3. You can post comments on a video (notes on a specific topic)
      4. You can even include timestamps for specific citations to refer to
      5. Separate comments let you break up your thoughts across a video.
      I think Ron has convinced me that youtube is a notes app. And a pretty good one. You just happen to be able to see others notes as well (did someone say "collaborative"?)

    • @adamhenriksson6007
      @adamhenriksson6007 4 місяці тому +1

      😂😂😂😂

    • @greyshopleskin2315
      @greyshopleskin2315 4 місяці тому +6

      @@PlorpoiseYou have just convinced me. :o
      No, seriously, this makes perfect sense.

    • @Zaniahiononzenbei
      @Zaniahiononzenbei 4 місяці тому

@@Plorpoise have you seen my favorite social media network, Google Maps? It has posts, you can follow your friends and share your location. It has reviews, likes, and you can even use it for "content" discovery.

    • @ClayMurray8
      @ClayMurray8 4 місяці тому +2

@@Plorpoise If youtube has an API to get all your comments you have a _really_ great point.

  • @_mrgrak
    @_mrgrak 4 місяці тому +5

great talk, but at 20:00 andrew is talking about memory layout and how 7 bytes are wasted - assuming the enum is a byte! In many languages an enum defaults to an int, which would be 4 bytes. IMO, the enum backing type should be shown, just for clarity.

    • @greasedweasel8087
      @greasedweasel8087 3 місяці тому +1

      But even if the tag is a u64 or such, those 7 bytes are still “wasted” in the sense that they don’t provide any information. They will always be 0, since the tag will only take values from 0 to 4 it never touches those more significant bits. In practice the exact type of the tag doesn’t matter because it will never take values high enough to get use out of those bytes.

    • @jotch_7627
      @jotch_7627 3 місяці тому

      also what languages are you using that automatically and truly use 32 bits for an enum tag lol

    • @_mrgrak
      @_mrgrak 3 місяці тому

      @@jotch_7627 well theres this isa called x86

    • @_mrgrak
      @_mrgrak 3 місяці тому

      @@jotch_7627 the isa would determine word size, not the language. languages can target many isas.
      thanks for playing

    • @greasedweasel8087
      @greasedweasel8087 3 місяці тому

      @@jotch_7627 I mean rust uses an isize by default which is ssize_t in C aka a pointer sized integer so it’s not that far out of the realm of possibility, although 1. You can easily change this and 2. That’s only for debug builds, when optimisations are turned on it immediately collapses to the smallest size possible

  • @tigranrostomyan9231
    @tigranrostomyan9231 4 місяці тому +200

bro realised that procedural is better than OOP at 14 years old

    • @Krbydav328
      @Krbydav328 4 місяці тому +18

      bro really actually realized frfr

    • @diadetediotedio6918
      @diadetediotedio6918 3 місяці тому +20

      It is not necessarily better, only generally faster.

    • @yash1152
      @yash1152 3 місяці тому

      shut up

    • @OdinRu1es
      @OdinRu1es 3 місяці тому +26

      Each paradigm has its own use case. Use the right tool for the right job and mix them

    • @toby9999
      @toby9999 2 місяці тому +5

      He didn't realize it, but he might believe it. There is no best. Which is the best carpentry tool? Depends on the job in hand, right? A power saw isn't great at driving nails. OOP is great at some things, but it sucks at others, as does generic programming and functional programming.

  • @ktappdev
    @ktappdev 3 місяці тому +6

    When he made alive_monsters and dead_monsters I was smiling so big my gf started investigating me

  • @torgnyandersson403
    @torgnyandersson403 3 місяці тому +1

    About the lexer and retokenizing. That's unnecessary work for strings and other tokens where the length isn't the same for all tokens of that type. Instead, just have one token for "begin string literal" and one for "end string literal". That's what I've done in my own projects. You get tiny token struct and no retokenizing.

  • @ZarviroffSerge
    @ZarviroffSerge Місяць тому

    Brilliant!

  • @Fedor_Dokuchaev_Color
    @Fedor_Dokuchaev_Color 3 місяці тому +9

    Bit fields? You don't have to use multiple arrays and enums which make the code less readable.
struct Monster {
unsigned long long life_points : 63;
unsigned long long is_alive : 1;
};
    That's it. You can pack a lot of data if you know what range of values is maximum/minimum in each case.

    • @ratakor
      @ratakor Місяць тому +1

      Or in this case just check if life_points is lower than 0, but this is a very simple example. For the zig compiler I think that Andrew's strategy is good.

  • @sadhlife
    @sadhlife 3 місяці тому +3

    16:03 zig has them now

  • @El-Burrito
    @El-Burrito 4 місяці тому +4

    This is so over my head but Data Oriented design seems super important for performance

    • @PassifloraCerulea
      @PassifloraCerulea Місяць тому +1

      It's most helpful when you have many of the same sort of thing to process together at the same time. A lot of programs don't work that way, actually, but games often do. It's also pretty incompatible with classic object-oriented programming with inheritance and pointers to objects of different sub-classes. It's neat stuff but a pretty advanced concept, so don't worry too much about it if you're just starting out. You'll get there in time.

  • @yash1152
    @yash1152 3 місяці тому

    30:10 the plug i was looking for

  • @monad_tcp
    @monad_tcp 4 місяці тому +1

    2:23 that's not the wrong order, its the perfect order. basic -> perl -> python -> c++

  • @Turalcar
    @Turalcar 4 місяці тому +5

    6:38 I just this week did an optimization that replaced a few cached values with recalculation.

  • @pointer333
    @pointer333 4 місяці тому +1

    This is gold!

  • @AhmadAli-kv2ho
    @AhmadAli-kv2ho Місяць тому +1

can someone correct me, I'm new to this: is he saying that it's 1+4+3+8+3+4+1 which adds up to 24? or is it not 1 and 3 for new lines?

  • @sachahjkl
    @sachahjkl 4 місяці тому +2

    Exceptionally good talk, congrats

  • @ENCRYPTaBIT
    @ENCRYPTaBIT 4 місяці тому +4

    I just learned about memoizing damn it 😅

    • @MaddieM4
      @MaddieM4 3 місяці тому +2

      It's still useful in many situations! The rule of thumb is that the bigger the memoized object is, or more expensive it is to create, the more you'd benefit from keeping it cached somehow. Something accessed over the network, for example, you probably want to have some degree of local caching. Just be thoughtful that any time there's caching, you'll need to pay attention to expiration and invalidation.

  • @b1zzler
    @b1zzler 17 днів тому

    This is all great, but I can't help but think of how much more painful it would be to extend and refactor code like this without breaking everything.

  • @yairlevi8469
    @yairlevi8469 4 місяці тому +5

21:06 doesn't this make more memory accesses a requirement? if you want the full struct of data, say of element 0, you'll need to access not a contiguous sequence of bytes in memory, however padded, but 2 (kind and monster pointer) from different places in memory, since they're stored not side by side.

    • @NeunEinser
      @NeunEinser 4 місяці тому

      Wouldn't this be true regardless? If you lookup animation, you'd need to do + 0 and + 8 for kind, and in the changed version you do and .
      Now, for struct base address in the first example, this is a relative lookup based on the struct array pointer, and in the second example the lookup to both animation and kind are also relative offsets based on their respective base pointers.
      The struct of arrays should also just be part of the stack, just like the array pointer in the original.
      So I don't see any difference here?

    • @yairlevi8469
      @yairlevi8469 4 місяці тому

      @@NeunEinser in the first example the struct is contiguous in memory and in the second example its spread out, since the two arrays are most likely spread out in memory, so you are required to access twice in different places rather than a single point and from there read the entire contiguous block. Unless I'm missing something

    • @thewhitefalcon8539
      @thewhitefalcon8539 4 місяці тому

      but normally you don't want the whole struct

    • @NeunEinser
      @NeunEinser 3 місяці тому +1

      @@yairlevi8469 Yeah, that is true. If you are iterating over all structs, and you have enough cache lines for all fields, this is probably better though. You will have more of the data in the cache at the same time than if all was a single array.
      If you access more fields than you have cache lines, this would perform a lot worse though.
      That's why you shouldn't blindly adapt any kind of concept anywhere, and always run benchmarks to see if your changes actually help.

  • @qy9MC
    @qy9MC 3 місяці тому +1

    So (Dod) not only saves memory but also makes programs faster !? I'm in !

  • @whywhatwherein
    @whywhatwherein 4 місяці тому

    ahah storing tokens without the end position... that's a new one for me! interesting

  • @pawello87
    @pawello87 2 місяці тому

    I have doubts whether the monster example is relevant in the context of DOD. Let's say I want to update the position of all bees. I would have to iterate through the whole set and use an if condition to pull out only the instances I am interested in. That is still clogging the cache with data I am not interested in (all non-bee monsters). In the DOD approach this would be split into two (or more) separate systems, which would automatically eliminate the need for encoding/tagging.

  • @aetherclouds1181
    @aetherclouds1181 4 місяці тому +3

    28:30 I thought we were padding structs to have the size be a multiple of the natural alignment? if the tag enum is 1 byte, wouldn't a bee or a human be 4 + 4 + 4 + 1 (enum) + 3 (pad) = 16 bytes (excluding the HumanClothed data)?

    • @srnissen_
      @srnissen_ 4 місяці тому +1

      huh it appears to have eaten my reply.
      shorter answer then: no, because there's no padding, because the struct is split up into parts that each get their own array - one is 12 bytes (no padding needed) and the other is 1 byte (no padding needed) for 13 bytes per bee. If you copy a bee out of the multiarraylist, it gets reassembled into a single struct with padding, size 16.

    • @El-Burrito
      @El-Burrito 4 місяці тому

      Yeah that part lost me too

    • @taimenwong9765
      @taimenwong9765 4 місяці тому +1

      rewatch 19:05 and you'll get the answer

    • @aetherclouds1181
      @aetherclouds1181 4 місяці тому

      @@srnissen_ you're right, I just noticed he uses a MultiArrayList, and that is a SoA, I was just a bit confused by the zig syntax (like how you can define struct members that aren't really in memory like Monster.Common or Monster.HumanClothed). thanks!

  • @r_j_p_
    @r_j_p_ 8 днів тому

    Basically write your code in Fortran.

  • @nicolas393
    @nicolas393 2 місяці тому

    great talk

  • @pajeetsingh
    @pajeetsingh 4 місяці тому +2

    Any burnout here just checking the video?

  • @lunarmagpie4305
    @lunarmagpie4305 4 місяці тому +1

    This talk was awesome!

  • @michaeln.8185
    @michaeln.8185 3 місяці тому

    Great talk! I look forward to using some of these strategies. Open question: 18:21 In languages with compilers that support #pragma pack, is there any advantage to storing booleans out-of-band? Assuming non-sparsity of the boolean field would you still not incur cache line misses regardless of whether the bool is in the structure or an array outside, since the cpu is forced to load the same # of bytes?

    • @PassifloraCerulea
      @PassifloraCerulea Місяць тому

      The whole point of storing struct fields at their natural alignment is so the CPU doesn't have to do extra work. Back in the 90s, top-of-the-line RISC CPUs might even crash your program or trigger some OS interrupt to handle unaligned access. I believe x86 tends to be more forgiving about this, but I'm not sure. It is likely that your compiler will go to great lengths to avoid unaligned memory access, using multiple smaller reads or writes instead. As always, looking at assembly output and benchmarking your particular program will be necessary.

  • @tordjarv3802
    @tordjarv3802 3 місяці тому

    Damn, I should learn Zig

  • @minastaros
    @minastaros 3 місяці тому +1

    Hello, bitfields?

  • @dreanwave
    @dreanwave 22 дні тому

    Throughout the whole talk I was thinking: What about bit fields?

  • @zxuiji
    @zxuiji 3 місяці тому +1

    3:09 nice :)

  • @maximenadeau9453
    @maximenadeau9453 4 місяці тому +2

    I think I'm on a new curve too after this video.

  • @JimTheScientist
    @JimTheScientist 4 місяці тому +1

    why think when i can just write attribute packed on everything

  • @salim444
    @salim444 4 місяці тому +7

    wasn't this uploaded on vimeo?

  • @eggmeister6641
    @eggmeister6641 14 днів тому

    Where can I access the slides from this talk?

  • @afterschool2594
    @afterschool2594 4 місяці тому

    I am not a really low level programmer. But I am beginning to take interest recently. One of my biggest question is what is alignment and padding? Is it the same thing? And why do we need them?

    • @maximknyazkin662
      @maximknyazkin662 4 місяці тому +4

I am not a specialist, but as I understand it, alignment is related to the fact that it is easier for the CPU to read and operate on data if it is positioned on multiples of 4 bytes and each piece of our data is spaced equally from each other. So this is alignment. Padding means specially added spacing between pieces of data if they do not align properly by themselves. Like for a boolean: if it is 1 byte and we need it to go with something that is 4 bytes, we need to add 3 bytes of padding to make them equally sized. This way the CPU can read it faster as it does not need to take varying sizes into account

    • @mss664
      @mss664 4 місяці тому

      @@maximknyazkin662 Not a multiple of 4 bytes. It's faster to read when the value is naturally aligned. So a byte-sized value can be stored at any address, two bytes (u16, short) should be placed only at even addresses, 4 bytes (u32, int) at addresses divisible by four, etc. Cache lines are aligned too, usually to 64 bytes, and if the values were not aligned like this, it's likely that the value lies on the cache line boundary, and the CPU would have to fetch two cache lines to load one value. I haven't tested it on smaller types, but on my machine loading memory aligned to 16 bytes into SSE registers was about twice as fast as when it was unaligned.
      Also reading data from unaligned pointers isn't even valid on all architectures. In C and C++ dereferencing an unaligned pointer is illegal. Some CPU architectures might do the expected thing, some might throw an exception, some might just ignore the unaligned bits and read memory to the left of the actual pointer value.

  • @LNTutorialsNL
    @LNTutorialsNL 4 місяці тому +1

    Can someone link the video he mentioned by the guy in the red shirt?

    • @ChimiChanga1337
      @ChimiChanga1337  4 місяці тому +2

      ua-cam.com/video/rX0ItVEVjHc/v-deo.html - Mike Acton DoD talk

  • @xmorse
    @xmorse 4 місяці тому +35

    @ron5948 is going crazy with this video

    • @100timezcooler
      @100timezcooler 4 місяці тому +4

      slurring his words n shit

    • @robrick9361
      @robrick9361 4 місяці тому

      @@100timezcooler Is he using a voice to text software? Is that why his comments have random misspelled words

  • @tacticalassaultanteater9678
    @tacticalassaultanteater9678 3 місяці тому

    Right, I guess my intuition about performance can go out the window now, because all of those steps sound incredibly slow

  • @policyprogrammer
    @policyprogrammer 3 місяці тому +3

Avail yourself of bitfields. They're part of the standard and they can really save your bacon space-wise.
In particular, I think they're better than encoding multiple Boolean values into an enum because of maintainability. Imagine you have these encoded enums, and you have to write a function like is_human, and it's working just fine, but then someone adds a new variable and re-encodes all the enums so that you have to include some more, and you forget to update the function.
Another advantage of bitfields is you can take advantage of your knowledge of the actual sizes you use. Like if you know you only need 24 bits, you can just allocate 24 bits and use the next eight for something else.

  • @anonymousperson420
    @anonymousperson420 23 дні тому

    Worst thing about the internet, it makes me feel old. This guy started on the sega genesis? My goodness that came out ages after my first game.

  • @yash1152
    @yash1152 3 місяці тому

    20:27 the speaker image covered the content.
    and i dont understand this example too either.

    • @yash1152
      @yash1152 3 місяці тому

20:52 ohhhw, it's about a language feature where the lang allows you to create an array of each member of the struct.

  • @salim444
    @salim444 4 місяці тому +2

    can someone explain what he meant at 36:32 about storing one token index? I am kinda lost at this point 😅 how this comes into practice

    •  4 місяці тому +2

      He was storing the start and end indices of the token, ie: where the token starts and ends in the file. You only need to store the start index, because it's trivial to calculate the end index (end_index = start_index + token_length)

    • @salim444
      @salim444 4 місяці тому

      I understood that, I actually meant the second bullet point where he is talking about multiple tokens and to find the index of other token we only need to "poke around" the tokens array

    • @curtis6919
      @curtis6919 4 місяці тому

      @@salim444 He's just saying that instead of storing the line and column you store the token index, which is an index into the outputted array from the tokeniser (the previous stage in the compiler). The token immediately after any token n will always be n+1, the one before n-1, and so on--and so you can simply recalculate the token's context without necessarily storing them in the AstNode struct.

    • @MrSentientSimian
      @MrSentientSimian 4 місяці тому

      What he means is that if your code is `if(a==1)`, then all you need is to store the location of `if` because you can assume the next token is `(`. So you don't need to store the token index for `(`, you already know it's the next one in the token array after `if`.
      Hope that helps

    • @salim444
      @salim444 4 місяці тому

      @@MrSentientSimian thanks!

  • @ron5948
    @ron5948 4 місяці тому +18

    You kinda want to go all the way and not store heap allocated things at all. Put everything in a hugepage sized chunk of memory 2MB+ makes swapping and ssd flsuhing easier and muuch faster (bypasses most if the ssd firmeare) and then use very small array indexing cache lines not single data elements. This eay u can use a circular buffer of vm mapped hugepages where mmu allocates physical memory for you. Ye then do arena based eviction and copying to clear these pages and reuse them for new data. Containers then become strategies to align data contained into pages and cachelines, direct memory pointers are not data bc cannot be serialized anyway so ideally you avoid them completely. Cpus can use the cache line u ron on as a base pointer and then do relative lookups memload

    • @ron5948
      @ron5948 4 місяці тому +2

      The boolran thing is maor about set membership the aliveness of an object is depending on your dokain model not relevant peoperty of the object itself and u want instead have a set for alive monsters or dead one full of integer offsets. Integer sets are hugely compressible, so this tricks are highly dependent on the data usage, if you know the queries beforehand you can make a query optimizer decide the structs in q , in other words its too low level for the programmer to choose dsta chunking by having static (read type ssystem level) structs, instead these emergr and change depending on the querirs and functions you ron on dats haiajaajgaikjsiik😂❤

    • @ron5948
      @ron5948 4 місяці тому

      Categories are particularly painful with static set of attributes named like in a struct. Human may have very different attributes form animal but share libe ess doing inheritence or even strucuteal sharuing unuseful and slower pergormance anyway, refactorings immense painful anyway fsr

    • @Maurdekye
      @Maurdekye 4 місяці тому

      what

    • @JimTheScientist
      @JimTheScientist 4 місяці тому +3

      fortunately for me i’ve been writing my kernel and didn’t want to write a kmalloc so I pretend silly high level concepts like the heap don’t even exist. also why even bother aligning anything if you can just attribute packed it all (i joke)

    • @ron5948
      @ron5948 4 місяці тому

      @@JimTheScientist itd not high level you [forced self censorship], thsts the problem try to be less abrassive ron
      The underlying principle I am trying to convey here is thst thr primitives we use map to 1980s hardeare in a bad e way bc primitive algos and heuristics, DO NOT MAKE SENSE ON MODERN HARDWARE, so imprdancr mismatch horrivle performacne, complexity explosioon
      99‰ morons with h1b visas genrrsting design patterns gsnf od for abusing shitr java code nobody thinking csn tolerate, ye so that kind, you misunderstood, i provoked misunderstanding, but correct response would be to ask for clarification

  • @titaniumtomato7247
    @titaniumtomato7247 4 місяці тому +2

    Me I'm a mere mortal and very unexperienced. I'm at 20:09 now and I'm wondering something.
    Let's say you do the thing where you split your monsters into the two arraylists alive/dead. What if you add a third state like deactivated? Later you might add more.
    At that point you could add a third or fourth list. But maybe they are very sparse some of them. Is that a problem? And what if the states get more complicated where your game logic needs to check if some conditions are true but not others. In the end you make a bitfield and the whole state becomes an int. That might happen right. It feels like separating into separate lists might be a premature optimization that you regret later. Is that true?
    (EDIT: I guess it was more of a stepping stone to the MultiArrayList idea whoopsie, more like premature raising of concerns hehe)
    Also how can you efficiently move monsters between the living and dead lists. I remember when iterating through a list in java you could not remove elements. Is that just a problem that turns out to be worth it since you read/write to a list element many more times than you are likely to remove it?

    • @soniablanche5672
      @soniablanche5672 4 місяці тому +2

      In a real game, don't bother with this optimization. saving memory in the kilobytes is meaningless in current year.

    • @Mark-wz9uh
      @Mark-wz9uh 4 місяці тому +9

      @@soniablanche5672 It's not about space/memory savings, but performance.
      I would agree usually don't bother, but in a hot loop this is one way to make it fit better in cache (which depending on the problem and size, can give you multiple time speedups).

    • @curtis6919
      @curtis6919 4 місяці тому +7

      @@soniablanche5672 Did you even watch the video?

    • @srnissen_
      @srnissen_ 4 місяці тому +6

      I'm not quite sure how Java works but if it's like other languages I'm familiar with, you cannot move elements during foreach due to underlying implementation issues with that idea, but you can absolutely move them if you're not doing foreach.
      If I was doing this in C++, I might simply have a single array with all monsters, dead or alive, and an index for "first dead monster" - every time a monster dies, I swap it with the last living monster in the array and decrement the index of "first dead" (for idiomatic reasons, I'd rather store "first dead" than "last alive" but that's a detail)
      Capital letters are alive, lowercase are dead.
      ABCDE 5 - start. All are alive and index of "first dead" is actually past the end of the array
      AECDb 4 - b died
ADCeb 3 - 'e' died (swapped with 'D', the last living monster)
Now, when I want to iterate over my living monsters, I loop for (i = 0; i < first_dead; i++) and never touch the dead ones.

    • @jhacklack
      @jhacklack 3 місяці тому

      These are great questions to ask.
      "What if you add a third state like deactivated? Later you might add more."
You profile the distribution of monsters in each state and adjust your arrays if needed; the more common status type gets a smaller memory footprint by being the default branch. Less common types are accessed through more logic but that's not a problem because logic is fast and by definition it won't be used much in comparison.
      "And what if the states get more complicated where your game logic needs to check if some conditions are true but not others. [...] Is that true?"
      Pretty much, the exact optimization depends on taking measurements of your actual code. The underlying philosophy of DoD is that you have your data and can use all the assumptions about your data that you know to be true, instead of pretending that all the data is arbitrary and unkowable and defensively programming around that. You just look at the data and the profiling and match the code to it.
      "Also how can you efficiently move monsters between the living and dead lists. "
Depends how much time the CPU is waiting between frames. Generally the CPU finishes its job fast and most of the frame is spent waiting for the GPU to draw everything (obviously depending on the graphical fidelity of the game), so you should always have time to set a flag that a certain monster needs to change state. Now whether you immediately change its state or whether you wait until more monsters need a state change so you can do it all in one batch depends on your profiling from before. If monsters don't die often then it might be better to have the CPU clean up the array at the end of the frame by writing the monster's data to a buffer, nulling it in the array, and then copying from the buffer to the dead array. Or it might be better to batch-move lots of monsters at once. It really depends on the type of game, which the developer will obviously know about.

  • @100timezcooler
    @100timezcooler 4 місяці тому +1

    Sick nice cool. (dont ask me to write a poem)

  • @Marco-sz6mq
    @Marco-sz6mq 2 місяці тому

Nice talk. I am just wondering if someone could suggest a book that goes into more detail on hardware-related stuff, maybe to better understand compilers and low-level features. For example, like the slide at 5:20

    • @ChimiChanga1337
      @ChimiChanga1337  2 місяці тому +1

      Patterson and Henessy - comp arch
      Dragon book - compiler

    • @Marco-sz6mq
      @Marco-sz6mq 2 місяці тому

      @@ChimiChanga1337thank you

  • @drsensor
    @drsensor 4 місяці тому +3

you should put a link to where the original video is located so the traffic is also directed to their website. just mentioning it's copyright by Handmade Seattle is not enough

  • @petevenuti7355
    @petevenuti7355 Місяць тому

    How do i get a system to treat a hard drive or flash drive AS main memory, fool it into thinking the storage device is RAM, then treat the ram as l2 cache?
    Can it be done with a driver or does it have to be done in the firmware?

    • @rangeramg
      @rangeramg Місяць тому

      isn't this just swapping/paging? why would you want to do this because it will be insanely slow

    • @petevenuti7355
      @petevenuti7355 Місяць тому

@@rangeramg no it absolutely would not be swapping/paging, I am trying to avoid paging altogether and just use a block device directly AS RAM!
I'm trying to both bypass motherboard limits on supported RAM, and make programs believe I have more physical RAM than I would even have room to hold the page file for when swapping!
Asking if it was the same thing makes me wonder if you watched the video?
Sorry for the sarcasm but I am tired of people repeating that same response when I'm explicitly saying that's not what I want to do, it's very frustrating.

    • @rangeramg
      @rangeramg Місяць тому

      @@petevenuti7355 if that's the case i think the only way to accomplish that is through modifying the operating system itself, not sure if l2 is even able to be changed (i doubt it since it's inbuilt into the cpu itself).
      the system that enables paging essentially treats it like extra memory in the address space, so not really sure what else you want to do other than that? you can just allocate a whole device as swap space and use it as memory in linux, so there's that, and you can probably limit useable memory to almost nothing to emulate what you want to achieve
      and yes i did watch the video in full

    • @petevenuti7355
      @petevenuti7355 Місяць тому

@@rangeramg yeah I'm not sure how the MMU on the motherboard and the memory management of the operating system work together or are capable of overriding each other. I was hoping somebody would know. I remember really old systems where you could put RAM on a card on the bus.
As for the actual L2 cache you are most likely right, as it's in the CPU chip. However, if the operating system has anything to do with it, such that it's not just microcode on the CPU that handles it, it would be cool if any real RAM that was available could be used in that role.

  • @michaelotu9723
    @michaelotu9723 4 місяці тому

    why is the size 24 and later 16 when the struct variables were rearranged?

    • @johnnm3207
      @johnnm3207 4 місяці тому +1

      because of how computers align things in memory to prevent gaps in memory. The size decrease is a result of removing the extra padding needed to keep the struct aligned

    • @Clowndoe
      @Clowndoe 4 місяці тому +1

A struct needs to begin and end on a multiple of its largest member's alignment. The first u32 takes 4 bytes. The u64 starts 4 bytes after that, because it needs to start and end on a multiple of 8, leaving a 4 byte gap. Then the u32 takes a further 4 bytes, and the compiler gives the whole struct another 4 empties to make an even 8 useless bytes.
The second version takes the last u32 and puts it in that space, making it all nice and neat.

  • @amjadaltaleb7559
    @amjadaltaleb7559 4 місяці тому

    There is a despicable amount of ads in this video

    • @ChimiChanga1337
      @ChimiChanga1337  4 місяці тому +7

      Channel is not monetized therefore the video is also not monetized. I think youtube just runs ads on all videos irrespective of their monetization status.

  • @styleisaweapon
    @styleisaweapon 4 місяці тому +7

    22:00 that hash is going to cause a near-universal cache miss just to see if the monster holds anything

    • @Jan-h7q
      @Jan-h7q 4 місяці тому +6

      Not necessarily: if only a handful of monsters have inventory, the hashtable will be small and it will be in the L1 cache.

    • @styleisaweapon
      @styleisaweapon 4 місяці тому

      @@Jan-h7q that's not how that works

    • @lassipulkkinen273
      @lassipulkkinen273 4 місяці тому +3

      To be fair, an out-of-order cpu should be able to do the next hash calculation and subsequently queue the next read from the hash table while waiting for one read to complete, assuming no branch misses, anyway. So I wouldn't expect the cache misses to be too frequent, even though I find the supposed performance benefit of the hash table here rather dubious.

    • @styleisaweapon
      @styleisaweapon 4 місяці тому

      ​@@lassipulkkinen273 You'd be right if the difference between missing and hitting were only one order of magnitude or so, but it's not. Missing doesn't get hidden by other latency on anything made in the past 10 years; instead the latency of everything else is hidden by the miss.

  • @Tomyb15
    @Tomyb15 4 місяці тому +3

    Before even watching, I'm gonna predict that the solution is basically using an ECS or similar techniques to make objects of arrays of data instead of arrays of objects.

  • @i-am-the-slime
    @i-am-the-slime 4 місяці тому

    Can somebody tell me what memory alignment is?

    • @maximknyazkin662
      @maximknyazkin662 4 місяці тому

      In short, as I understand it: it is faster for the CPU to read and operate on data if items are equally spaced from each other (aligned), because it doesn't need to take the varying sizes into account. So if you have something which is 4 bytes in size and the next thing is 8, you add 4 bytes of extra spacing (padding) after the 4-byte thing, so that the distance between the beginnings is equal. The CPU then just knows it has to read two 8-byte pieces and keeps going, instead of first working out that one is 4 bytes and the next is 8.

    • @JimTheScientist
      @JimTheScientist 4 місяці тому

      Your registers are fixed sizes (8 bits, 16 bits, 32 bits, 64 bits, and more with XMM/AVX), and you want one operation to move all the bits from memory into your register.
      Think about how you would connect memory and a physical register: you probably wouldn't want to move 4 bytes of data one by one by incrementing your addresses (which select the cells in RAM containing your data); you could just wire it so that selecting one address hardwires all the bytes starting at that memory address to your register. If you align with binary, say you want 4 addresses, then starting at address 0100 lets you get the 4 addresses 01XX. If you started at 0101, you'd have to increment to 1000.

  • @diadetediotedio6918
    @diadetediotedio6918 4 місяці тому +3

    Yet, this does not appear to me to be the most pressing performance problem of modern software. The biggest issue in modern software appears to be the extensive reliance on I/O (which is orders of magnitude slower than RAM access) and what I call "browser bloat" (using browsers and dynamic languages with excessively slow performance to build ever-bigger applications). But DoD is certainly interesting in itself; good talk.

    • @BoletusDeluxe
      @BoletusDeluxe 3 місяці тому

      DOD is used extensively in game development and is still being expanded on

    • @diadetediotedio6918
      @diadetediotedio6918 3 місяці тому

      ​@@BoletusDeluxe
      I did not deny that; that is not my point. I'm talking about "modern software", and most existing software is some web application, server, or GUI, or something in between. Games are absolutely an important part of it, but this talk argues for DoD in general software development, and my question is whether OOP and other not-so-efficient paradigms are really to blame, or whether these people are making an unfair comparison where it matters most for them. My point is that, at the margin, no matter whether you use the most performant language or paradigm, the significance of your saved milliseconds will already be swallowed by that database call, that file read/write, that network request, etc. (not saying there could be no improvements, but you will only get heavily diminishing returns from them).
      Games are also a very special and specific category of software: they run a bunch of logic every single frame, whereas UI software, web software and even servers "profit" from being lazy and reactive; they only process data when some request is made.

    • @BoletusDeluxe
      @BoletusDeluxe 3 місяці тому +1

      @@diadetediotedio6918 True, I suppose the issue with modern software extends far beyond architecture, but a lot can change culturally from embracing efficiency and performance.

  • @msmeraglia
    @msmeraglia 4 місяці тому

    this is my exact journey haha

  • @Izopropilen
    @Izopropilen 28 днів тому

    So after all these stand-up talks, Andrew Kelley sits on stream talking about how Zig has so many bugs because its compiler is slow. DoD didn't help?

  • @AhmadAli-kv2ho
    @AhmadAli-kv2ho Місяць тому

    34:55

  • @Kenbomp
    @Kenbomp 3 місяці тому +1

    Nice, but Lisp coders have known this for decades, since McCarthy.

  • @bobweiram6321
    @bobweiram6321 4 місяці тому

    Ada solved these issues back in the 80s. Type sizes are specified in bits instead of bytes.

  • @styleisaweapon
    @styleisaweapon 4 місяці тому +3

    No modern CPU has any single-bit registers. bool is a wholly imaginary datatype, and as such it's up to the compiler to decide how to store it, how to work with it, etc. Contrast that with things like int and long, which are defined (or intentionally left undefined) so that they behave exactly the same as the registers of all modern processors... and no, the flags register isn't a single bit, although a few instructions behave differently depending on the state of single bits within it. The languages that define true as !false, leading to true = -1, make a lot of sense given this.

  • @khatdubell
    @khatdubell Місяць тому +3

    The most important takeaway from this is that he doesn't know what OOP is.

  • @theinsane102
    @theinsane102 4 місяці тому +4

    In twenty years there will be videos called "The Problems with Data-Oriented Design" and "Reasons to Avoid Data-Oriented Design". The programming world moves in hype cycles around new methods. At one time OOP was all the rage.

    • @Galakyllz
      @Galakyllz 4 місяці тому

      Yeah, any time there's a trade-off, you'll find a constant tug-of-war over the years. The strategies presented in the talk are great if you're ready to optimize, but it's now harder to maintain the code. As people migrate their code to optimize the bytes over the next decade there will be a resurgence of people arguing that maintainability is paramount.

    • @thewhitefalcon8539
      @thewhitefalcon8539 4 місяці тому

      And each time we hopefully get better

    • @BoletusDeluxe
      @BoletusDeluxe 3 місяці тому +3

      ​@@Galakyllz Data-oriented design is not harder to maintain unless you only know how to maintain OOP code.

    • @Galakyllz
      @Galakyllz 3 місяці тому

      @@BoletusDeluxe That's what I mean: you will find coworkers who are not as capable as you, so changing the data structures in ways that reduce the memory required will just result in your coworkers making mistakes when adding new fields; then they become frustrated; then they avoid tickets involving that part of the project altogether. Now you are the only person maintaining that part of the project. Eventually your coworkers will say "Well, BoletusDeluxe's changes a few months ago made that process unmaintainable. It's more efficient but it's harder to maintain, and that's why we haven't gotten around to adding all of these other features."
      You might think "well, the less experienced coworkers will just get fired/replaced then", but in my experience that is painfully unlikely. You become the anomaly, the senior developer no one else is as capable as, so they don't try to hold everyone to your standard. If they wanted that, they'd only ever hire senior engineers. So to let the entire team work on that project, they revert your changes.

    • @rusi6219
      @rusi6219 Місяць тому +1

      ​@@BoletusDeluxe"maintaining" OOP code = duct taping bleeding wounds

  • @dj10schannel
    @dj10schannel 4 місяці тому

    🤔

  • @plaidpunk
    @plaidpunk 4 місяці тому

    Shit, this is too high level for me right now, but I can tell it's going to be useful so I'm just gonna save it.
    I really, really wish I could find a tutorial on programming C/assembly instead of one on how to turn on a fucking IDE.

    • @coreygossman6243
      @coreygossman6243 4 місяці тому

      You'll either have to learn the command line or an IDE, and if you learn one you'll have to learn the other lmao

  • @tbird-z1r
    @tbird-z1r 3 місяці тому

    So awkward seeing a nerd swear when doing a speech.

  • @ron5948
    @ron5948 4 місяці тому +1

    If you have lots of data, like the humans example, you want to use an EATV approach, then get the attributes in a set.