References for easy googling:
"Data-Oriented Design and C++", Mike Acton, CppCon 2014
"Pitfalls of Object Oriented Programming", Tony Albrecht
"Introduction to Data-Oriented Design", Daniel Collin
"Data-Oriented Design", Richard Fabian
"Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)", Noel Llopis
"OOP != classes, but may == DOD", Lane Roathe
"Data Oriented Design Resources", Daniele Bartolini
@@isodoubIet is there another source/book you'd recommend instead? Started reading it recently after watching Mike Acton's & Andrew Kelley's talks on DOD
Whoever lined the ad break up with the end of the talk, right before the questions: well done and thank you. It was wonderful to get through this talk uninterrupted.
In my experience, it's also perfectly acceptable for users to click and wait for things. Having to wait for the computer to process a few hundred structs is a telltale sign that something is very wrong. Worst case - I took over one project where the running time was 55 seconds for a simple script. After rework, I got it down to 0.012 seconds, most of which was spent actually transferring data between servers. The rewrite took a few hours and was trivial (for me) to do, but saved tons of resources. The earlier solution was to throw hardware at the problem, which is both stupid and expensive. :P
Lots of OOP fans are being triggered here, lol. Basically, it boils down to memory access patterns. If you can arrange your computations to simply "flow", ideally without hiccups, i.e. without branching and chasing pointers all over the place, and if you can put the data being accessed by hot code in the same place, so it can be prefetched into the cache, you win big, performance-wise. That is what DOD is all about: to better match program data to the way hardware operates. Now, in cold code, OOP is just fine: it dramatically increases programmer efficiency at the cost of runtime efficiency, a cost we're willing to pay. But in hot code, HPC code, and real-time code, DOD is vastly superior, as it much better matches the problem to the hardware.
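To make the "flow" concrete, here's a minimal sketch of the two access patterns (names invented, not from the talk):

    #include <memory>
    #include <vector>

    struct Particle { float x = 0, y = 0, vx = 0, vy = 0; };

    // Pointer-chasing: each object lives wherever the allocator put it,
    // so every iteration can be a cache miss.
    void update_scattered(std::vector<std::unique_ptr<Particle>>& ps, float dt) {
        for (auto& p : ps) { p->x += p->vx * dt; p->y += p->vy * dt; }
    }

    // Contiguous "flow": the hardware prefetcher streams the data in.
    void update_contiguous(std::vector<Particle>& ps, float dt) {
        for (auto& p : ps) { p.x += p.vx * dt; p.y += p.vy * dt; }
    }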
I'm a DDD/OOP-oriented person and couldn't agree more. Even die-hard DDD (domain-driven design) books mention that "while design may abstract models of the world, it should never forget the technology being used to implement it". Even in cold applications you can do a whole lot of damage by completely ignoring technological limitations.
Programming for games and with that perspective in mind, I think most low-level game programmers agree with and know this. Most games use a lot of OOD, but for the most performance-critical sections we use more DOD. It does not really have to be all or nothing. Take one super simple and common example that everyone can understand: particle systems. There is no way one would write them as objects with functions. The data for the particles will definitely be stored separately from the functions, and separated to include only what is needed for the code being executed, e.g. not storing visual stuff alongside the things needed for the simulation step, and so on (see the sketch below). And that is how everything can be done. Many parts of the engine can be OO, but the parts that are easy or really hot can be DO. Both have their strong points. It's not like most people who are for DOD try to take, say, a really complex enemy with AI that only runs for 10 instances at a time and strip out all the structure and objects. But maybe the line of sight, audio propagation and pathfinding can actually be DOD without any drawbacks on the ease of use or flexibility of the rest of the logic.
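Something like this, roughly (a hedged sketch; the exact field split is invented for illustration):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Simulation-only data, stored contiguously: the update loop never
    // drags colors or sprite handles through the cache.
    struct ParticleSim {
        std::vector<float> px, py;  // positions
        std::vector<float> vx, vy;  // velocities
    };

    // Visual-only data, touched by the render pass instead.
    struct ParticleVisual {
        std::vector<std::uint32_t> color;
        std::vector<int> sprite_id;
    };

    void simulate(ParticleSim& s, float dt) {
        for (std::size_t i = 0; i < s.px.size(); ++i) {
            s.px[i] += s.vx[i] * dt;
            s.py[i] += s.vy[i] * dt;
        }
    }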
Also, as one more additional note: in most cases, if one wants to really take advantage of broad SIMD vectorization, it is really easy to do with tight loops in DOD, and much harder with OOD. I have worked on engines that can use OOD for animation updating and manage different versions of functions to take advantage of different instruction sets, but in the end most of it will be structured in tight arrays at that point anyway, so it will be kind of similar to DOD. And then it's more like arguing about semantics. I would really be curious if there is anyone arguing for OOD who thinks otherwise here.
I agree with the performance argument. However, those who argue against OOP often use an OOP "example" like in this talk, which is not a representative example of OOP. If a class inherits from 7 other classes instead of *using* instances of these classes, then the programmer failed using OOP. Also, well designed OOP code has fewer if statements. e.g. Instead of checking whether I'm in a crawl, walk, run, or idle animation, I just have an object for each animation that knows how to animate and interpolate itself. So the example given is a bit of a strawman and I understand that OOP fans are triggered because they rightfully feel misrepresented.
oop is very, very rarely a valid way to write code. most oo code out there is basically people coming up with new models of real world concepts that have nothing to do with the real world to begin with, which just translates to you having to understand the original concept AND the model the person who wrote the code came up with. oop is broken at a fundamental level as a programming paradigm, performance just happens to be another sector where it sucks. oop is only fine when it's not used as a paradigm, rather as just another tool to solve a very narrow class of problems. the *actual* programming paradigms are procedural, functional, etc. that don't spiral down to an incoherent mess. that said, this conversation is already too high iq for most people. arguing with people who think no oop means no classes, constructors, destructors, polymorphism, encapsulation, namespacing is basically just roleplaying as sisyphus. when you've only been introduced to all these concepts exclusively from within the context of oop, it's quite natural that you get defensive and piss your pants when people ask the right questions.
The more I research about Data-Oriented Design, the more I believe Object-Oriented projects have just been saved by great hardware. It's just a luxurious way of programming that's completely detached from the actual behavior of the machine. Your program might be object-oriented but your machine isn't.
It's not always so luxurious though. One of the conceptual problems I see, even if we're trying to code to our intuitions of the real world and not the hardware, is that OOP doesn't remotely begin to reflect the nuances of the real world. Objects in the world don't offer functionality based on what they "are". They offer it based on what they "have". A human isn't able to walk because he's a biped, he's able to walk because he has functioning legs (and not all human beings have those). A duck isn't able to fly because it's a bird, it's able to fly because it has wings, just like an airplane, which isn't a bird. And unlike an airplane, a duck can also walk, because it has both wings and legs.

And in the real world, the behaviors and available functionality of things change on the fly. For example, a human able to walk today may not be able to tomorrow, because he got a serious injury and lost his legs. OOP wants to model such interfaces in a static, unchanging way, not in ways that can be dynamically changed on the fly.

An ECS doesn't just fix the problem of OO being completely detached from the actual behavior of the machine. It also fixes the problem of OO being completely detached from the actual behavior of the real world. It solves both problems that OO has: the problems for humans who want to program things intuitively that capture the more nuanced complexities of the real world in their simulations of it, and the problems for humans who want to program things intuitively that are harmonious with the nature of the underlying hardware.
Without the profound advances in semiconductors, people wouldn't be enjoying the Python & JS paradise they have for so many years... Back in the old days of the 80's and 90's, it was standard to drive behavior with data, since you had so little hardware capability to work with! It has been clear to me that with the passage of time, we have been getting lazier just for the sake of convenience.
@@darkengine5931 If you think about it deeply enough, the memories or information or data that we store in our brains enable functions (systems) that would otherwise not be active, since the data (information or memory) would not be present in the composition
Stoyan: "Study Chromium, it's made by the best engineers in the world, there's a lot to learn!" Also Stoyan: Shows how the best engineers in the world designed an overengineered system with poor performance.
Both statements are true, believe it or not. They made a piece of software that does what it should and does it reasonably well. Speed was a secondary concern, and the over-engineering was unfortunately required. Try to make your own browser and you'll understand what he means there. There are some incredible performance tricks in Chromium; it's not like they don't know how to program, there's just so much to keep track of.
@@nextlifeonearth And I think it is worth noting that most software of that scale has grown over a lot of time; it's not like all of it was planned out perfectly from the start. That is just how software engineering works. If the same people started from scratch to make a new version today, a lot would probably change, but a lot would also likely be similar. Code bases grow not just from wanted features; they also grow from being used, as a lot of legacy stuff needs to continue to work, and multiple solutions for the same thing might coexist. And it's not like the engineers would want that if they had a choice. It is "easy", by comparison, to come along after the fact, take one part of it, and make a new, better version that does just that one thing without having to support all the legacy stuff.
Stoyan: "you're overengineering your systems and causing them to have poor performance!" Also Stoyan: "and here's how we built a browser rendering engine to make a game menu."
The speaker could have given a much better answer to that second question, which asked what exactly he thinks should be "dead" in OOP.

OOP was designed to help with program design, maintainability and reusability. Things like encapsulation and abstraction are core concepts of OOP, and they were developed in order to aid in the design of huge programs. It's a tool to create a million-lines-of-code program in such a manner that it remains manageable, understandable and maintainable. When properly used, it makes code simpler, safer and easier to understand and to develop further. It also helps in making code reusable, meaning that it's relatively easy to take some functionality that could be used in another part of the same program, or even a completely different program, and use it there without much or any modification. This helps reduce code repetition and overall work. It also helps with stability and reducing the number of bugs, because when a piece of code has been extensively tested, you can be fairly certain it won't be a problem when used in another place or program. OOP does a relatively good job at this.

The problem with OOP is that it wasn't designed, at all, with low-level efficiency in mind. Heck, back when OOP was first developed as a paradigm, computers didn't even have caches, pipelines or anything like that. There were no "cache misses" because there were no caches. Memory access speed was pretty much constant no matter how you read or wrote. The performance penalty from conditionals and branching wasn't such a big concern back then either. It was only decades later that processor architectures went in a direction where scattering all the data randomly in memory became an efficiency problem.

Thus, if we want maximum efficiency in terms of cache usage, what needs to "die" in OOP is the concept and instinct of decentralizing the instantiation of the data that has a given role. The data needs to be taken out of classes and centralized into arrays, and thus we need to break encapsulation and modularity in this manner (see the sketch below). We also need to minimize branching, which in OOP terms means minimizing the use of dynamic binding (in addition to trying to minimize conditionals).
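To illustrate that last paragraph (a sketch with made-up names, not the speaker's actual code): instead of each animation object owning its own fields and a virtual update(), the fields of all animations of one kind are centralized into arrays and processed by one non-virtual loop.

    #include <cstddef>
    #include <vector>

    struct OpacityAnimations {
        std::vector<float> start, end, progress;  // parallel arrays, one entry per animation
        std::vector<float> output;                // results handed to the next stage
    };

    void update(OpacityAnimations& a, float dt) {
        a.output.resize(a.progress.size());
        for (std::size_t i = 0; i < a.progress.size(); ++i) {
            a.progress[i] += dt;  // no virtual call, no per-element branch
            a.output[i] = a.start[i] + (a.end[i] - a.start[i]) * a.progress[i];
        }
    }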
WarpRulez Granted, his title was obviously over the top, but his talk was excellent. The main point, at least what I got out of it, was that hardware matters because that's what programs actually run on. You can't abstract that away because you believe in some universal machine that runs on fairy magic dust. And when people mention that OOP is best suited for large systems with millions of lines of code, I don't know how to react, because it's simply not true unless you cherry-pick examples. No major operating system, each of which is easily tens of millions of lines of code, is written that way, for the exact reasons (and more) the speaker mentions: you cannot ignore the hardware. And the closer you are to it, the more you realize it. It's unbelievable the technological advances we've had in computing, yet my browser operates at the same speed (on the latest hardware) as it did 15 years ago. Even with C++, how long does it take for Qt Creator to load? Or the PyCharm IDE? My OS loads faster! How is this at all possible when we're continually improving on the hardware end? It is ridiculous. And if you think OOP is responsible for abstraction, I don't know what to tell you, pal.
I don't think that OOP makes code more maintainable or easier to reason about. OOP is both not optimal for performance AND not optimal for maintainability. I mean, most highly scalable (in terms of codebase size) and maintainable languages can't even do OOP (functional languages like Haskell, for example, or Rust, which prefers composition over inheritance). Although Rust might add inheritance someday (the delegation RFC). Even in languages like JavaScript, people are starting to move away from OOP, not due to performance (in those languages you often even sacrifice performance by not going the OOP route, because they are optimized for OOP) but because a more functional approach makes their code more maintainable (e.g. React, which now even wants to get rid of class components and replace them with hooks and useState).
@@boltactionpiano7365 Just because it uses object-oriented concepts doesn't mean it ignores the hardware. You can map objects pretty well to some things and still be performant (e.g. directories), but I think a pretty big performance killer in OO is often inheritance (which includes interfaces as understood by Java).
A breath of fresh air. Every time I try to read the source of an open source project written in C++, I find that, for even the simplest thing, I end up having to hunt down and read a few dozen different member functions in quite a few distinct classes (in distinct files). Object-orientitis I call it. Though I find the arguments here are more against excessive abstraction and splitting things into an excessive number of objects pointed to.
"[OOP and DOD] are just tools in your toolbox" - this is good advice! Unfortunately, many developers treat OOP (and TDD, and...) almost like a religion; not as a tool, but as a rigid set of beliefs that must be adhered to at all times, lest you evoke the anger of The Prophets; Fowler and Uncle Bob, hallowed be their names. But OOP is just a tool. Go explore, have fun, learn new things, and expand your toolbox. You'll see that hammer-oriented carpentry is limited :-)
Yes. There are plenty of times when OOP is great, but other times it is overkill or not the fastest method. Use the right method for the problem you are trying to solve. Instead of pushing different methods as replacements, they should be taught as additions to a set of programming methods. That is why I like PHP and the flexibility it offers: you can go OOP or procedural, and you can use functions, methods, classes or modular programming methodologies. I think it would be cool if PHP-GTK had taken off, to create more client-side PHP programs.
I believe GoF's Flyweight is exactly what this talk is about. Part of the problem with OOP lies in how we approach OO design and how we teach it. Being a teacher myself, I always encounter students who have this Animal and Dog and Cat style of OO design, so from the very beginning we instill this very naive mindset of how to model the world in software. Like the Animation class in Chromium. My take is to see OO more as a system/API-level thing, more like modules in Oberon or a Service in Spring. That is where OOP really shines.
I am learning OO too, and I was taught the same Animal | Cat, Dog way. Glad to know that this is not a good approach. But I don't understand what the alternative approach is. Can you please explain a bit?
Game engines typically don't struggle with duplication of heavy data, though, if I understood the citation of flyweights correctly. That's a trivial problem for commercial gamedevs to solve: avoid duplicating hefty data like textures and meshes and strings by storing indices/references/pointers to them instead (interned strings in the case of large strings). There's no need to even call it "flyweight", and it generally doesn't require associative structures or any fancy solutions. The efficiency demands would make it blatantly wasteful to do otherwise.

The bigger and more serious problem with OO for critical real-time applications like games is encapsulation and a scalar way of thinking about objects. Encapsulation is thoroughly antithetical to efficiency. For example, an OO designer will generally design a BLAS 3-vector like this:

    template <typename T>
    class Vector3 {
    public:
        ...
    private:
        T x{}, y{}, z{};
    };

In a system where the critical transformation loops are sequential in nature, this is thoroughly counter-productive from an efficiency standpoint, as those loops benefit most from SIMD processing data in the form XXXXXXXX in 256-bit YMM registers, not XYZXYZXY. Data interleaved like this, which isn't optimal to access together, wants to be split apart (hot/cold field splitting) rather than stored in an AoS fashion as OO generally prefers, unless all our objects are hefty multi-containers of data. Yet it can't be split apart if we are to maintain intuitive encapsulated designs. And splitting these objects apart later is enormously costly, even when every profiler hotspot points to this as the ultimate bottleneck killing framerates, since it's a design change to a public interface and will break every single thing that uses that design.

Another very serious problem is that object-oriented programming often wants to create abstractions that hide away data, treating it as an implementation detail (ex: an image abstraction which hides away its internal pixel format), while the most critical loops often require the data to be exposed in a leakier abstraction that treats the data as a fundamental part of the public interface. They want "data structures" (leaky abstractions with "open data"), not "objects" (sealed abstractions with "closed data"). Efficiency actually demands leaky abstractions, not good abstractions as judged by OO ideals such as SOLID or even the more basic principles of information hiding. Hiding information is the worst possible thing we can do from a performance standpoint, and yet doing so is at the heart of object-oriented design.
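For what it's worth, the SoA layout those SIMD loops want looks roughly like this (an illustrative sketch only):

    #include <cstddef>
    #include <vector>

    // XXXX... YYYY... ZZZZ... instead of XYZXYZ: each component array
    // streams straight into wide registers, so the compiler can vectorize.
    struct Vec3SoA {
        std::vector<float> x, y, z;
    };

    void scale(Vec3SoA& v, float s) {
        for (std::size_t i = 0; i < v.x.size(); ++i) v.x[i] *= s;
        for (std::size_t i = 0; i < v.y.size(); ++i) v.y[i] *= s;
        for (std::size_t i = 0; i < v.z.size(); ++i) v.z[i] *= s;
    }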
@@darkengine5931 OO does not require hiding information; encapsulation is not a strict requirement of OOP. And honestly, encapsulation should only ever be used to protect application state from faulty manipulation, when you can afford it in the first place. A Vector3 has no excuse for encapsulating its internal data. Encapsulation is usually for higher-level functionality and interfaces, where CPU caching either doesn't determine the performance, or where fixing the cache misses wouldn't have fixed the performance issues in the first place, which is usually the case at larger interface points. Not that OOP itself requires you to design in a manner that disrespects DO design; that's just the traditional foolishness people are taught early on as being "smart", even though it rarely ever is. Composition should be your first design; inheritance should always be your last.
Use the toaster-analogy! Bread -> Toaster -> Henry Bread stores the slices, and the toaster changes the amount of toast in each slice and eventually passes them on to Henry that eats them. ;)
I find it funny that for every one of these talks, there is always someone trying to make a jab along the lines of "if you use data-oriented design, then after ten years of data bloat you have to do a lot of hard work to keep the code running as fast", as if that is a downside of data-oriented design. Well, duh. Of course it's hard work. The same bloat happens in object-oriented code as well, but there it's too difficult to see past all the classes and indirection, so most people just give up and accept the slowdown as "inevitable".
Not to forget that the objects and inheritance will all become dependent on each other. When the purpose of the program changes, not if, the design will need to be changed, and unless you start from scratch, these dependencies will be reused with a lot of overhead, often because the programmers only know that if you remove it, it stops compiling somehow. Data-oriented design will let you start from scratch right away, but various code paths are reusable with minimal modification, as opposed to OOP, where the classes are purpose-built.
We gotta think why data oriented design matters at all. It is because the memory speeds are cripplingly slow and we got caches as the hacky solution. If someone invented faster memory nobody would even care about data oriented design.
This is _exactly_ what has happened over the years. Code gets slower every year directly because of the bad ideas from OOP, and nobody knows how to fix it. Data/CPU oriented design is hard only because doing something the right way happens to be hard most of the time. If it's too hard for traditional OOP programmers to do their job correctly _without_ OOP, maybe they can try serving at McDonald's? I, for one, wouldn't mind if OOP programmers all quit and modern software could start competing with software speed of the 70s again.
@@totheknee An observation: the sorts of problems we are solving are several orders of magnitude greater than the problems we were solving in the 1970s. We are doing things now that were inconceivable 50 years ago.
It’s also just funny because it’s mostly an irrelevant concern. Why would I (as a dev gainfully employed at a tech company, let’s imagine) care if something takes work to maintain? The alternative universe where the code works well for eternity without overhauls or maintenance is one where I’m unemployed. Not to mention the fact that the bulk of commercial software revolves around releasing new features and updates and getting people to buy in over and over again. Do people expect that you’ll never need a rewrite in that kind of a market?
Compare how it worked at the start to how it works now; it's like a dream... Try some Escape From Tarkov and you will see what bad optimization and performance look like.
I'm now trying to use Data-Oriented Design in business apps, where different aspects are moved into components (in the sense of game Entity Component Systems), for the reason that it is much easier to synchronize with remote systems and avoid fatal sync failures. Let's see how it goes. While it is a nice talk, I have not seen Data-Oriented Programming outside the games industry. But on the other hand, a normalized in-memory relational database system is doing exactly what an ECS is supposed to do.
I'd like to see the code and cache hit data from the guy who asked the question at 51:05. I'm 99% sure his code is a prime example of what Nikolov is talking about.
It's a catchy title for sure :D but it's also kinda misleading, because it's not really a problem of OOP. The problem is the way that OOP is taught: It's often like "structure your class fields the exact same way as your properties/getters/setters", and that's bullsh*t. Instead you can simply embed the data oriented design *into* a class and then have the best of both worlds: cache locality and encapsulation. That way your class can have a convenient public interface and at the same time hide all the dirty secrets (like the actual data layout in memory) that it uses internally to achieve better cache locality. For me OOP is just a way to separate the internal data layout from the public interface. It's not OOP *or* DoD, it's OOP *and* DoD :)
In HPC, data-oriented design is the norm; it comes from the way people used to write programs in FORTRAN, where not even structs were available. Scientific codes with a long history have a high likelihood of being written by people who care about performance and know about the hardware.
Exactly, it's all about locality, and the same is true for embedded systems, which get crazily heterogeneous but have biannual hardware release cycles instead of the 4-5 years in HPC. It's like HPC with attention deficit disorder. "Look! A new instruction set! Pack it onto the other seven SoC components! Add another on-chip interconnect! Use MPI to offload from Cortex to M4!" Those idiosyncratic EVMs are perfect for evaluating hardware abstraction, though!
@@tobiasfuchs7016 reason and good sense will prevail in the end, a bit sad it took 20+ years and 15+ years of memory wall to realize and start to popularize certain basic concepts.
I find it very right that the author mentioned that OOP is much more applicable under some circumstances than DoD, and I would like to expand this point with a few personal thoughts. I'm sorry if I've done a poor job of structuring them well.

To my mind, it's too early to kill OOP as some comments below propose. Personally, I see the best fit for OOP in classical enterprise business-oriented programs, where dynamic polymorphism is not just that OOP thing we use only because it exists, but plays a crucial role in building highly maintainable architectures. For example, I mean the Dependency Inversion Principle (the D in SOLID), which allows the flow of control and the flow of dependency to run in opposite directions. I can't clearly see an application of DoD to business architectures. DoD demands that you know your domain very well in advance, which is not usually possible. In classical OOP you can separate your business logic, which seldom changes, into a separate component and provide it with plenty of interfaces to decouple it from the details. You can experiment with the details as much as you want, leaving the core of your application logic unchanged. Moreover, that inversion of dependency allows you to get rid of even transitive dependencies on the details, which can end up in the ability to compile and deploy your business logic separately.

In DoD you concentrate on the data more than on the behaviour, which is not always the right way of designing some systems. Another significant downside of DoD I see is the lack of context in your data structures. Encapsulation does a good thing in that you give other programmers not only the info about the data but also a hint about how your data is usually used. Moreover, in classic OOP you are allowed to add restrictions on the usage of the object's internal data. Imagine directly accessing and modifying std::vector's raw data pointers. Of course, obsessive encapsulation can lead to bloating your objects with a bunch of methods whose logic belongs to different parts of the system. But that is an obvious violation of the Single Responsibility and Interface Segregation Principles. In that case, weakening the data-access restrictions of the object and moving the odd logic to the corresponding subsystem would be applicable.

That leads to my final point: residing in the middle of those design paradigms is usually the best practice. I'd support my point with a personal example. Recently I had an opportunity to apply DoD to a game I'm currently working on as a pet project, to organize my game objects. However much I liked the ECS pattern, I didn't want to restrict myself to putting the logic only in the systems. That's why I added all the necessary virtual methods to components, which allowed me to use the Entity-Component and Entity-Component-System patterns together. And now some components that know best what they need and how to act, like the Player's and Enemy's components, have all the logic packed with them, while other components that are part of some more complex systems, like physics colliders and rigid bodies, just hold the data. To my mind, that's taking the best from two worlds, which gives me plenty of flexibility without the restrictions of a particular design philosophy.
Even though such components technically don't differ, as they both have virtual functions that are called every tick, I can separate them into different classes that can be stored in different arrays and, moreover, introduce some custom memory allocators to improve data locality and reduce cache misses. As you can see, there are plenty of optimizations that can be added on demand.
Holy crap, I did data-oriented design in 2008, and I had no idea until today that this was a thing. I just felt like an outsider, coding in a way that I thought made much more sense and performed better than the way all my peers coded.
Hardware-efficiency-oriented is what programming should be, IMO. The problem is the extra code and data added by compilers that will never be used: it bloats RAM and cache, and that happens with all programs, DLLs and other executables. With a well-defined execution requirement list, which can be better than a simple import list, programs would be not only much faster but much smaller, also saving operating-system admin time 🎉
Hah all the OOP fans are triggered. What he argues against is the philosophy of using large, virtual inheritance trees for every problem -- especially low-level ones. Great talk. People need to chill about the title and the comments here are too harsh
This is a really good talk. The folks saying he's not fair to OOP have just not seen that the juice is worth the squeeze. We have all been lied to. OOP is not the only tool, and the others happen to be better and simpler. I've been using a procedural and data oriented approach since about 2011 and it has never let me down. It has gained performance, added readability, made the code more composable, and made it easier to test. OOP is an unfortunate detour on the way to enlightenment.
I guess a lot of it may simply be called a "hot/cold split", both in terms of members within a class and in terms of splitting what are considered "active/inactive" objects of the same type into different containers, so inactive ones are not getting updates/callbacks (see the sketch below)
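In sketch form (invented names; assuming order within a container doesn't matter), the per-frame loop then only ever walks the active container:

    #include <vector>

    struct Anim { float progress = 0, rate = 0; };  // hot, per-frame data only

    struct AnimStore {
        std::vector<Anim> active;    // updated every frame
        std::vector<Anim> inactive;  // never touched by the hot loop
    };

    void tick(AnimStore& s, float dt) {
        for (auto& a : s.active) a.progress += a.rate * dt;
    }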
I think that lacks a widely-used name (unless you get very specific, like frustum culling, which does that to avoid drawing meshes outside the viewing frustum), but it's generally not something that is too costly to do in hindsight, as needed, in response to hotspots. Hot/cold field splitting, on the other hand, is a fundamental design consideration upfront, since it affects the entire way we design data types and data structures. If we try to do it later rather than sooner, it can be an extremely costly design change.
Great talk, but something is bugging me. The talk is about performance, and in that respect he's missing something. It's about performance... of games written in an interpreted language, run in a tab of a program that has to do 100 other things as well. This notion of the browser being the environment in which everything has to run is what makes modern computers feel as slow as the ones we had 30 years ago. If you want high-performance animated games, why are you even considering JavaScript/CSS to be an option?
Seriously? These concepts have been around for decades. The problem is not a lack of books about this new fad called data-oriented design; the problem is the amount of garbage (IMHO) around about OOP. I'm referring to resources that preach terrible (IMHO) design practices. And even if they had their place, remember: horses for courses. Different jobs demand different tools. When was the last time you saw a performance or optimization chapter in a book about OOP?

The thing is, what he wrote is OOP. Hence the questions. How? Let's say you have a big system and you're designing one of the data-crunching (heavy-lifting) components. Here, we have an animation engine. It could be an SMT solver, whatever. The OOP way is to publish an interface and hide the implementation. Like you have here. It's nobody's business how you internally manage an animation's state (if you don't care about performance, it might be all pretty, like a naive example from a book; if you're dead serious about performance, it can get ugly fast - all the more reason to hide it). The pre-OOP way would be to expose everything (nowhere to hide). Your state would be just lying around for everybody to see. Meaning they can write code that depends on your implementation. That's terrible. OOP gives you this "black box" approach, but it says nothing about what should be a black box, or how far you should go in your modeling of a problem domain. Containers are a good example of OOP. Or do you think it would be better to expose their internals?

Just because a lot of people write terrible OOP code doesn't mean OOP is bad. Just like with C: a lot of bad C code floating around doesn't make C bad. In other words, if the code is terrible, it's your (the author's) fault. If it performs terribly, the same applies. Tools don't make a good programmer. A good programmer can make any tool shine. It's a well-understood fact that the more you care about performance, the less freedom you have. You do what you have to, not what would be neat. Take data structures. There are a lot of interesting structures about. However, if you really go for performance, you'll probably find yourself using just a handful of general-purpose structures. And you might have to implement even those, because the standard library doesn't really offer them.

As for his critique, I'm not really sure about all of his examples. I'm not familiar with the codebase, so I'll just go with what he said. Take the if (!timeline_). They're iterating over animations needing an update. So this is either a sanity check (which means it should always fail, unless there is an error somewhere), or it's part of the removal process when an animation stops (which means it should succeed once, at the end). I have to ask myself, is this really a significant source of misprediction? Or the if (!content_). I believe he said it contains the definition of an animation. How can you have an animation without any definition? I would expect this to always succeed. Again, is this a real problem, or are they just covering their bases (to avoid dereferencing a null pointer in case there is an error somewhere)? Just two examples that stuck in my mind.
Just to add: tight coupling is not a feature of OOP. It's a fact of life that components in a system often need to cooperate, and there are quite a few ways to do so. And the convenient way is not necessarily the best way, unsurprisingly. Keeping a pointer and calling directly (well, more likely indirectly ;-) ) with the data you "have at hand" is convenient compared to storing all that information so it can be dealt with later (whether by you or by someone else). If you care about keeping your data and instructions hot, your hand is forced. It's not too dissimilar from what humans do - getting in the zone, doing tasks in batches. Or you might want to quickly shoot a message to another thread on another core, to keep it fed and happily crunching away while you do your business (you might look at this as a production line with people working in parallel on different tasks, passing work pieces around). This is not about OOP. It's about minding performance. Interface design is challenging in general, and one thing OOP does is increase the number of interfaces. If you want to reap the benefits of OOP, you need good interfaces inside your code. A good interface facilitates reusability. An interface can tie your hands when it comes to implementation. Etc.
Origami Bulldoser That's a good point - this was more about how to design code to operate to the particulars of CPU caches. To the extent that this is advantageous, it is kind of indicative that modern CPU architecture is not well served by our general-purpose programming languages. It seems like the gap between what our programming languages do and what is most optimal for CPUs just becomes wider and wider.
It is like database design. Large disk latency means you need to arrange things so that the next needed data is fetched as sequentially as possible. Moore's law hasn't kept up for memory latency, and memory behaves more and more like disk over time (though the latency is pretty constant, except with NUMA).
Because not every machine has a cache; you use the same design scheme as you would have when programming an Atari. Data-oriented design says that all programming is about transforming one set of data into another set of data, and that's what we should be focusing on, instead of pretending we can plan how a system is best structured before actually testing it. In OOP you pre-plan the structure; in DOD you test the simplest implementation and let the system tell you how it best wants its data laid out. If the first thing you need to do is add 2 + 3, it says don't pre-suppose the problem by creating a class that handles any possible case for the left and any possible case for the right; you just add 2 + 3, see how your system handles it, and move on to the next problem you need to solve, being as efficient as you can be by just solving the problem in front of you and letting the data you get back from your system guide your design. It's much more than the cache; it's very reactive instead of putting your ideas of how a system should be made at the forefront. It's constant testing, letting the machine tell you what it likes best. You're thinking about what the GPU likes and how long things take from disk, and you use that to decide how to structure and lay out your data.
@@captlazerhawk Absolutely, but the comment said cache-oriented programming, and I wanted to address why it wouldn't be that, and how the data takes center stage even when performance is your main concern. It's just harder to see the difference between OOP and DOD when you start arguing about maintainability with someone coming from an OOP background. To me, DOD is about succinct precision and efficiency, best exemplified in its performance rewards, while OOP is about power and scale that take more and more power, with codebases that balloon out of control.
Good but incomplete. That's a good summary on the code side, but alongside it you should also "architect your data to be read and written in consecutive blocks of memory (as much as reasonably possible)". You can follow that code summary even with an OOP design of the data (all data belonging to an object encapsulated in the same class/struct), but that will still hurt performance if you use thousands of such objects and you don't rearrange the data according to its usage pattern.
It really isn't, but it's a good rule of thumb to just generally improve performance. I recommend you read Richard Fabian's book on DoD (available free online).
@@younghsiang2509 Oh they did. Cache locality is a reoccurring topic in C++ conferences, like in "std::vector vs std::map". That Mike Acton talk is way too overrated by some. I didn't learn anything new from that one.
@@Gawagu If you "didn't learn anything new" from the Mike Acton talk, you either know more than Scott Meyers or you didn't pay attention. Apologies if the former is the case, and I'd be very interested to see the L2 cache miss rates on your projects.
@@minRef My L2 cache miss rate is pretty low in the relevant parts :D Is there a transcript or summary of his talk somewhere? When I hear his name I also always think of Casey Muratori and Jonathan Blow, who have many good ideas and are often right, but are also extremely arrogant and often full of bullshit with their irrational hate of C++ and such.

My point was that cache locality and "data-oriented design" aren't that new and were a common topic in C++ talks even before Mike Acton made his. I found his slides, I think: macton.smugmug.com/Other/2008-07-15-by-Eye-Fi/n-xmKDH/i-BrHWXdJ

Most of that is a typical "well, it depends". His critique only applies to really performance-critical code, and even there only under specific circumstances. In general I often have to balance between getting something done with suboptimal code and optimizing only the actually critical parts. Also, their critique of references is pretty stupid. After going through his slides, I really would like to tear some of his code apart to point out "how many bad design choices" he can make in a few lines, and how "typical C" that is.
People defending OOP sound like dogmatists who do not accept ANY criticism of OOP. For them it's always the fault of practitioners and not the idea itself.
Good talk, but an obligatory warning: there's a fundamental flaw in comparing custom in-house animation software with a fairly narrow use case (in-game UIs) to a full-fledged web browser. Simply using DoD over OOP isn't going to increase your performance by 4 times. Their in-house browser is most likely heavily hardware-accelerated, while Chrome's renderer isn't unless you're using WebGL. Furthermore, the browser they are running is executing trusted code, so there are fewer concerns about security, non-standard extensions, and more.
I think the claim is that the new engine is 6x faster than the old in-house engine, not than Chrome's default engine, using the same algorithm but with DOD concepts in mind.
@@isodoubIet Probably. WebGL is pretty quick. So is code compiled to JS to run in a browser. And today one would compile to WASM for even more speed. Sure, that may not be as fast as compiling to native code, but it can be quite enough. As far as I can tell, the point is that one does not use an entire browser engine to do a game UI, just the useful parts of it.
Pretty much. What's funny is that Object-Oriented and "Data-Oriented" aren't even conflicting ideas. OOP/OOD doesn't mean virtual functions and massive inheritance hierarchies. It's gonna be funny in 10 years' time when people take DOD to its extreme and we end up back where we started, but this time DOD is the bad guy. DOD still has some great ideas, though. Hopefully we can integrate these ideas without repeating the 90s.
@@jonathan_cline They are conflicting ideas. OOP causes every field on the same cache line as the one you're accessing to get pulled into the cache. This, coupled with calling different member functions (even slower if they're virtual functions, since more memory needs to be pulled in to check which virtual to call), means even the instruction cache is constantly getting tossed out. The CPU ends up spending more time waiting for memory than actually doing work. DOD says to use a single function on a linear array containing ONLY the data that the function actually uses. All modern CPUs are great at prefetching memory when it is accessed in a predictable pattern. So while processing the data on the first cache lines, the next cache line will be prefetched and available for work as soon as the CPU is done with the first, leading to no wasted cycles. OOP says to put the data you need in an object to fulfill what the object represents. This leads to wasted cycles, as unneeded data gets pulled into the cache any time any field is used.
Actually, Data-Oriented Design is just plain old Procedural Programming. But I guess we need to invent another description for it because OO developers trash-talked procedural programming for so long.
The major point data-oriented design preachers make is that giant classes with a lot of logic and unrelated data cause slowdowns. And that's true, but the solution is already in OOP - it's literally the letter S in the SOLID principles: small, single-responsibility classes. No need to call it a new programming paradigm; it's already in OOP.
If you use the SOLID principles for OOP, you can make minor changes without being familiar with the entire system. This is especially useful when you are trying to modify a system developed a few years ago.
Unless you overdo SOLID and then you have a FizzBuzz solution that has one class for calculating text for multiples of 3, one for multiples of 5 and one that multiple inherits from those two.
The sequence of updating the animations is probably irrelevant, so you can swap the to-be-deleted element with the last one, pop it from the back, and then push it onto the end of the inactive array (sketch below). Also, you would organize the data in such a way that you only decide to move things between active/inactive once per frame, and would try to do that update for all elements of one animation type together.
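The swap-and-pop itself is only a few lines (a sketch, assuming element order doesn't matter):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // O(1) removal: overwrite the dead element with the last one and
    // shrink. No hole is left, and the array stays contiguous.
    template <typename T>
    void swap_and_pop(std::vector<T>& v, std::size_t i) {
        v[i] = std::move(v.back());
        v.pop_back();
    }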
Still the same array and memory block; skipping one animation is still much faster than fetching a list of characters with name, inventory, textures, meshes and animations just to use two of those to update the animation. In a remove/emplace/update action these can also be reordered, so you'd get their miss for only one frame, and the misses don't build up over time.
I assume you can just run through the arrays and toss every entry onto the respective array before doing the processing; with non-temporal writes you bypass the cache, and the CPU itself will figure out when it should do the writes. So as long as the objects are not huge, it should be pretty fast.
I agreed with this talk, but using type-erased base classes with virtual methods does not require allocation, and the vtable lookup overhead can be overshadowed by other latency in some cases, making it actually a viable solution even in performance-critical applications.
It is also important to consider that if it runs in a tight loop, the vtables are likely to be hot on modern systems with pretty large caches. Unless you have a ton of variants of the functions... But it's not like I'm claiming it is always OK. Just saying that there are cases where the overhead of vtables truly is negligible, and a lot of the ways to get around vtables while keeping the same extensibility have various other overheads. Also, it is possible to keep lists sorted by type, ensuring that the vtables stay as hot as possible.
Interesting talk on optimization. However, this stuff really isn't new. System level programmers/engineers had been obsessed with cpu/memory/cache/compiler friendly code design way before OOP existed.
Also, don't get baited by the title of the video. DOD and OOP are coexistent. OOP for production speed and code management, and DOD(along with other performance-oriented design patterns) for performance.
@@CreepyBio I'm no coding guru, though I hope to be one day. But what I gathered from his talk is that you want to lay out your entities as components to be used by systems, and you want those components to be laid out sequentially in memory. And the system using those components should define what data goes in the components. Is this a reasonable understanding of what the presenter is speaking of?
@@skittles970 Didn't finish the full video, but my guess is that you're correct. The basic idea is to group data by "category" rather than by "object", since data of the same type is far more likely to be accessed sequentially, and this is beneficial given the nature of memory units' locality and prefetching.
I'd like to hear a talk about the deficiencies of OOP that doesn't blame OOP for the fact that the code is poorly written. Why do you have that hideous inheritance hierarchy and why do you have "a lot of unrelated data" in your objects?
Can't remember his last name, but his first name was Sean. He said, and I agree, that the problem isn't with objects; object programming is fine. The problem is the "oriented": forcing the whole program to be built from objects and adhering dogmatically to the OOP formula. Use the best tool for what you're trying to solve. I'll use OOP when I want to create abstractions, but not everything needs to be abstracted. I sometimes use destructors because I want to manage the lifetime of something like a TCP connection. But I'll skip everything else in OOP.
@@Bozemoto Yeah, and no professional software engineer with any sort of real-world experience does that. Only academics who haven't written a line of production code in their life, or juniors who are still learning to write code.
@@sacredgeometry Try finding the lecture on Boost's HTTP stuff. There seems to be a large range of quality among programmers on the market, across all ages and experience levels.
I don't really understand how the animation method he shows works, because to do animation in something like CSS you need to access some sort of variant, which means you can only decide the type at runtime. But templates are compile-time-generated types, so you will end up needing some kind of branching, like what is done with virtual classes...
There is a collection for each type of animation, and you run over all the collections which have at least 1 active animation in them. Basically you have 1 branch per animation type, not 1 per animation (rough sketch below).
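Roughly like this (illustrative types only):

    #include <vector>

    struct OpacityAnim { float t = 0; };
    struct TransformAnim { float t = 0; };

    struct Animations {
        std::vector<OpacityAnim> opacity;      // one homogeneous array per type
        std::vector<TransformAnim> transform;
    };

    void update(Animations& a, float dt) {
        if (!a.opacity.empty())                   // one branch per type...
            for (auto& o : a.opacity) o.t += dt;  // ...none per animation
        if (!a.transform.empty())
            for (auto& tr : a.transform) tr.t += dt;
    }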
Very often you encounter a case where the superset of all the fields (data members) required by all the classes in a hierarchy isn't that large. In those cases, using a single class/struct with the superset of fields and a function pointer for polymorphism can dramatically outperform inheritance-based solutions, since you can easily allocate and deallocate the objects in constant time with a contiguous or mostly contiguous representation, while eliminating the bulk of branch mispredictions and cache misses (you can store them all in one array, so to speak; sketch below). It can require more memory, with some wasted fields/data members, but fixing the size of each object/instance can more than make up for it with better and more predictable performance (predictable/consistent frame rates are as important in games as fast frame rates). In other cases, when the subclasses required aren't numerous, or you have a generic solution and do not need to access the objects in any specific order, you can do as mentioned above, where you store each data type in its own container and avoid branching at a per-object level, as happens with polymorphic base pointers. The branching is then reduced to once per type/container.
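A minimal sketch of the superset-plus-function-pointer idea (all names invented):

    #include <vector>

    struct Entity;
    using UpdateFn = void (*)(Entity&, float);

    // One fixed-size struct holds the superset of fields the old class
    // hierarchy needed; a plain function pointer supplies the polymorphism.
    struct Entity {
        float x = 0, y = 0, hp = 0, timer = 0;  // some fields unused per kind
        UpdateFn update = nullptr;              // replaces the vtable
    };

    void update_all(std::vector<Entity>& es, float dt) {
        for (auto& e : es) e.update(e, dt);  // one contiguous array, constant-size elements
    }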
I'd have to disagree with the statement that "OOP is Dead". I've written code in both a procedural (struct) style and an OOP style, and I can see pros and cons to both. Procedural style can produce faster code, but you will find that as the project gets beyond a certain size, it simply gets too difficult to keep track of all the concepts as they naturally start to get entangled with each other. This has happened to me first-hand. So I guess if performance is critical and your solution can be kept fairly concise, then a DoD approach may be best. For larger projects with more developers, where performance is less critical, OOP has to be the winner. Also, I agree that OOP solutions can be overly complex too... many OOP developers (or more often "software scientists" these days) see creating the object model as an intellectual exercise without thinking about the CPU at all... their object model may be a wonderful creation to them, but can be tough to understand for someone coming to it afresh.
First of all, it's a good talk. And... I've noticed a very interesting phenomenon in the software industry: re-inventing new words. From my point of view, data-driven is still OOP, but without encapsulating the behaviors. I believe encapsulating behaviors into a class is NOT required by OOP design. It's all about ABSTRACTIONS. Inventing new words sometimes is not really helpful for teaching newcomers to programming. I remember there have been many talks at CppCon in recent years about making C++, or programming in general, more friendly to newcomers.
@@PixelPulse168 You are right that you can implement data-driven design using the OOP paradigms from the language, but data-driven can also be implemented without OOP paradigms. It's really something different, even if you could argue it's not in opposition to OOP.
In the general case, if two approaches are incompatible with one another then a third approach can be constructed that can adopt either of the two dependent on circumstances. If the two approaches each have advantages in certain situations, then the third approach will be superior if it can articulate what those situations are. So well done in recognizing the first part, but what to you determines whether an approach should be used?
May I ask a question here? How do you handle data dependencies between different objects that form a hierarchy? For example, the DOM is a hierarchy, regardless of whether we use OOP or DOD, and it imposes dependencies between different objects. Looking at slide 30 (27:33): if an element's `AnimationState` depends on the `AnimationState` of its parent, how do you ensure that the latter is computed before the former, and that this still maintains the efficiency of the DOD solution?
If your data inherently has a tree-like structure, it can indeed be quite challenging to speed things up. It's easy enough to ensure a parent's AnimationState is computed before its children's by sorting the list in a breadth-first fashion (sketch below), but *referencing* other states might be an unavoidable cache miss.
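A sketch of that ordering trick (assuming each node stores its parent's index and parents always come before children in the array):

    #include <cstddef>
    #include <vector>

    struct States {
        std::vector<std::size_t> parent;  // parent[i] < i; a root points to itself
        std::vector<float> local, world;
    };

    // One forward pass computes every state after its parent's, with no
    // recursion; only the parent lookup can still miss the cache.
    void compose(States& s) {
        for (std::size_t i = 0; i < s.local.size(); ++i) {
            const std::size_t p = s.parent[i];
            s.world[i] = (p == i ? 0.0f : s.world[p]) + s.local[i];
        }
    }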
In OOP you will have pointers and references; in DOD you will have IDs. As for the order, unless we are talking multithreaded, it does not matter; it will always happen in the order you read in your code.
Seems like a flawed way to go about determining if DOD is actually superior. To be a fair example, you shouldn't only try to make a more efficient solution in one programming style just because it was a hypothetical question of whether you could. You should also try to simplify it using OOP, putting the same level of effort into that simplification and optimization, because I can tell you this: you'd still get a massive performance gain from a simplified OOP approach. Which one is actually better, I cannot say; I'm not that experienced with DOD. That being said, I try to keep my inheritance hierarchy as shallow as possible and my classes as simple as possible. If I need something more complex, I see if it can be 2 separate classes that operate together. I do use some DOD methodology for certain things, but I still find OOP more useful for organizing many instances of something. Some people take OOP way too far and add needless complexity when their goal could really have been achieved with less; there is almost an aggressive need to make super classes with mile-long inheritance chains. Very few classes ever truly need that complexity.
So I guess the lesson here is even with object oriented and fully portable data (HTML), you can still manage to make it efficient at the lower levels on a platform specific implementation while keeping the benefits of an abstract upper layer? :P
The main problem for me with data-oriented design is that invariants cannot be properly maintained if a cluster of components basically constitutes an aggregate. For instance, component X cannot be in state Y unless component Z is in state W. You can end up with inconsistent models quite easily. OOP via DDD can enforce this by hiding the state so that the aggregate root could enforce all the invariants in one place. Data-oriented design is most popular in gamedev and visualization (where the speaker comes from), where data integrity is not the main priority (see: all the glitchy games out there). So there's always a tradeoff between performance and data integrity.
This is why the speaker said finding the right separation is the hard part. You should try to avoid separate components being dependent on each other's state in general. If that's the case, you may have a flaw in your choice of separation boundary. According to your argument, when you put everything into a single class, you're implicitly saying that EVERY property is dependent on EVERY other property, which simply isn't true. There is absolutely some boundary where you can slice your large class down into two smaller classes with no inter-dependencies, and that can likely be done many times. Do that, and you have a great starting point for where to draw the line between components in your system.
This started out as a reply and developed into a comment of its own.

There's this sort of style of OOP that gets treated as canonical OOP, the kind a lot of people like the presenter here attack, and regardless of what you feel about "good OOP", it just needs to die already. For the sake of discussion I'll call it the "bunny.hop()" approach. I'll provide a definition later, but you probably already know the kind from toy examples, like where "Bunny implements Animal" and "bunny.hop() tells the bunny to hop". Any teachers out there teaching this approach (and I know they're out there, I learned from them) do no service to students, unless they use it to explain the pitfalls in the approach. Dumb starry-eyed examples like these sure help to sell students and middle managers on the idea, but they conveniently gloss over the problems with those designs that make you wish they were never designed like that in the first place.

How does the bunny know when to update its height at every timestep? Maybe there's a bunny.update() method too, and it reads a clock somewhere. I'm going to design my clock to have a "clock.read()" method because that looks simple, just like "bunny.hop()" is simple, and maybe I'll call out to that in "bunny.update()". But wait, how do I test bunny? Now I need to create a clock object when all I wanted to test was hop, not to mention all these other dependencies I just realized I needed. What happens when the bunny collides with a ceiling? How do objects communicate with each other to tell when they intersect? I like how simple "bunny.update()" looks, so I don't want to add any of those nasty-looking parameters, but maybe it should be event-driven, or use pub/sub? Oh no! An event was called somewhere in a million lines of code and I have no clue what's going on with this system anymore.

I don't mean to say that OOP is bad. There's good OOP and bad OOP, and I've worked with both. I will note, though: bad OOP is still regarded as good OOP by those who can't tell the difference, and the people who understand good OOP seldom seem able to succinctly articulate what good OOP is unless they have a specific implementation they can criticize, even though they have no difficulty explaining what the virtues of good OOP are. The whole design approach lacks the sort of mathematical precision that allows people to automatically agree, given some arbitrary thing, whether that thing constitutes a mathematical object. There's a huge opportunity to abuse the situation, throwing around "no true scotsman" fallacies to defend your preferences without having to think a whole lot. "Oh, that system doesn't work well with OOP? Well, no *true* OOP would result in a system like that!"

As a side note, if I were to attempt a definition for OOP that I regard as "good" (which for ease we'll define as the converse of "bunny.hop()"), I would fall back on category theory: it's a design where mathematical small categories are implemented as classes in which arrows are methods that can be treated as conceptually pure functions. When I say "conceptually pure", I mean a weaker definition of purity that allows things like output reference parameters and owning objects among the input/output (to avoid things like the performance issues that come from newing up memory footprints), just as long as input and output are separate and the output of a function is determined strictly by things you can read within the invocation.
This definition allows for some mathematical treatment, since inheritance and polymorphism simply become a way to implement functors, dependency injection can be implemented by a small category where nodes are classes and arrows are constructors, and the Yoneda lemma could probably have something to say about encapsulation and abstraction. This is distinct from what I've seen some others call good OOP, since it doesn't allow for things like methods that read and write object state simultaneously, but that to me doesn't seem like much of a drawback.
As an example, the "bunny.hop()" method is better phrased as a function of some representation of the bunny's intention: there's a method hop: ()->intent that creates a new intent struct representing the intent to hop, a (bunny,intent,environment,timestep)->bunny function that returns a new bunny state reflecting changes over time given intent, and a (bunny,intent,environment,timestep)->intent function that returns updated intent, all as methods within a class/category named "Bunny", which itself could extend or implement an "Animal" to handle actors more generically. This isn't the only approach you could take; you could construct a class/category that abstracts it further using function composition, and I'm sure people can construct pathological examples that preserve the definition but are unworkable messes. But as an experiment I'm using the definition to design software where possible, to see whether good-faith adherence results in unmaintainable code. So far, so good. I will sometimes use namespaces instead of classes to represent categories, namely where I want to minimize the number of includes within a single file, but this has a tradeoff in that I can no longer use class constructors to track dependencies. Regardless, you will note this bears little resemblance to what anyone would do if asked to implement a "Bunny" class.
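For illustration, a minimal C++ sketch of the intent-based Bunny described above. All names, fields and constants are invented for the example; this is one possible reading of that design, not a canonical implementation:

struct Environment { float ceiling_height = 3.0f; };
struct BunnyState  { float height = 0.0f; float velocity = 0.0f; };
struct BunnyIntent { bool wants_to_hop = false; };

// The class groups conceptually pure transformations; no method mutates
// hidden state, and outputs depend only on the inputs of the invocation.
class Bunny {
public:
    // hop: () -> intent
    static BunnyIntent Hop() { return BunnyIntent{true}; }

    // (bunny, intent, environment, timestep) -> bunny
    static BunnyState Step(BunnyState s, BunnyIntent i,
                           const Environment& env, float dt) {
        if (i.wants_to_hop && s.height == 0.0f) s.velocity = 5.0f;
        s.height   += s.velocity * dt;
        s.velocity -= 9.8f * dt;                    // gravity
        if (s.height > env.ceiling_height) {        // ceiling collision
            s.height = env.ceiling_height;
            s.velocity = 0.0f;
        }
        if (s.height < 0.0f) { s.height = 0.0f; s.velocity = 0.0f; }
        return s;                                   // new state; inputs untouched
    }

    // (bunny, intent, environment, timestep) -> intent
    static BunnyIntent UpdateIntent(const BunnyState&, BunnyIntent i,
                                    const Environment&, float) {
        i.wants_to_hop = false;                     // intent consumed once acted on
        return i;
    }
};

Note that testing Step needs no clock object and no event system: every dependency is an explicit parameter.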
I am skeptical of the conclusion at 5:41 because he did not prove that the rendering problem of the Chrome browser is caused by the rendering pipeline itself, and his demo only updates the rendering pipeline.
No, it's not. Pure FP data lives on the stack. All of this lives statically on the heap, where it belongs. If you have heap data in FP, then you are not pure and you are just emulating procedural programming in an FP language. Why do you want things on the heap? So that your data structures NEVER move. Ideally you even want to protect certain areas of your state code by using the MMU (e.g. your buffers... now why would you want to do that, huh?). Oh, wait, you are a programmer, you don't know what the MMU is... your operating system has already taken it away from you. :-)
I am rather confused in general... so we started off in computer programming technically doing data-oriented design, due to constraints... eventually down the road we switched to OOP... and now we are switching back to data-oriented design... the cyclical nature of computer programming is rather interesting to me...
Hardware changing over time probably plays a significant role. DoD suddenly becomes interesting for minimizing power consumption on mobile phones, for example.
This was essentially a sales pitch for a programming paradigm, but it didn't really teach much. I have been programming in C++ for about 20 years, over 10 of them professionally, but this speech didn't really teach me anything about how _exactly_ I should start designing new projects to be more efficient in this manner. "Put all common data in a single array, handle that data with code that minimizes or even removes all branching" is a good idea, but the talk didn't show how that should be done in practice, especially since it just alluded to an overall design without going into any details (most prominently, the speaker just mentions the output arrays and passing them to the next step in the "pipeline", but did not show nor say anything at all about any implementation details, even at a high conceptual level, so I still have no idea what he's talking about). I suppose that wasn't even the intent of the talk, which was indeed meant to be just a quick summary of what DOD is and an incentive for people to learn more about it. But, in the end, that makes this speech more or less useless, because it doesn't really teach anything. (I already knew about the principle of cache optimality, how handling data linearly in an array is more cache-optimal than jumping randomly around memory, and how conditionals can make code less efficient because of failed branch predictions and code cache misses. In this sense the speech didn't teach me anything I didn't already know. It also didn't teach anything about how I should change my approach to program design in order to achieve that optimal strategy.)
It means that you did not see any red flags in the complexity that arises when you have to make all these classes interact with each other in complicated ways to make it run the way you intend. The examples from the Chromium source code were pretty good. Some applications are well suited to OOP, for instance a GUI framework. If you are making an animation engine, it makes more sense to think about streams of data going through computation units. As for their design, please keep in mind that making things simple enough, clear enough, is a challenge and requires effort and time, so I would cut the presenter some slack.
I think there's a reason why all these "data oriented design" people are game developers (and, while he promised to show a non-game application, and to some extent succeeded, an animation system is still a pretty "gamey" thing). The reason is that games _do_ end up misdesigned if you try to do them in an object-oriented fashion, largely because there are so many things that interact with so many different systems. Consider: a character on the screen has a mesh, some textures, some animations, some actions, possibly some AI, possibly some player input, some game state (health/mp/ammo/whatever), has to interact with weapons, with its environment, and so on. What's more, all this stuff has to run every frame, so almost none of it is "cold". The vast majority of other applications are _not_ like that. They do not end up mixing so many different concerns in the same objects. When relationships are expressible as graphs, object orientation is a natural fit. The problem is that here they're hypergraphs, and untangling that requires introducing intermediate layers of abstraction that, understandably, they don't want to build because they are not performant. I think Richard Fabian's book, while rather poorly written, sometimes inscrutably so, does a decent job of actually explaining what this paradigm is like and how systems with it are actually designed. It doesn't help that almost every advocate of DoD presents this adversarial relationship with OOP, as if they were incredibly different paradigms. They are not -- this is just a variant of OOP where you design a database that will contain the objects before you design the classes. You still "hide" data in this paradigm, not by making it private, but by making copies/not making it available at all!
48:43 - The guy asking the question very obviously has no idea what OOP is and why it is so bad. OOP is a mindset that _directly causes_ bad engineering. Without OOP dogma, most programmers would just naturally use better engineering practices of organizing the data separately and operating on the data in a manner that the CPU can process efficiently. It's the OOP that makes programmers have bad engineering ideas. The questioner sounds more like he was taking the talk personally, and was offended that someone would ever speak out against OOP.
You need to know how data is used. For example, if the components are always accessed together, separating a point into arrays of x, y, z will only kill performance. I think the whole OOP hatred thing is a bit misleading.
+Lttlemoi No, it's a response to Java, which must die. I think that when people say OOP they really mean the Java paradigm, and a lot of languages, exotic or not, suffer because people want every language to look like Java. Did you know that Java is not really OOP? Smalltalk is, but C++ and Java are inspired by Simula, not Smalltalk. People seem lost if there are no public, private, protected keywords ;-) if you know what I mean. The 3 lies of OOP against the reality of the machine, processor, architecture, et cetera: www.slideshare.net/cellperformance/data-oriented-design-and-c 1) Software is a platform 2) Code designed around a model of the world 3) Code is more important than data
@@jumpman120 "Code is more important than data" - this is actually one of my top issues with many IT projects and with human behaviour in general: people often put a lot more effort into protecting their code (doing things like obfuscation and anti-piracy) while they haven't yet sorted out data backups or documented their data structures well. Then in a decade or two they look really surprised when they suddenly have a hard time moving to a new platform, because their code is now obsolete, and just copying the data and writing new code is so painful that often even emulating the old platform is cheaper for them. In reality the data is far more important than the code. Code can be rewritten... (especially if you have written it once already, you can often pull out a second version with the same features in much less time)
Because in the OOP version there is a cache miss for each animated square, because they all inherit from the same type. Each of the 3k squares creates 2 new animation instances: 1 movement animation and 1 color-change animation. In the DOD version he only gets a cache miss on the template instantiation, so each of the 3k squares references the same animation once and only once, since all squares are simply structs in a vector. So it should be 2.
@@Holysoldier000 Still don't understand what a cache miss has to do with it. Afaik, a cache miss simply means that an instruction requires data that is not located in the cache, right? So we either check what the compiler does or we make a few assumptions. I don't see how we can tell whether or not there will be a miss. I see that the OOP version has issues with vtable lookups, but whether or not we get cache misses depends on where the stuff is allocated.
@@CrazyHorse151 The way he is determining the cache misses is from the inherited virtual calls. In the OOP model each instance of an animation has to defer to a virtual function that it inherited. If you look at the DOD version, the function reference is now a template and not coupled to the "animation instance", as there are no instances. So if there are 2 types that get unfurled from the template, you only have 2 virtual functions. Think of it this way: 1 array that runs through 100 objects that each call 2 virtual functions makes at LEAST 200 virtual calls. 2 virtual functions that run through 2 arrays of 50 objects each is ONLY 2 dispatches. The idea is to separate the function from the data, if that makes sense.
@@Holysoldier000 Ah, I think I just got it. We're talking about the instruction cache, right? So we not only have to check for the virtual function, but we also have to jump to an instruction that most likely is not in the instruction cache, right? So this is where the cache miss comes from: not from the data but from the instruction? Btw: thanks for helping me understand this!
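A rough sketch of the contrast this thread is describing (type and function names are invented; the talk's actual code differs). The OOP version pays a pointer chase plus a dispatch per element; the template version stamps out one tight, direct-call loop per concrete type:

#include <memory>
#include <vector>

// OOP style: heterogeneous array of base pointers.
struct Animation {
    virtual ~Animation() = default;
    virtual void tick(float dt) = 0;
};
struct MoveAnimation  : Animation { float x = 0;   void tick(float dt) override { x += dt; } };
struct ColorAnimation : Animation { float hue = 0; void tick(float dt) override { hue += dt; } };

// N objects -> N virtual dispatches and N pointer chases.
void tick_all(std::vector<std::unique_ptr<Animation>>& anims, float dt) {
    for (auto& a : anims) a->tick(dt);
}

// DOD style: one plain struct type per homogeneous array.
struct MoveData  { float x = 0;   void tick(float dt) { x += dt; } };
struct ColorData { float hue = 0; void tick(float dt) { hue += dt; } };

// The template is instantiated once per type; the per-element call is
// direct and inlinable, and the data is contiguous in memory.
template <class T>
void tick_array(std::vector<T>& items, float dt) {
    for (auto& item : items) item.tick(dt);
}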
The only mistake he made was giving this nice, cool talk that title. We are not stupid people, guys... seriously! It's the typical line whenever a new king sits on the throne. Let me make the analogy: in Coherence, rendering HTML5, OOP practices as you'd use them for developing apps simply don't work. Let me say more: no render engine should use OOP the way you use it for apps, libs, etc. Every country has its own king ;) That said, he should change the title of the talk to avoid bad reactions, because Data-Oriented DESIGN doesn't even mean you don't use OOP, unless OOP means only polymorphism and inheritance. We know we pay for using those, but the performance problem isn't there: he used templates, composition, avoided IF statements in the Tick call, etc. So it's a design, a strategy, a way of coding to maximize performance. Anyway, good talk, and it's great that we have a discussion here about it; that's how we enjoy 😉
One of the main principles of OOP is to keep data and behavior together, and to work with collections of such objects. Both lead to inefficient scenarios for your computer.
Could you please share the code of the examples? A simpler case with only C++ might have been easier to understand; it's difficult to assess whether the problems are related to HTML5 / JS / CSS. While Chromium is object-oriented, it is not a synonym for OOP. For example, when you say OOP needs to build a complete mock DOM tree, you mean Chromium does. Another OOP system could be designed to mock a list of nodes, just like you did. 6x is not even an order of magnitude better. What could we get by optimizing the 'OOP' code for animations, especially optimizing for cache misses?
The title is a bit disorienting. IMHO "Designing your codebase with the cache in mind", "Cache-friendly design" or "Know your machine" would have been more appropriate. OOD is a tool like any other tool in the language; how well it is used depends on the context, a thing the speaker also points out at the end. Besides that, the talk is good and informative. Another downside is that the speaker moves around a lot; I can only imagine what a hard time the cameraman had... :)
I need to watch the video first, but too many code-related videos have stupid camera work where they focus on the presenter instead of the code and other important information. Other than identifying the presenter at the start and end, recording the presenter is throwing away the most useful information. People go to tech presentations to get technical information, not to see someone dance!
I really like the idea of data-oriented design, and as a student who worries about performance and is only learning OOP, I am quite annoyed that these speakers think what you learn in school is wrong and then do a really poor job of explaining the thought process in simple terms and examples. I find the examples way too complex, maybe also because I don't program in C++, but still.
A lot of this stuff is trivial to grasp if you learn some basics of assembly, the theory of computer architecture, and how precisely data is encoded and processed in computers; after that, most of these examples will probably ring a bell even if you don't fully grasp C++ (which takes years of experience to master, so you shouldn't feel bad if you "just try it" and it feels like everything goes wrong and your source looks worse than a similar thing written in any other language... actually it's so bad that most of the "C++" projects out there in production have rather average or bad source and could be rewritten much better by somebody experienced enough... you should probably do some C++ for a few weeks every year and read into some open-source projects from time to time, even if you prefer a different language for your actual production work).
@@ped7g Wow @ped7g you are exactly what is wrong with this field. "Lol its trivial" you guys dont realize how much harm you do to the community with that elitist bs behaviour.
@@RodriTheMighty I think at least it's relatively so. It's more difficult to become a master of C++ without any knowledge of assembly, profilers, and computer architecture, however much people want to market the language as not needing them. Even if the goal is OOP, it's difficult to design in OO very effectively absent such knowledge. For example, someone who tries to learn how to design an effective multithreaded software architecture without understanding underlying computer architecture concepts like atomic instructions, CPU caches, and false sharing will probably have an even harder time than if they learned these fundamentals first. "Trivial" might be the wrong word, but I think it's at least an easier path, and not as daunting as it sounds. One of the first things I recommend is a profiler: start measuring code, then learn on the fly, with each hotspot we encounter, why it exists - whether it's the result of cache misses, branch misprediction, poor instruction selection by the optimizer, false sharing, or even something much higher-level like an inefficient algorithm with poor algorithmic complexity for the size of the inputs it handles. We might have a problem with some elitism, but I think a huge problem is that we are trying to get C++ programmers to run before they learn to walk: trying to use a language as complex and advanced as C++ before they even understand the basics of computer architecture, and trying to optimize their code before they even learn how to use a profiler, while misquoting Knuth on premature optimization (Knuth's original point is that all optimization is premature until code has been profiled, and after that it's not premature -- the proposal his famous quote was drawn from, and is incessantly misquoted from, was to use gotos to generate more optimized branch instructions instead of loops).
@@RodriTheMighty I'm not sure if that's the case here, but the word "trivial" can be used to mean the opposite of "nontrivial". A professional C++ programmer is totally expected to consider the basics of C++ trivial. And an OOP student isn't really expected to pick up C++ as their first OOP language, so yeah, you get an elitism clash here. C++ isn't really a "community" language. It is a language for programmers. Maybe that's wrong, but if you look at the other side, it would be really bad too if there were no talks meant to be by programmers for programmers. You'd never get to talk about things without having to revisit the basics.
Chrome is a very specific use case; not everybody is doing in-memory management of their objects. Most programmers out there are using databases, not in-memory storage. How do you do this with databases? I guess you can address each table as a long collection of objects, you can have indexes and composable queries to filter the collection down, and then do the mapping at the bottom. I think it would be something like that, more or less...
Databases are actually a good example of data-oriented design. You fetch only the data you need, iterate over that data and do your thing, then update the database. That's pretty much data-oriented, but with an extra step of communication with your database. Think of it as storing the columns instead of the rows, and if a certain object doesn't need a column, you don't add its id (think primary key) to that column.
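A minimal sketch of that columns-instead-of-rows idea (all names invented; a real system would add sorting and lookup machinery):

#include <cstdint>
#include <utility>
#include <vector>

using Id = std::uint32_t;

// Each "column" is its own dense array of (id, value) pairs. An entity
// that doesn't need a column simply never appears in it.
std::vector<std::pair<Id, float>> health;    // only damageable entities
std::vector<std::pair<Id, float>> opacity;   // only fading entities

// "Fetch only what you need, iterate, write back": other columns are
// never touched, so the loop streams one contiguous array.
void decay_health(std::vector<std::pair<Id, float>>& column, float amount) {
    for (auto& [id, hp] : column)
        hp -= amount;
}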
At the beginning of the talk, if you listened carefully, he emphasized that he hadn't seen this type of software design adopted for projects other than game-related technologies.
So rather than profiling what is making Chromium slower, just attribute it all to cache misses, rather than to it simply doing more stuff with fewer assumptions than your implementation. The exact quote was "doing pretty much the same thing". When specifically asked about this, he admitted it, and even mentioned "simpler call stacks", as if that means anything after building C++ code with optimizations. Didn't test it on ARM because he couldn't be bothered to build it, but yeah, of course it will be an even bigger improvement there! So you admit your reasoning is not evidence-based. Anyone can get a 6x improvement easily with lazy measurements. Why not link us to your pull request to Chromium, then?
This is still a kind of OOP. It's just making different choices about how to divide the objects -- grouping things based on their usage and making duplicate copies of "shared" data where necessary rather than actually sharing it -- vs. grouping things by related logical concept irrespective of usage. "Classic OOP" just works, but it doesn't scale as well to large datasets or complex multithreaded usage. This sort of design is better suited for that, but requires more external infrastructure to ensure that inputs and outputs are correctly propagated between the systems and that all the multiple copies of the data are updated as needed.
This is essentially the same as database design. What he calls "existence-based predication" the database types call the "first normal form". Since writing and interfacing with databases is one of the key things OOP was designed _for_, it seems hard to see these concepts as adversarial.
Since he made this talk, chrome devs wrote a bunch of blog posts about how it's "impossible" to prevent use after frees so they'll just never free anything, and that they can't fix their flaky tests so why bother. Really not impressed with chrome devs right now.
Object-oriented programming isn't about inheritance. In fact, in a sense, by using the templates here there is still OO (object-oriented) behavior being performed; it's just also being performed in a DO (data-oriented) manner. DO and OO are not exclusive, except in languages like C# and Java where everything is a virtual class with virtual objects attached to the runtime with a required type system. That's foremost a dumb thing for a language to do if it cares about performance, and secondly it restricts the programmer to a single paradigm.
In C++ this isn't the case, but to act like it is would also be quite foolish: you can use OO and DO at the same time. Inheritance has some useful cases when you don't abuse it; it's convenient and scalable, and only in some simple composition cases should you consider using inheritance in your OO systems. For DO it can be great for the CPU cache, but it is terrible if the cache wasn't the bottleneck of the system, which is not always the case. Asynchronous code, for example, will almost never benefit from CPU cache behavior, and in fact designing for it is very likely to cause more problems than it solves (which makes it problematic for something like UI development, which relies on async-like behavior that does not benefit from CPU caching); OO-specific behavior (disregarding inheritance) can solve this in some cases.
The biggest point for OOP is that it lays out an interface for the use of an object, i.e. minimally required functionality. If the data within is also readable, then others can still treat it as a DO object, and it doesn't even violate DO design. Objects are not required to be comprehensive, however, nor to have virtual tables on them; inheritance is not a requirement. (Not to mention that "static inheritance" removes the virtual-table problem, which is almost exclusively responsible for those cache misses, and it starts to become more data-oriented.)
Yeah, inheritance is essential for expressing base functionality in a different manner. It's just that whenever internal side effects occur due to that base function, everything goes to hell... It is essential in some contexts, yes, but very volatile once you introduce mutable state into the base class...
I know this is an old comment, but I think that is a good thing. I've seen a lot of talks about the value of DOD and the evil of OOP, but not a whole lot that goes beyond classic college definitions that no one would actually use in the real world. If there can be more interaction between people that have a lot of experience in OOP and people that have a lot of experience in DOD to compare and contrast the pros and cons of each approach in real life situations, I think more programmers (myself included) would greatly benefit. I've been working in OO my entire professional career and am pretty curious about DOD, but my bar to entry right now is trying to think about what I currently do and what it would look like in DOD. If there are more conversations that bridge that divide, rather than extolling or condemning one or the other, then it would make the transition to using DOD much easier.
The aim of data-oriented design seems to be to port the functional paradigm into OOP languages, to reduce the state-hell problem and the lies of the Java language, which breaks the "cooking recipe" approach of the C language and makes it difficult to understand how the process's state and memory evolve during execution.
The point of OOP isn't the greatest efficiency, but to be effective with large teams of mediocre-quality devs. If everyone were a Mozart, they would be writing in assembler and getting the greatest performance. Long-term, functional programming requires talent in math. Most devs are incredibly untalented when it comes to anything above high-school level. This isn't because of a defective educational system: humans are better with stories than with abstract thinking.
There is a serious question I have, and it is tied to what happens at 5:10. He says "in my hotel room it was 70, but who knows what happens" - isn't this the problem with strict DOD? When you write games, literally everything is streamlined. You are not browsing the web at the same time, you are not taking calls on Zoom or Skype, etc.; the list can be made long here. Browsers today are like small OSes, and while optimizing the rendering of moving things can obviously gain a lot from DOD, I have my doubts about rendering diverse things like regular websites, which aren't just moving objects that do things.
I believe the goal is to be able to integrate DOD for specific things that would greatly benefit from it, when needed. For instance, with Unity right now, you can do this with Unity ECS. Some things would remain OOP, others DOD.
Yes a regular website is not going to benefit from focusing on primitives, only something like a 3d game in the browser would. Data-oriented design is still used in the web world, it just means separating code and data instead of avoiding objects.
+Bruno Xavier Leite When someone talks about Java and SOLID and design patterns, can I say that it is bullshit and that there are illusions in OOP, or not? Because everybody uses OOP and Java and public, protected, private, and design patterns, and the programs are still a ton of shit.
At this point, it's so easy for me to write in an OOP way and test it that I don't really care about trying the alternative just because some people think it's better. Anyone proficient with OOP knows what the pitfalls are and tries to avoid them (e.g. inheritance).
I think the author is unfair, and his belief that DOD wins from any point of view is an exaggeration. DOD is a good approach for optimization, but I highly doubt it could be an effective method of development by default. The argument that code is easier if it's in one place is very, very old, and it reminds me of the time when we were trying to move away from "spaghetti code". We don't write such code anymore exactly because we learned to divide our code and use proper abstractions, separation and isolation. With DoD we can end up with completely unmaintainable code. Once I had a chance to work with a particle system where struct Particle had more than 30 fields and a huge chunk of code used it. Fixing bugs was a disaster, adding new features was a disaster, and it was exactly DoD.
You're falling into the same traps as the video. You can make garbage code in every paradigm. Ignorant people who program by accident and/or cargo cult are always going to end up with bad results. OOP isn't immune to this, or even very good at guaranteeing good results "by default". Taxonomies are entirely viewpoint-dependent, and they are subject to change over time. Defining them statically tends to solidify them in your program. Designing all your abstractions in an ad-hoc fashion, before you even have a grasp on the entire program's data flow, will guarantee your code is hard to follow and hard to maintain. Abstraction-first is absolutely the wrong way, and typically leads to encoding YAGNI-violating abstractions everywhere. Then there's the problem of refactoring them once your requirements (immediately) change... On the other side of the coin, nothing about data-oriented programming precludes you from splitting up your code into logical units and separating unrelated code. Modules and namespaces and functions exist, and are isomorphic to classes and member functions. If you want a "this" pointer, just pass it as the first argument to the function and name the parameter "this" or "self". If you really, really want that "dot" method-call syntax, uhhhh.... invent a language that isn't C/C++, or make a macro, lol. It isn't an abstract limitation of the paradigm, is my point.
But let's stop attacking stupid strawmen/anecdotes of the worst possible version of each other's paradigm, and get to the root problem instead. Splitting that 30-field struct would almost assuredly be a good idea here. If most fields aren't used for every particle in every iteration, then you don't want to be loading all the unused junk into cache.
Neither OOP nor data-oriented programming saves us from the refactoring pain, either. OOP gives us some tools - moving members around the class hierarchy, various composition/pointer-based abstraction patterns, this pointers - but they're of limited use, and there are large classes of data-structure shuffling that aren't made easy with only those tools.
A better tool would be some sort of "struct hierarchy typedef". The "this" pointer + inheritance is kind of like an anonymous-only version of this feature. Something like "using end_color = particle.end_pattern.color;" to alias all the fields of "particle.end_pattern.color.r ...color.g ...color.b" to "end_color.r end_color.g end_color.b". You could also support anonymous (unnamed) access, with just "using particle.end_pattern.color;", and then use "r" "g" "b" directly, as if you had inherited from color. (Anonymous access to r, g, b is not a great idea with this particular example, but the point is to show that it is a generalization of the inheritance+"this" concept.) Let it be used in every possible scope: define it inside a module, namespace, struct, class, function, member function, for loop, if statement, anonymous curly-brace block, etc. Then you could arbitrarily compose structures the way you wanted them in memory, and be able to shuffle them to your heart's content, while minimizing the number of changes you have to make to existing code in order to do that shuffling. You just add or edit the correctly scoped "using" declarations, and all your code would suddenly compile again.
Another very good thing would be unifying "->" and ".", so that you only had to use one or the other and it would work for both fields and pointers. The explicitness is slightly nice, sometimes, but it makes all our code break whenever we switch between fields and pointers. Array and pointer access is already unified in a similar way, so I think we have some proof that melding like this can have at least some merit. Both of these changes could improve both OOP and data-oriented code.
I don't think so. He actually argues that their solution has fewer branches. That could also be described as having produced less "spaghetti code". However, I think the talk wrongly presents DOD as an opponent of OOP. According to my understanding of DOD and OOP, it's not. I think the main point of DOD is that you should think about the data flow in your application before building your abstractions. You still get abstractions (maybe using OOP), but they might be different. As shown in the talk, they still have abstractions for different parts of their animation component. However, they decided not to have an abstraction for the Animation itself, because it would not fit nicely. They only provide it as an interface for other components.
I am hearing the presenter making a lot of speculative statements about performance. As a general rule improving performance increases complexity, and as such should only be done when necessary. As a specific rule you should never speculate about performance, only measure it.
It sounds like the comparison is not between DoD and OOP so much as between the design patterns commonly used. Otherwise, what you call DoD is actually just OOP with data-friendly design patterns.
References for easy googling:
"Data-Oriented Design and C++", Mike Acton, CppCon 2014
"Pitfalls of Object Oriented Programming", Tony Albrecht
"Introduction to Data-Oriented Design", Daniel Collin
"Data-Oriented Design", Richard Fabian
"Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)", Noel Llopis
"OOP != classes, but may == DOD", Lane Roathe
"Data Oriented Design Resources", Daniele Bartolini
This talk definitely made a lot more sense to me when I watched it another time after having read "Data-Oriented Design" by Richard Fabian.
@@Tekay37 Some decent content in there but legitimately one of the worst-written books I've ever seen.
@@isodoubIet is there another source/book you'd recommend instead? Started reading it recently after watching Mike Acton's & Andrew Kelley's talks on DOD
Whoever lined the ad break up with the end of the talk, before the questions: well done and thank you. It was wonderful to get through this talk uninterrupted.
A sad thing is web people think 13 fps from rendering a few quads is completely normal
In my experience, it's also perfectly acceptable for users to click and wait for things. Having to wait for the computer to process a few hundred structs is a telltale sign that something is very wrong. Worst case - I took over one project where the running time was 55 seconds for a simple script. After rework, I got it down to 0.012 seconds, most of which was spent actually transferring data between servers. The rewrite took a few hours and was trivial (for me) to do, but saved tons of resources. The earlier solution was to throw hardware at the problem, which is both stupid and expensive. :P
@@rasmadrak Which is what companies do nowadays: "hardware is cheaper than labor".
Actually true: Call of Duty Mobile runs at a higher FPS than every shopping/taxi/delivery app.
Great to see another data oriented talk at CppCon
Lots of OOP fans are being triggered here, lol. Basically, it boils down to memory access patterns. If you can arrange your computations to simply "flow", best without hiccups, i.e. without branching and chasing pointers all over the place, and if you can put data being accessed by hot code in the same place, so they can be prefetched in the cache, you win big, performance-wise. That is what DOD is all about: to better match program data to the way hardware operates. Now, in cold code, OOP is just fine: it dramatically increases programmer efficiency at the cost of runtime efficiency, a cost we're willing to pay. But in hot code, HPC code, and real-time code, DOD is vastly superior, as it much better matches the problem to the hardware.
I'm a DDD OOP oriented person and couldn't agree more. Even die hard DDD (domain driven design) books mention "while design may abstract models of the world, they should never forget technology being used to implement it".
Even in cold applications you can make a whole lot of damage completely ignoring technological limitations.
Programming games, and with that perspective in mind, I think most low-level game programmers agree with and know this. Most games use a lot of OOD, but for the most performance-critical sections we use more DOD. It does not really have to be all or nothing.
Take one super simple and common example that everyone can understand: particle systems. There is no way one would write them as objects with functions. The data for the particles will definitely be stored separately from the functions, and separated to include only what is needed for the code being executed - e.g., not storing visual stuff alongside the things needed for the simulation step, and so on (see the sketch after this comment).
And that is how everything can be done. Many parts of the engine can be OO, but the parts that are easy to convert or really hot can be DO.
Both have their strong points.
It's not like most people who are for DOD try to take, for example, a really complex enemy AI that only runs for 10 instances at a time and strip out all its structure and objects. But maybe the line of sight, audio propagation and pathfinding can actually be DOD without any drawbacks for the ease of use or flexibility of the rest of the logic.
Also, as one more additional note: in most cases, if one wants to really take advantage of broad SIMD vectorization, it is really easy to do with tight DOD loops, and much harder with OOD. I have worked on engines that use OOD for animation updating and manage different versions of functions to take advantage of different instruction sets, but in the end most of it will be structured in tight arrays at that point anyway, so it will be kind of similar to DOD. And then it's more like arguing about semantics.
I would really be curious if there is anyone arguing for OOD who thinks otherwise here.
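A minimal sketch of the particle-system split described in the comment above (field names invented). Simulation state and visual state live in separate arrays, so the hot loop streams only what it needs, and its tight, branch-free body is easy for a compiler to auto-vectorize:

#include <cstddef>
#include <vector>

struct ParticleSim {            // touched every simulation step
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
};

struct ParticleVisual {         // touched only when rendering
    std::vector<float> r, g, b, a;
    std::vector<float> size;
};

void simulate(ParticleSim& p, float dt) {
    const std::size_t n = p.px.size();
    for (std::size_t i = 0; i < n; ++i) {   // contiguous, no branches
        p.vy[i] -= 9.8f * dt;               // gravity
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}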
I agree with the performance argument. However, those who argue against OOP often use an OOP "example" like in this talk, which is not a representative example of OOP. If a class inherits from 7 other classes instead of *using* instances of these classes, then the programmer failed using OOP. Also, well designed OOP code has fewer if statements. e.g. Instead of checking whether I'm in a crawl, walk, run, or idle animation, I just have an object for each animation that knows how to animate and interpolate itself.
So the example given is a bit of a strawman and I understand that OOP fans are triggered because they rightfully feel misrepresented.
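As a sketch of the point above about if statements (names invented): instead of branching on a crawl/walk/run/idle enum every tick, each animation object knows how to advance itself:

struct Pose { float phase = 0.0f; };

struct CharacterAnimation {
    virtual ~CharacterAnimation() = default;
    // Each animation knows how to animate and interpolate itself.
    virtual void advance(Pose& pose, float dt) = 0;
};

struct WalkAnimation : CharacterAnimation {
    void advance(Pose& pose, float dt) override { pose.phase += 1.0f * dt; }
};
struct RunAnimation : CharacterAnimation {
    void advance(Pose& pose, float dt) override { pose.phase += 2.0f * dt; }
};

// The caller holds whichever animation is currently active, so there is
// no state switch in the per-tick code path.
void tick(CharacterAnimation& active, Pose& pose, float dt) {
    active.advance(pose, dt);
}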
oop is very, very rarely a valid way to write code. most oo code out there is basically people coming up with new models of real world concepts that have nothing to do with the real world to begin with and just translate to you having to understand the original concept AND the model the person who wrote the code came up with. oop is broken at a fundamental level as a programming paradigm, performance just happens to be another sector where it sucks. oop is only fine when it's not used as a paradigm, rather as just another tool to solve a very narrow class of problems. the *actual* programming paradigms are procedural, functional, etc. that don't spiral down to an incoherent mess.
that said, this conversation is already too high iq for most people. arguing with people who think no oop means no classes, constructors, destructors, polymorphism, encapsulation, namespacing is basically just roleplaying as sisyphus. when you've only been introduced to all these concepts exclusively from within the context of oop, it’s quite natural that you get defensive and piss your pants when people ask the right questions.
The more I research about Data-Oriented Design, the more I believe Object-Oriented projects have just been saved by great hardware. It's just a luxurious way of programming that's completely detached from the actual behavior of the machine.
Your program might be object-oriented but your machine isn't.
that's the whole point. you know, like an operating system or a spreadsheet app.
It's not always so luxurious though. One of the conceptual problems I see even if we're trying to code to our intuitions of the real world, and not the hardware, is that OOP doesn't remotely begin to reflect the nuances of the real world.
Objects in the world don't offer functionality based on what they "are". They offer it based on what they "have". A human isn't able to walk because he's a biped, he's able to walk because he has functioning legs (and not all human beings have those). A duck isn't able to fly because it's a bird, it's able to fly because it has wings just like an airplane, which isn't a bird. And unlike an airplane, a duck can also walk, because it has both wings and legs.
And in the real-world, the behaviors and available functionality of things change on the fly. For example, a human able to walk today may not be able to tomorrow, because he got a serious injury and lost his legs. OOP wants to model such interfaces in a static, unchanging way, not ways that can be dynamically changed on the fly.
An ECS doesn't just fix the problem of OO being completely detached from the actual behavior of the machine. It also fixes the problem of OO being completely detached from the actual behavior of the real world. It solves both problems that OO has: the problems for humans who want to program things intuitively that capture the more nuanced complexities of the real world in their simulations of it, and the problems for humans who want to program things intuitively that are harmonious with the nature of the underlying hardware.
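A minimal ECS-flavored sketch of that "functionality from what things have" idea (all names invented; a production ECS would use dense arrays rather than maps, but maps keep the sketch short):

#include <cstdint>
#include <unordered_map>

using Entity = std::uint32_t;

struct Legs  { float speed = 1.0f; };
struct Wings { float lift  = 1.0f; };

// One component store per capability.
struct World {
    std::unordered_map<Entity, Legs>  legs;
    std::unordered_map<Entity, Wings> wings;
    std::unordered_map<Entity, float> x;     // position of every entity
};

// The walk system runs over whatever currently HAS legs: a human, a duck,
// a robot. Losing the ability to walk at runtime is just world.legs.erase(e).
void walk_system(World& w, float dt) {
    for (auto& [e, legs] : w.legs)
        w.x[e] += legs.speed * dt;
}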
Without the profound advances in semiconductors, people wouldn't be enjoying the Python & JS paradise they have had for so many years...
Back in the old days of the '80s and '90s it was standard to drive behavior with data, since you had so little hardware capability to work with!
It has become clear to me that, with the passage of time, we have been getting lazier just for the sake of convenience.
@@darkengine5931 If you think about it deeply enough, the memories or information or data that we store in our brains enable functions (systems) that would otherwise not be active, since the data (information or memory) was not present in the composition.
@@darkengine5931 OOP isn't based on a physical world model, it's based on english grammar and having verbs act on nouns.
Stoyan: "Study Chromium, it's made by the best engineers in the world, there's a lot to learn!"
Also Stoyan: Shows how the best engineers in the world designed an overengineered system with poor performance.
Both statements are true, believe it or not. They made a piece of software that does what it should and does it reasonably well. Speed was a secondary concern, and the over-engineering was unfortunately required. Try to make your own browser and you'll understand what he means there. There are some incredible performance tricks in Chromium; it's not like they don't know how to program, there's just so much to keep track of.
@@nextlifeonearth And I think it is worth noting that most software of that scale has grown over a long time; it's not like all of it was planned out perfectly from the start. That is just how software engineering works. If the same people started from scratch to make a new version today, a lot would probably change, but a lot would also likely be similar.
Code bases grow not just from wanted features; they grow also from being used, as a lot of legacy stuff needs to continue to work, and multiple solutions for the same thing might coexist. It's not like the engineers would choose that if they had an alternative.
By comparison, it is "easy" to come along after the fact, take one part of it, and make a new, better version that does just that one thing, without having to support all the legacy stuff.
Stoyan: "you're overengineering your systems and causing them to have poor performance!"
Also Stoyan: "and here's how we built a browser rendering engine to make a game menu."
@@isodoubIet Do you need performance in a game's menu?
You should be spending time optimising the tight loops, not the menu.
The speaker could have given a much better answer to that second question, which asked about what exactly he thinks should be "dead" in OOP.
OOP was designed to help with program design, maintainability and reusability. Things like encapsulation and abstraction are core concepts of OOP, and they were developed in order to aid in the design of huge programs. It's a tool to create a million-lines-of-code program in such a manner that it remains manageable, understandable and maintainable. When properly used, it makes code simpler, safer and easier to understand and to develop further. It also helps in making code reusable, meaning that it's relatively easy to take some functionality that could be used in another part of the same program, or even a completely different program, and use it there without much or any modification. This helps reduce code repetition and overall work. It also helps with stability and reducing the number of bugs, because when a piece of code has been extensively tested, you can be certain it won't be a problem when used in another place or program. OOP does a relatively good job at this.
The problem with OOP is that it wasn't designed, at all, with low-level efficiency in mind. Heck, back when OOP was first developed as a paradigm computers didn't even have any caches, pipelines or anything like that. There were no "cache misses" because there were no caches. Memory access speed was pretty much constant no matter how you read or wrote there. The performance penalty from conditionals and branching wasn't such a big concern back then either. It was but decades later that processor architectures went into a direction where the scattering of all the data randomly in memory became an efficiency problem.
Thus, if we want maximum efficiency in terms of cache usage, what needs to "die" in OOP is the concept and instinct of decentralizing the instantiation of the data that has a given role. The data needs to be taken out of classes and centralized into arrays, and thus we need to break encapsulation and modularity in this manner. We also need to minimize branching, which in terms of OOP means minimizing the use of dynamic binding (in addition to trying to minimize conditionals).
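A small sketch of what that centralization can look like (names invented). The data moves out of individual objects into arrays owned by a system, and existence-based predication, i.e. keeping things in separate arrays according to their state, replaces the per-element branch:

#include <vector>

struct WidgetMotion { float x = 0, dx = 0; };

// The system owns the data centrally; widgets that shouldn't move live
// in a separate array, so the hot loop carries no 'if (enabled)' check.
struct WidgetSystem {
    std::vector<WidgetMotion> active;
    std::vector<WidgetMotion> dormant;

    void tick(float dt) {
        for (auto& w : active)
            w.x += w.dx * dt;   // branch-free, contiguous traversal
    }
};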
WarpRulez Granted, his title was obviously over the top, but his talk was excellent. The main point, at least what I got out of it, was that hardware matters because that's what programs actually run on. You can't abstract that away because you believe in some universal machine that runs on fairy magic dust.
And when people mention that OOP is best suited for large systems with millions of lines of code, I don't know how to react, because it's simply not true unless you cherry-pick examples. No major operating system (each easily tens of millions of lines of code) is written that way, for exactly the reasons (and more) the speaker mentions: you cannot ignore the hardware. And the closer you are to it, the more you realize it. The technological advances we've had in computing are unbelievable, yet my browser operates at the same speed (on the latest hardware) as it did 15 years ago. Even with C++, how long does it take for Qt Creator to load? Or the PyCharm IDE? My OS loads faster! How is this at all possible when we're continually improving on the hardware end? It is ridiculous.
And if you think OOP is responsible for abstraction, I don’t know what to tell you pal.
Clickbaited. Not an OOP evangelist, but it's good to see commenters like you shedding light for newbies who might otherwise forget to take this with a grain of salt.
I don't think that OOP makes code more maintainable or easier to reason about. OOP is both not optimal for performance AND not optimal for maintainability. I mean, most highly scalable (in terms of codebase size) and maintainable languages can't even do OOP (functional languages like Haskell, for example, or Rust, which prefers composition over inheritance). Although Rust might add inheritance someday (the delegation RFC). Even in languages like Javascript people are moving away from OOP, not due to performance (in those languages you often even sacrifice performance by not going the OOP route, because the runtimes are optimized for it) but because a more functional approach makes their code more maintainable (e.g. React, which now even wants to get rid of class components and replace them with hooks and useState).
@@maxschlepzig641 There is tons of object oriented concepts employed in Linux. Just because it's C doesn't mean it has no OO.
@@boltactionpiano7365 Just because it uses object-oriented concepts doesn't mean it ignores the hardware.
You can map objects pretty well to some things and stay performant (e.g. directories), but I think a pretty big performance killer in OO is quite often inheritance (which includes interfaces as understood by Java).
Fantastic talk; the questions were very provocative, and the speaker handled them excellently.
A breath of fresh air. Every time I try to read the source of an open source project written in C++, I find that, for even the simplest thing, I end up having to hunt down and read a few dozen different member functions in quite a few distinct classes (in distinct files). Object-orientitis I call it. Though I find the arguments here are more against excessive abstraction and splitting things into an excessive number of objects pointed to.
"[OOP and DOD] are just tools in your toolbox" - this is good advice! Unfortunately, many developers treat OOP (and TDD, and...) almost like a religion; not as a tool, but as a rigid set of beliefs that must be adhered to at all times, lest you evoke the anger of The Prophets; Fowler and Uncle Bob, hallowed be their names. But OOP is just a tool. Go explore, have fun, learn new things, and expand your toolbox. You'll see that hammer-oriented carpentry is limited :-)
Yes. There are plenty of times when OOP is great, but other times it is overkill or not the fastest method. Use the right method for the problem you are trying to solve; instead of pushing different methods as replacements, they should be taught as additions to a set of programming methods. That is why I like PHP and the flexibility it offers: you can go OOP or procedural, and you can use functions, methods, classes or modular programming methodologies. I think it would be cool if php-gtk had taken off, to create more client-side PHP programs.
I believe GoF's Flyweight is exactly what this talk is about. Part of the problem with OOP lies in how we approach OO design and how we teach it. Being a teacher myself, I always encounter students who have this Animal-and-Dog-and-Cat style of OO design, so from the very beginning we instill a very naive mindset about how to model the world in software. Like the Animation class in Chromium.
My take is to see OO as more of a system/API-level thing, more like modules in Oberon or a Service in Spring. That is where OOP really shines.
I am learning OO too, and I was taught the same way, with Animal | Cat, Dog. Glad to know that this is not a good approach.
But also, I don't understand what the alternative approach is. Can you please explain a bit?
@@yash1152 +1
Game engines typically don't struggle with duplication of heavy data, though, if I understood the citation of flyweights correctly. Avoiding duplication of hefty data like textures, meshes and strings is a trivial problem for commercial gamedevs to solve by storing indices/references/pointers to the data instead (interned strings in the case of large strings); there's no need to even call it "flyweight", and it generally requires no associative structures or other fancy solutions. The efficiency demands would make it blatantly wasteful to do otherwise. The bigger and more serious problem with OO for critical real-time applications like games is encapsulation and a scalar way of thinking about objects. Encapsulation is thoroughly antithetical to efficiency. For example, an OO designer will generally design a BLAS 3-vector like this:
template <typename T>
class Vector3
{
public:
    // ... constructors, accessors, operators ...
private:
    T x{}, y{}, z{};
};
In a system where the critical transformation loops are sequential in nature, this is thoroughly counter-productive from an efficiency standpoint, as those loops benefit most from SIMD processing data in the form XXXXXXXX in 256-bit YMM registers, not XYZXYZXY. Data interleaved like this, which isn't optimal to access together, wants to be split apart (hot/cold field splitting) rather than stored AoS as OO generally prefers, unless all our objects are hefty multi-containers of data. Yet OO designs resist splitting it apart, in order to maintain intuitive encapsulation (see the sketch after this comment). Also, splitting these objects apart later is enormously costly, even when every profiler hotspot points to this as the ultimate bottleneck killing framerates, since it's a design change to a public interface and will break every single thing that uses that design.
Another very serious problem is that object-oriented programming often wants to create abstractions that hide away data treating them as implementation details (ex: an image abstraction which hides away its internal pixel format), and the most critical loops often require the data to be leaked in a leakier abstraction by treating data as a fundamental part of the public interface. They want "data structures" (leaky abstractions with "open data"), not "objects" (sealed abstractions with "closed data"). Efficiency actually demands leaky abstractions, not good abstractions as judged by OO ideals such as SOLID or even the more basic principles of information hiding. Hiding information is the worst possible thing we can do from a performance standpoint, and yet doing so is at the heart of object-oriented design.
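A minimal sketch of the SoA layout being argued for above (names invented). Each coordinate stream is contiguous, which is what lets the compiler use wide registers:

#include <vector>

struct Points3 {
    std::vector<float> x, y, z;   // SoA: xxxx... yyyy... zzzz...
};

// Streams each coordinate array linearly; with AVX enabled, optimizers can
// typically turn each loop into 8-wide YMM operations, which the
// interleaved XYZXYZ layout would defeat.
void scale(Points3& p, float s) {
    for (float& v : p.x) v *= s;
    for (float& v : p.y) v *= s;
    for (float& v : p.z) v *= s;
}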
@@darkengine5931 OO does not require hiding information; encapsulation is not a strict requirement of OOP. Honestly, encapsulation should only ever be used to protect application state from faulty manipulation, and only when you can afford it in the first place: a Vector3 has no excuse for encapsulating its internal data. Usually encapsulation belongs at higher-level functionality and interfaces, where CPU caching either doesn't determine the performance, or where fixing the cache misses wouldn't have fixed the performance issues anyway, which is usually the case at the larger interface points. It's not that OOP itself requires you to design in a manner that doesn't respect DO design; that's just the traditional foolishness people are taught early on as being "smart", even though it rarely is. Composition should be your first design; inheritance should always be your last.
Use the toaster analogy!
Bread -> Toaster -> Henry
Bread stores the slices, and the toaster changes the amount of toast in each slice and eventually passes them on to Henry, who eats them. ;)
I find it funny that for every one of these talks, there is always someone trying to make a jab along the lines of "if you use data-oriented design, then after ten years of data bloat you have to do a lot of hard work to keep the code running as fast", as if that is a downside of data-oriented design. Well, duh. Of course it's hard work. The same bloat happens in object-oriented code as well, but there it's too difficult to see past all the classes and indirection, so most people just give up and accept the slowdown as "inevitable".
Not to forget, the objects and inheritance will all become dependent on each other. When the purpose of the program changes (not if), the design will need to change, and unless you start from scratch, these dependencies will be reused with a lot of overhead, often because the programmers only know that if you remove something, it stops compiling somehow.
Data-oriented design lets you start from scratch right away, and various code paths are reusable with minimal modification, as opposed to OOP, where the classes are purpose-built.
We gotta think why data oriented design matters at all. It is because the memory speeds are cripplingly slow and we got caches as the hacky solution. If someone invented faster memory nobody would even care about data oriented design.
This is _exactly_ what has happened over the years. Code gets slower every year directly because of the bad ideas from OOP, and nobody knows how to fix it. Data/CPU oriented design is hard only because doing something the right way happens to be hard most of the time. If it's too hard for traditional OOP programmers to do their job correctly _without_ OOP, maybe they can try serving at McDonald's? I, for one, wouldn't mind if OOP programmers all quit and modern software could start competing with software speed of the 70s again.
@@totheknee An observation: the sorts of problems we are solving are several orders of magnitude greater than the problems we were solving in the 1970s. We are doing things now that were inconceivable 50 years ago.
It’s also just funny because it’s mostly an irrelevant concern. Why would I (as a dev gainfully employed at a tech company, let’s imagine) care if something takes work to maintain? The alternative universe where the code works well for eternity without overhauls or maintenance is one where I’m unemployed. Not to mention the fact that the bulk of commercial software revolves around releasing new features and updates and getting people to buy in over and over again. Do people expect that you’ll never need a rewrite in that kind of a market?
Better hide that PUBG picture if you want to talk about high performance...
maybe replace it with a PUBG Mobile picture?
They did hire a lot of people to make the game run better... It runs on mobiles after all.
Compare how it worked at launch with how it works now; it's like a dream... Try some Escape from Tarkov and you will see what bad optimization and performance look like.
Sea of Thieves is one of the best performing games I have ever played though
That's a nice feeling, when I notice I reinvented DOD just after watching some videos about caches.
I'm now trying to use Data-Oriented Design in business apps, where different aspects are moved into components (in the sense of game Entity Component Systems), for the reason that it is much easier to synchronize with remote systems and avoid fatal sync failures; let's see how it goes. While it is a nice talk, I have not seen Data-Oriented Programming outside the games industry.
But on the other hand, a normalized in-memory relational database system is doing exactly what an ECS is supposed to do.
I'd like to see the code and cache hit data from the guy who asked the question at 51:05. I'm 99% sure his code is a prime example of what Nikolov is talking about.
Lol I know right. The salt was incredible.
As always, talks like this drew lots of heat from OOP fans in the Q&A. Maybe the title should have been less inflammatory :P.
It's a catchy title for sure :D but it's also kinda misleading, because it's not really a problem of OOP. The problem is the way that OOP is taught: It's often like "structure your class fields the exact same way as your properties/getters/setters", and that's bullsh*t.
Instead you can simply embed the data oriented design *into* a class and then have the best of both worlds: cache locality and encapsulation. That way your class can have a convenient public interface and at the same time hide all the dirty secrets (like the actual data layout in memory) that it uses internally to achieve better cache locality. For me OOP is just a way to separate the internal data layout from the public interface.
It's not OOP *or* DoD, it's OOP *and* DoD :)
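For illustration, a minimal sketch of that "DoD inside a class" idea (all names are hypothetical, not from the talk): the public interface stays object-like while the private layout is struct-of-arrays.

```cpp
#include <cstddef>
#include <vector>

// Sketch: the public interface hides a struct-of-arrays layout.
class ParticleSystem {
public:
    std::size_t spawn(float x, float y) {
        xs_.push_back(x);
        ys_.push_back(y);
        lifetimes_.push_back(1.0f);
        return xs_.size() - 1;  // opaque handle for the caller
    }

    // The hot loop walks each array linearly -- cache friendly.
    void update(float dt) {
        for (float& life : lifetimes_)
            life -= dt;
    }

private:
    // Callers never see this layout; it can change without breaking them.
    std::vector<float> xs_, ys_, lifetimes_;
};
```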
In HPC data oriented design is the norm, it comes from the way people used to write programs in FORTRAN where not even structs were available. Scientific codes with a long history have high likelihood of being written by people who care about performance and know about the hardware.
Exactly, it's all about locality, and the same is true for embedded systems, which get crazy heterogeneous but have biannual hardware release cycles instead of the 4-5 years in HPC. It's like HPC with attention deficit disorder. "Look! A new instruction set! Pack it onto the other seven SoC components! Add another on-chip interconnect! Use MPI to offload from Cortex to M4!"
Those idiosyncratic EVMs are perfect for evaluating hardware abstraction, though!
@@tobiasfuchs7016 reason and good sense will prevail in the end; it's a bit sad it took 20+ years, and 15+ years of the memory wall, to realize and start to popularize certain basic concepts.
HPC = high performance computing, right?
Yep, I've been doing scientific programming since the '80s; it is fun to see how ideas circle back.
I think it's very right that the author mentioned that OOP is much more applicable than DoD under some circumstances. And I would like to expand this point with a few personal thoughts; I'm sorry that I've done a poor job of structuring them well. To my mind, it's too early to kill OOP as some comments below propose. Personally, I see the best fit for OOP in classical enterprise business-oriented programs, where dynamic polymorphism is not just that OOP thing we use only because it exists, but plays a crucial role in building highly maintainable architectures. For example, I mean the Dependency Inversion Principle (the D in SOLID), which allows the flow of control and the flow of dependency to run in opposite directions.
I can't clearly see an application of DoD to business architectures. DoD demands that you know your domain very well in advance, which is not usually possible. In classical OOP you can separate your business logic that seldom changes into its own component and provide it with plenty of interfaces to decouple it from the details. You can experiment with the details as much as you want, leaving the core of your application logic unchanged. Moreover, that inversion of dependency lets you get rid of even transitive dependencies on the details, which can end up in the ability to compile and deploy your business logic separately. In DoD you concentrate on the data more than on the behaviour, which is not always the right way of designing some systems.
Another significant downside of DoD I see is the lack of context in your data structures. Encapsulation does a good thing in that you give other programmers not only the info about the data but also a hint about how your data is usually used. Moreover, in classic OOP you are allowed to add restrictions on how the object's internal data is used. Imagine directly accessing and modifying std::vector's raw data pointers. Of course, obsessive encapsulation can lead to bloating your objects with a bunch of methods whose logic belongs to different parts of the system. But that is an obvious violation of the Single Responsibility and Interface Segregation Principles. In that case, weakening the restrictions on data access and moving the odd logic to the corresponding subsystem would be applicable.
That leads to my final point: residing in the middle of those design paradigms is usually the best practice. I'd support this with a personal example. Recently I had an opportunity to apply DoD to organize the game objects in a game I'm working on as a pet project. However much I liked the ECS pattern, I didn't want to restrict myself to putting the logic only in the systems. That's why I added all the necessary virtual methods to components, which allowed me to use the Entity-Component and Entity-Component-System patterns together. Now some components that know best what they need and how to act, like the Player's and Enemy's components, have all the logic packed with them, while other components that are part of some more complex system, like physics colliders and rigid bodies, just hold the data. To my mind, that's taking the best from two worlds, which gives me plenty of flexibility without the restrictions of a particular design philosophy. Even though such components technically don't differ, as both have virtual functions that are called every tick, I can separate them into different classes that are stored in different arrays and, moreover, introduce custom memory allocators to improve data locality and reduce cache misses. As you can see, there are plenty of optimizations that can be added on demand.
When common sense gets you hidden treasures. Great talk.
Holy crap, I did data-oriented design in 2008, and I had no idea until today that this was a thing. I just felt like an outsider coding in a way that I thought made much more sense and performed better than all my peers' code.
The title itself is a bit of click bait; besides that, it's a very good talk overall.
Hardware-efficiency-oriented is what programming should be, IMO. The problem is the extra code and data added by the compilers that will never be used: it bloats RAM and cache, and that happens with all programs, DLLs and other executables. A well-defined execution requirement list could do better than a simple import list. It'd be not only much faster but much smaller, also saving operating system admin time.
Really great talk. Very understandable.
Hah all the OOP fans are triggered. What he argues against is the philosophy of using large, virtual inheritance trees for every problem -- especially low-level ones. Great talk. People need to chill about the title and the comments here are too harsh
No, his clickbaiting title is one of the real reasons he is 'wrong'. Clickbaiters should be sent to the gulag forever.
This is a really good talk. The folks saying he's not fair to OOP have just not seen that the juice is worth the squeeze. We have all been lied to. OOP is not the only tool, and the others happen to be better and simpler. I've been using a procedural and data oriented approach since about 2011 and it has never let me down. It has gained performance, added readability, made the code more composable, and made it easier to test. OOP is an unfortunate detour on the way to enlightenment.
There is always that one guy who does not get it. Lucky me, its not me today!
it's me :(
Using "performance" and "React" for the game UI in one sentence is a crime.
this is god tier
I guess a lot of it may simply be called a "hot/cold split", both in terms of members within a class and in terms of "active/inactive" objects of the same type being split into different containers, so inactive ones don't get updates/callbacks.
I think that lacks a widely-used name (unless you get very specific, like frustum culling, which does it to avoid drawing meshes outside of the viewing frustum), but that kind of split is generally not so costly to do in hindsight, in response to hotspots. Hot/cold field splitting, on the other hand, is a fundamental design consideration upfront, since it affects the entire way we design data types and data structures. If we try to do it later rather than sooner, it can be an extremely costly design change.
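A minimal sketch of such a hot/cold field split (the fields are made up to illustrate the idea):

```cpp
#include <string>
#include <vector>

// Hot data: touched every frame, packed tightly for linear scans.
struct EnemyHot {
    float x, y;
    float health;
};

// Cold data: touched rarely (UI, tooltips), parked in a parallel array.
struct EnemyCold {
    std::string name;
    std::string dialogue;
};

struct Enemies {
    std::vector<EnemyHot>  hot;   // hot[i] and cold[i] describe the same enemy
    std::vector<EnemyCold> cold;
};
```

The per-frame pass only ever streams through `hot`, so the strings never pollute the cache.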
Great talk. Thanks Stoyan!
Great talk, but something is bugging me. The talk is about performance, and in that aspect he's missing something. It's about the performance... of games written in an interpreted language, run in a tab of a program that has to do 100 other things as well. This notion of the browser being the environment in which everything has to run is what makes modern computers feel as slow as the ones we had 30 years ago. If you want high-performance animated games, why are you even considering JavaScript/CSS to be an option?
Maybe because you want to?
Fantastic talk - I just wish there were more resources to learn DoD. As far as I can tell there isn't even a single cohesive book about it (yet)
Seriously? These concepts have been around for decades. The problem is not a lack of books about this new fad called data-oriented design; the problem is the amount of garbage (IMHO) around about OOP. I'm referring to resources that preach terrible (IMHO) design practices. And even if they had their place, remember: horses for courses. Different jobs demand different tools. When was the last time you saw a performance or optimization chapter in a book about OOP? The thing is, what he wrote is OOP. Hence the questions. How?
Let's say you have a big system and you're designing one of the data-crunching (heavy lifting) components. Here, we have an animation engine. It could be an SMT solver, whatever. The OOP way is to publish an interface and hide the implementation. Like you have here. It's nobody's business how you internally manage the animation's state (if you don't care about performance, it might be all pretty, like a naive example from a book; if you're dead serious about performance, it can get ugly fast - all the more reason to hide it). The pre-OOP way would be to expose everything (nowhere to hide). Your state would be just lying around for everybody to see. Meaning, they can write code that's dependent on your implementation. That's terrible. OOP gives you this "black box" approach, but it says nothing about what should be a black box. Or how far you should go in your modeling of a problem domain. Containers are a good example of OOP. Or do you think it would be better to expose the internals? Just because a lot of people write terrible OOP code doesn't mean OOP is bad. Just like with C; a lot of bad C code floating around doesn't make C bad. In other words, if the code is terrible, it's your (the author's) fault. If it performs terribly, the same applies. Tools don't make a good programmer. A good programmer can make any tool shine.
It's a well understood fact that the more you care about performance, the less freedom you have. You do what you have to, not what would be neat. Take data structures. There are a lot of interesting structures about. However, if you really go for performance, you'll probably find yourself using just a handful of general purpose structures. And you might have to implement even those because the standard library doesn't really offer them.
As far as his critique, I'm not really sure about all of his examples. I'm not familiar with the codebase, I'll just go with what he said. Take the if (!timeline_). They're iterating over animations needing an update. So, this is either a sanity check (which means it should always fail, unless there is an error somewhere), or it's a part of the removal process when animation stops (which means it should succeed once, at the end). I have to ask myself, is this really a significant source of misprediction? Or the if (!content_). I believe he said it contains the definition of an animation. How can you have an animation without any definition? I would expect this to always succeed. Again, is this a real problem or are they just covering their bases (to avoid dereferencing a null pointer in case there is an error somewhere)? Just two examples that stuck in my mind.
Just to add, tight coupling is not a feature of OOP. It's a fact of life that components in a system often need to cooperate. There are quite a few ways to do so. And the convenient way is not necessarily the best way, unsurprisingly. Keeping a pointer and calling directly (well, more likely indirectly ;-) ) with the data you "have at hand" is convenient compared to storing all that information so it can be dealt with later (whether by you or by someone else). If you care about keeping your data and instructions hot, your hand is forced. It's not too dissimilar from what humans do - getting in the zone, doing tasks in batches. Or you might want to quickly shoot a message to another thread on another core to keep it fed and happily crunching away while you do your business (you might look at this as a production line with people working in parallel on different tasks, passing work pieces around). This is not about OOP. It's about minding performance.
Interface design is in general challenging. And one thing OOP does is increasing the number of interfaces. If you want to reap the benefits of OOP, you need good interfaces inside your code. Good interface facilitates reusability. Interface can tie your hands when it comes to implementation. Etc.
Oh this gon' be good. Already your fan just because of the title
*gets-popcorn*
Why is it not called cache-oriented programming?
Origami Bulldoser That's a good point - this was more about how to design code to suit the particulars of CPU caches. To the extent that doing this is advantageous, it is kind of indicative that modern CPU architecture is not well served by our general-purpose programming languages. It seems like the gulf between what our programming languages do and what is optimal for CPUs just grows wider and wider.
It is like database design. Large disk latency means you need to fetch the next needed data as sequentially as possible. Moore's law hasn't kept up for memory latency, and memory behaves more and more like disk over time (though the latency is pretty constant, except in NUMA).
Because not every machine has a cache; you'd use the same design scheme when programming an Atari. Data-oriented design says that all programming is about transforming one set of data into another set of data, and that's what we should be focusing on, instead of pretending we can plan how a system should best be structured before actually testing it. In OOP you pre-plan the structure; in DOD you test the simplest implementation and let the system tell you how it wants its data laid out. If the first thing you need to do is add 2 + 3, don't presuppose the problem by creating a class that handles any possible case for the left side and any possible case for the right; you just add 2 + 3, see how your system handles it, and move on to the next problem, being as efficient as you can by just solving the problem in front of you and letting the data you get back from your system guide your design. It's much more than the cache; it's very reactive instead of putting your ideas of how a system should be made at the forefront. It's constant testing, letting the machine tell you what it likes best. You're thinking about what the GPU likes, how long things take from disk, and you use that to structure and lay out your data.
@@mycollegeshirt DOD is easier to maintain, less buggy, and easier to reason about in many circumstances; it's not just about performance.
@@captlazerhawk absolutely, but the comment said cache-oriented programming, and I wanted to address why it wouldn't be that, and how the data takes center stage even when performance is your main concern. It's just harder to see the difference between OOP and DOD when you start arguing about maintainability with someone coming from an OOP background. To me DOD is about succinct precision and efficiency, best exemplified by its performance rewards, while OOP is about power and scale that demand more and more hardware, with codebases that balloon out of control.
Is a good one-sentence summary “Architect your code to avoid branching and pointer chasing in your hot loops”?
Good but incomplete. That's a good summary for the code, but alongside it you should also "architect your data to be read and written in consecutive blocks of memory (as much as reasonably possible)". You can follow your code summary even with an OOP layout of the data (all data belonging to an object encapsulated in the same class/struct), but that will still hurt performance if you use thousands of such objects and you don't rearrange the data according to their use pattern.
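To make the distinction concrete, a sketch of the two layouts (the fields are made up for illustration):

```cpp
#include <vector>

// Array-of-structs: moving every sprite still drags color and size through cache.
struct Sprite { float x, y, r, g, b, a, size; };

void moveAoS(std::vector<Sprite>& sprites, float dx) {
    for (Sprite& s : sprites) s.x += dx;  // uses 4 of every 28 bytes loaded
}

// Struct-of-arrays: the same pass reads only the bytes it needs.
struct Sprites {
    std::vector<float> x, y, r, g, b, a, size;
};

void moveSoA(Sprites& sprites, float dx) {
    for (float& x : sprites.x) x += dx;  // one dense, prefetcher-friendly stream
}
```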
It really isn't, but it's a good rule of thumb to just generally improve performance. I recommend you read Richard Fabian's book on DoD (available free online).
Mike Acton changed the industry
No. I guess it was due to the people hungry for pretty games.
@Atilla Agreed, programmers just didn't realize how memory could affect performance.
@@younghsiang2509 Oh they did. Cache locality is a reoccurring topic in C++ conferences, like in "std::vector vs std::map". That Mike Acton talk is way too overrated by some. I didn't learn anything new from that one.
@@Gawagu If you "didn't learn anything new" from the Mike Acton talk, you either know more than Scott Meyers or you didn't pay attention. Apologies if the former is the case, and I'd be very interested to see the L2 cache miss rates on your projects.
@@minRef My L2 cache miss rate is pretty low in the relevant parts :D Is there a transcript or summary of his talk somewhere? When I hear his name I also always think of Casey Muratori and Jonathan Blow who have many good ideas and are often right but are also extremely arrogant and often full of bullshit with their irrational hate of C++ and such.
My point was that cache locality and "data oriented design" isn't that new and a common topic on C++ talks even before Mike Acton made his talk.
I found his slides I think: macton.smugmug.com/Other/2008-07-15-by-Eye-Fi/n-xmKDH/i-BrHWXdJ
Most of that is a typical "well, it depends". His critique only applies to really performance-critical code, and even there only under specific circumstances. In general I often have to balance between getting something done with suboptimal code and optimizing only the actually critical parts. Also, their critique of references is pretty stupid. After going through his slides I really would like to tear some of his code apart to point out how many "bad design choices" he can make in a few lines, and how "typical C" that is.
People defending OOP sound like dogmatists who do not accept ANY criticism of OOP. For them it's always the fault of the practitioners and not the idea itself.
Good talk, but an obligatory warning: comparing a custom in-house animation engine for a fairly narrow use case (in-game UIs) to a full-fledged web browser is a fundamental flaw. Simply using DoD over OOP isn't going to increase your performance by 4 times.
Their in-house browser is most likely heavily hardware-accelerated, while Chrome's renderer isn't unless you're using WebGL. Furthermore, their browser is executing trusted code, so there is less concern about security, non-standard extensions, and more.
Even 1/2 those results are significant
I think the new engine is 6x faster than their old in-house engine, not the default Chrome engine, using the same algorithm but with DOD concepts in mind.
Am I the only one who thinks it's slightly hilarious to make a talk about performance while running an entire browser engine to do game UI?
@@isodoubIet Probably.
webgl is pretty quick. So is code compiled to JS to run in a browser. And today one would compile to WASM for even more speed. Sure that may not be as fast as compiling to native code but it can be quite enough.
As far as I can tell, the point is that one does not use an entire browser engine to do a game UI, just the useful parts of it.
DOD seems to address low-cohesion problems, i.e. bad OOD.
Pretty much. What's funny is that object-oriented and "data-oriented" aren't even conflicting ideas. OOP/OOD doesn't mean virtual functions and massive inheritance hierarchies. It's gonna be funny in 10 years' time when people take DOD to its extreme and we end up back where we started, but this time DOD is the bad guy.
DOD still has some great ideas though. Hopefully we can integrate these ideas without repeating the 90s.
@@jonathan_cline They are conflicting ideas. OOP causes every field on the same cache line as the one you're accessing to get pulled into the cache.
This, coupled with calling different member functions (even slower if they're virtual functions, since more memory needs to be pulled in to check which virtual to call), means even the instruction cache is constantly getting tossed out.
The CPU ends up spending more time waiting for memory than actually doing work.
DOD says to use a single function on a linear array CONTAINING ONLY THE DATA that the function actually uses.
All modern CPUs are great at prefetching memory when accessed in a predictable pattern.
So while processing the data on the 1st cachelines, the next cacheline will be prefetched and available for work as soon as the CPU is done with the first, leading to no wasted cycles.
OOP says to put the data you need in an object, to fulfill whatever the object represents. This leads to wasted cycles, as unneeded data gets pulled into the cache any time any field is used.
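The arithmetic behind that claim is easy to check with a toy struct (the layout here is hypothetical):

```cpp
#include <cstdio>

// A "fat" object mixing simulation and rendering data.
struct Particle {
    float px, py, pz;      // position: used by the physics pass
    float vx, vy, vz;      // velocity: used by the physics pass
    float r, g, b, a;      // color: rendering only
    float size, rotation;  // rendering only
};

int main() {
    // 48 bytes: a 64-byte cache line holds barely one Particle, so the
    // physics pass drags ~24 bytes of rendering data per object into cache.
    std::printf("sizeof(Particle) = %zu\n", sizeof(Particle));
    // Stored alone, positions pack 5 to a cache line and stream linearly.
    std::printf("positions per cache line = %zu\n", 64 / (3 * sizeof(float)));
}
```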
Actually, Data-Oriented Design is just plain old procedural programming. But I guess we need to invent another name for it because OO developers trash-talked procedural programming for so long.
The major point data-oriented design preachers make is that giant classes with lots of logic and unrelated data cause slowdowns. And that's true, but the solution is already in OOP - it's literally the S in the SOLID principles: small, single-responsibility classes. No need to call it a new programming paradigm; it's already in OOP.
If you use the SOLID principles for OOP, you can make minor changes without being familiar with the entire system. This is especially useful when you are trying to modify a system developed a few years ago.
Combining SOLID principles with cache-line-aware data packing to avoid cache misses is a way.
Unless you overdo SOLID and then you have a FizzBuzz solution that has one class for calculating text for multiples of 3, one for multiples of 5 and one that multiple inherits from those two.
28:25 Can you move animations from the active to the inactive array fast? Wouldn't you need to move the entire array's memory block to keep it sequential?
The sequence of updating the animations is probably irrelevant, so you can swap the to-be-deleted element with the last one and pop it from the back, then push it onto the end of the inactive array. Also, you would organize the data in such a way that you only decide to move things between active/inactive once per frame, and would try to do that update for all elements of one animation type.
Still the same array and memory block, skipping one animation is still much faster than fetching a list of characters with name, inventory, textures, meshes and animations just to use two of these to update the animation.
In a remove/emplace/update action these can also be reordered, so you'd get the miss for only one frame, and the misses don't build up over time.
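The swap-and-pop trick mentioned above, as a sketch (names are hypothetical):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Order doesn't matter, so removal from the active array is O(1)
// and both arrays stay contiguous.
template <typename Animation>
void deactivate(std::vector<Animation>& active,
                std::vector<Animation>& inactive,
                std::size_t i) {
    std::swap(active[i], active.back());          // fill the hole with the last element
    inactive.push_back(std::move(active.back())); // park it with the inactive ones
    active.pop_back();
}
```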
I assume you can just run through the arrays and toss every entry onto the respective array before doing the processing; with non-temporal writes you bypass the cache, and the CPU itself will figure out when it should do the writes.
So as long as the objects are not huge, it should be pretty fast.
I agree with this talk, but using type-erased base classes with virtual methods does not require allocation, and the vtable lookup overhead can be overshadowed by other latency in some cases, making it actually a viable solution even in performance-critical applications.
It is also important to consider that if it runs in a tight loop, the vtables are likely to be hot on modern systems with pretty large caches. Unless you have a ton of variants of the functions... But it's not like I'm claiming it is always OK. Just saying that there are cases where the overhead of vtables truly is negligible.
And a lot of the ways to get around vtables while keeping the same extensibility have various other overheads.
Also, it is possible to keep lists sorted by type, ensuring that vtables stay as hot as possible.
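If a polymorphic list is unavoidable, sorting it by dynamic type is a one-liner; a sketch (assuming RTTI is enabled, types invented for illustration):

```cpp
#include <algorithm>
#include <memory>
#include <typeinfo>
#include <vector>

struct Animation {
    virtual ~Animation() = default;
    virtual void tick(float dt) = 0;
};

// Grouping elements by dynamic type means each vtable (and each tick()
// body) is pulled into cache once per group instead of once per element.
void sortByType(std::vector<std::unique_ptr<Animation>>& items) {
    std::sort(items.begin(), items.end(),
              [](const auto& a, const auto& b) {
                  return typeid(*a).before(typeid(*b));
              });
}
```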
Interesting talk on optimization. However, this stuff really isn't new. Systems-level programmers/engineers were obsessed with CPU/memory/cache/compiler-friendly code design way before OOP existed.
Also, don't get baited by the title of the video. DOD and OOP are coexistent. OOP for production speed and code management, and DOD(along with other performance-oriented design patterns) for performance.
@@CreepyBio I'm no coding guru, I hope to be one day. But what I gathered from his talk is you want to lay out your entities by components to be used for systems. And you want those components to be laid out sequentially in memory. And the system using those components should be what defines what data is in the components. Is this a reasonable understanding as to what the presenter is speaking of?
@@skittles970 Didn't finish the full video, but my guess is that you're correct. The basic idea is to group data by "category" rather than by "object", since data of the same type are far more likely to be accessed sequentially, and this is beneficial for the locality and prefetching behavior of the memory units.
Nate, and that is exactly the opposite of what OOP says we should do.
browser engine to render game ui is insane to me
Every year people predict oop will soon die. OOP is like a cat with nine lives
It's true, just like C++ itself does.
It's what's being taught in school, it will stay alive until it's out of the school system
Great talk congratulations.
Excellent talk!
I'd like to hear a talk about the deficiencies of OOP that doesn't blame OOP for the fact that the code is poorly written. Why do you have that hideous inheritance hierarchy and why do you have "a lot of unrelated data" in your objects?
Can't remember his last name, but his first name was Sean. He said, and I agree, that the problem isn't with objects, object programming is fine. The problem is the oriented, forcing the whole program to be built by objects and adhering dogmatically to the oop formula. Use the best tool for what you're trying to solve. I'll use oop when I want to create abstractions, but not everything needs to be abstracted. I sometimes use destructors cause I want to manage the lifetime of something like a tcp-connection. But I'll skip everything else oop.
@@Bozemoto Yeah, and no professional software engineer with any sort of real-world experience does that. Only academics who haven't written a line of production code in their life, or juniors who are still learning to write code.
@@sacredgeometry Try finding the lecture on Boost's HTTP stuff. There seems to be a large range of quality in programmers on the market, across all ages and experience levels.
I don't really understand how the animation method that he shows works, because to do animation in something like CSS you need to access some sort of variant, which means you can only decide the type at runtime; but templates are compile-time generated types, so you will end up needing some kind of branching, like what is done with virtual classes...
There is a collection for each type of animation, and you run all collections which have at least 1 active animation in them. Basically you have one branch for each animation type, not for each animation.
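Something like this, as a rough sketch (the field names are invented; the talk's real types differ):

```cpp
#include <vector>

// Plain per-type records -- no base class, no per-instance vtable.
struct MoveAnimation  { float progress, speed; };
struct ColorAnimation { float progress, speed; };

struct AnimationStorage {
    std::vector<MoveAnimation>  moves;   // one contiguous array per type
    std::vector<ColorAnimation> colors;
};

void tickAll(AnimationStorage& s, float dt) {
    // The loop condition is one well-predicted branch per *type*,
    // not a virtual dispatch per animation.
    for (MoveAnimation& m : s.moves)   m.progress += m.speed * dt;
    for (ColorAnimation& c : s.colors) c.progress += c.speed * dt;
}
```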
Very often you encounter a case where the superset of all fields (data members) required across all the classes in a hierarchy isn't that large. In those cases, using a single class/struct with the superset of fields and a function pointer for polymorphism can dramatically outperform inheritance-based solutions, since you can easily allocate and deallocate the objects in constant time with a contiguous or mostly contiguous representation, while eliminating the bulk of branch mispredictions and cache misses (you can store them all in one array, so to speak). It can require more memory, with some wasted fields/data members, but fixing the size of each object/instance can more than make up for it with better and more predictable performance (predictable/consistent frame rates are as important in games as fast frame rates).
In other cases, when the required subclasses aren't numerous, or you have a generic solution and you do not need to access the objects in any specific order, you can do as mentioned above, where you store each data type in its own container and avoid branching at the per-object level, as happens when using polymorphic base pointers. The branching is then reduced to once per type/container.
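A sketch of the superset-of-fields approach described above (all names are hypothetical):

```cpp
#include <vector>

struct Entity;
using TickFn = void (*)(Entity&, float);

// One fixed-size struct holds the superset of fields every "subclass"
// would need; the function pointer supplies the per-kind behavior.
struct Entity {
    float x, y;
    float a, b;    // meaning depends on the kind; some kinds waste them
    TickFn tick;   // replaces the vtable
};

void tickMover(Entity& e, float dt) { e.x += e.a * dt; e.y += e.b * dt; }

void tickAll(std::vector<Entity>& entities, float dt) {
    // One contiguous pass, no per-object allocation, fixed stride.
    for (Entity& e : entities) e.tick(e, dt);
}
```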
I'd have to disagree with the statement that "OOP is Dead". I've written code in both a procedural (struct) style as well as an OOP style and I can see pros and cons to both. Procedural style can produce faster code but you will find that as the project gets beyond a certain size it simply gets too difficult to keep track of all the concepts as they naturally start to get entangled with each other. This has happened to me first hand. So I guess if performance is critical and your solution can be kept fairly concise then a DoD approach may be best. For larger projects with more developers where performance is less critical OOP has to be the winner. Also, I agree that OOP solutions can be overly complex too ... many OOP developers (or more often software scientists these days) see creating the object model as an intellectual exercise without thinking about the CPU at all ... their object model may be a wonderful creation to them but can be tough to understand to someone coming to it afresh.
A better title for this talk would be "how picking good abstractions improves performance". Nothing died here.
I agree, but I'm sure the title was facetious on purpose to attract a bigger audience. Lots of people do that.
aka click bait, like: "Go To Statement Considered Harmful," "Inheritance is the Base Class of Evil."
First of all, it's a good talk. And... I've found a very interesting phenomenon in the software industry: re-inventing new words. From my point of view, data-driven is still OOP, just without encapsulating the behaviors. I don't believe encapsulating behaviors in a class is required by OOP design; it's all about ABSTRACTIONS. Inventing new words sometimes is not really helpful for teaching newcomers to programming. I remember there are many talks from recent years of CppCon saying that we should make C++, and programming in general, more friendly to newcomers.
@@PixelPulse168 You are right that you can implement data-driven design using OOP paradigms from the language, but data-driven can also be implemented without OOP paradigms. It's really something different, even if you could argue it's not in opposition to OOP.
@@Spiderboydk Title could be a response to a talk given by David West "OOP is Dead! Long Live OODD!"
Surprisingly, he didn't mention anything about packing data to the 64-byte cache line to avoid cache misses. Any thoughts?
It's not all black and white.... Some things are better done with OO, some other things are definitely better done without.
In the general case, if two approaches are incompatible with one another then a third approach can be constructed that can adopt either of the two dependent on circumstances. If the two approaches each have advantages in certain situations, then the third approach will be superior if it can articulate what those situations are. So well done in recognizing the first part, but what to you determines whether an approach should be used?
May I ask a question here?
How do you handle data dependencies between different objects that form a hierarchy? For example, the DOM is a hierarchy, regardless of whether we use OOP or DOD, and it imposes dependencies between different objects. Looking at slide 30 (27:33): if an `AnimationState` depends on the `AnimationState` of its parent, how do you ensure that the latter is computed before the former, and that it still maintains the efficiency of the DOD solution?
If your data inherently has a tree-like structure, it can indeed be quite challenging to speed things up.
It's easy enough to ensure a parent's AnimationState is computed before its children's by sorting the list in a breadth-first fashion, but *referencing* other states might be an unavoidable cache miss.
In OOP you will have pointers and references; in DOD you will have IDs. As for the order, unless we are talking multithreaded, it does not matter; it will always be in the order you read your code.
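A sketch of the breadth-first-sorted flat hierarchy mentioned above (the update rule is a made-up stand-in):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct AnimationState { float value; };

// Nodes stored so that every parent precedes its children (e.g. BFS order);
// parent links are indices into the same arrays, not pointers.
struct Hierarchy {
    std::vector<std::int32_t>   parent;  // parent[i] < i, or -1 for roots
    std::vector<AnimationState> state;
};

void update(Hierarchy& h) {
    for (std::size_t i = 0; i < h.state.size(); ++i) {
        float base = h.parent[i] >= 0 ? h.state[h.parent[i]].value : 0.0f;
        h.state[i].value = base + 1.0f;  // placeholder for the real update rule
    }
}
```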
Seems like a flawed way to go about determining if DOD is actually superior. To be a fair comparison, you shouldn't only try to make a more efficient solution in one programming style just because of the hypothetical question of whether you could. You should try to simplify it using OOP as well, and put the same level of effort into that simplification and optimization, because I can tell you this: you'd still get a massive performance gain from a simplified OOP approach. Which one is actually better, I cannot say; I'm not that experienced with DOD.
That being said I try to keep my inheritance hierarchy as shallow as possible and my classes as simplistic as possible. If I need something more complex I see if it can be 2 separate classes that operate together. I do use some DOD methodology for certain things but I still find OOP more useful for organizing many instances of something.
Some people take OOP way too far and add needless complexity when their goal could really have been achieved with less. It's like there is almost an aggressive need to make superclasses with mile-long inheritance chains. Very few classes ever truly need that complexity.
So I guess the lesson here is even with object oriented and fully portable data (HTML), you can still manage to make it efficient at the lower levels on a platform specific implementation while keeping the benefits of an abstract upper layer? :P
The main problem for me with data-oriented design is that invariants cannot be properly maintained if a cluster of components basically constitutes an aggregate. For instance, component X cannot be in state Y unless component Z is in state W. You can end up with inconsistent models quite easily. OOP via DDD can enforce this by hiding the state so that the aggregate root could enforce all the invariants in one place. Data-oriented design is most popular in gamedev and visualization (where the speaker comes from), where data integrity is not the main priority (see: all the glitchy games out there). So there's always a tradeoff between performance and data integrity.
If you stop thinking in terms of objects with state, you no longer have an integrity problem.
What is DDD?
This is why the speaker said finding the right separation is the hard part. You should try to avoid separate components being dependent on each other's state in general. If that's the case, you may have a flaw in your choice of separation boundary. According to your argument, when you put everything into a single class, you're implicitly saying that EVERY property is dependent on EVERY other property, which simply isn't true. There is absolutely some boundary where you can slice your large class down into two smaller classes with no inter-dependencies, and that can likely be done many times. Do that, and you have a great starting point for where to draw the line between components in your system.
Great video!
Thanks!
Would like to see the full working code of both designs.
github.com/chromium/chromium - it's open. Hummingbird is closed-source, I think: coherent-labs.com/hummingbird/
This started out as a reply and developed into a comment of its own:
There's this sort of style of OOP, it gets treated as canonical OOP that a lot of people like the presenter here attack, and regardless of what you feel about "good OOP" it just needs to die already. For the sake of discussion I'll call it the "bunny.hop()" approach. I'll provide a definition later, but you probably already know the kind from toy examples, like where "Bunny implements Animal" and "bunny.hop() tells the bunny to hop". Any teachers out there teaching this approach (and I know they're out there, I learned from them) do no service to students, unless they use it to explain pitfalls in the approach. Dumb starry-eyed examples like these sure help to sell students and middle managers on the idea, but they conveniently gloss over the problems with those designs that make you wish they were never designed like that in the first place. How does the bunny know when to update its height at every timestep? Maybe there's a bunny.update() method too, and it reads a clock somewhere. I'm going to design my clock to have a "clock.read()" method because that looks simple, just like "bunny.hop()" is simple, and maybe I'll call out to that in "bunny.update()". But wait, how do I test bunny? Now I need to create a clock object when all I wanted to test was hop, not to mention all these other dependencies I just realized I needed. What happens when the bunny collides with a ceiling, how do objects communicate with each other to tell when they intersect? I like how simple "bunny.update()" looks so I don't want to add any of those nasty looking parameters, but maybe it should be event driven, or use pub/sub? Oh no! An event was called somewhere in a million lines of code and I have no clue what's going on with this system anymore.
I don't mean to say that OOP is bad. There's good OOP and bad OOP, and I've worked with both. I will note though: bad OOP is still regarded as good OOP by those who can't tell the difference, and the people who understand good OOP seldom seem able to succinctly articulate what good OOP is unless they have a specific implementation that they can criticize, even though they have no difficulty explaining what the virtues of good OOP are. The whole design approach lacks that sort of mathematical precision that allows people to automatically agree given some arbitrary thing whether that thing constitutes a mathematical object. There's a huge opportunity to abuse the situation, throwing around "no true scotsman" fallacies to defend your preferences without having to think a whole lot. "Oh, that system doesn't work well with OOP? Well, no *true* OOP would result in a system like that!"
As a side note, if I were to attempt a definition for OOP that I regard as "good" (which for ease we'll defined as converse to "bunny.hop()"), I would fall back on category theory: it's a design where mathematical small categories are implemented as classes in which arrows are methods that can be treated as conceptually pure functions. When I say "conceptually pure", I mean a weaker definition of purity that allows things like output reference parameters and owning objects among the input/output to avoid things like performance issues that come from newing up memory footprints, just as long as input and output are separate and the output of a function is determined strictly by things you can read within the invocation. This definition allows for some mathematical treatment, since inheritance and polymorphism simply become a way to implement functors, dependency injection can be implemented by a small category where nodes are classes and arrows are constructors, and the yoneda lemma could probably have something to say about encapsulation and abstraction. This is distinct from what I've seen some others call good OOP, since it doesn't allow for things like methods that read and write object state simultaneously, but that to me doesn't seem like much a draw back. As an example, the "bunny.hop()" method is better phrased as a function of some representation of the bunny's intention, so there's a method hop: ()->intent that creates a new intent struct representing the intent to hop, a (bunny,intent,environment,timestep)->bunny function that returns a new bunny state that reflects changes over time given intent, and a (bunny,intent,environment,timestep)->intent function that returns updated intent, all as methods within a class/category named "Bunny", which itself could be set to extend or implement an "Animal" to handle actors more generically. This isn't the only approach you could take, you could construct a class/category that abstracts it further using function composition, and I'm sure people can construct pathological examples that preserve the definition but are unworkable messes, but as an experiment I'm using the definition to design software where possible to see whether good faith adherence results in unmaintainable code. So far, so good. I will sometimes use namespaces instead of classes to represent categories, namely where I want to minimize the number of includes within a single file, but this has a tradeoff in that I can no longer use class constructors to track dependencies. Regardless, you will note this bears little resemblance to what anyone would do if asked to implement a "Bunny" class.
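For what it's worth, a compressed sketch of that bunny/intent shape (all types and numbers here are invented):

```cpp
#include <algorithm>

struct Intent      { bool wantsHop; };
struct Environment { float gravity; };
struct BunnyState  { float height, verticalSpeed; };

namespace Bunny {
    // Arrows as conceptually pure functions: output depends only on inputs.
    inline Intent hop() { return Intent{true}; }

    inline BunnyState advance(const BunnyState& b, const Intent& i,
                              const Environment& env, float dt) {
        BunnyState next = b;
        if (i.wantsHop && b.height <= 0.0f) next.verticalSpeed = 5.0f;
        next.verticalSpeed -= env.gravity * dt;
        next.height = std::max(0.0f, b.height + next.verticalSpeed * dt);
        return next;  // no hidden clock, no shared state -- trivially testable
    }
}
```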
I am skeptical of the conclusion at 5:41 because he did not prove that the rendering problem of the Chrome browser is caused by the rendering pipeline itself, and his demo only updates the rendering pipeline.
It is just like organizing the data in tables of an in-memory database.
Well, it is also like functional programming in some sense.
No, it's not. Pure FP data lives on the stack. All of this lives statically on the heap, where it belongs. If you have heap data in FP, then you are not pure and you are just emulating procedural programming in an FP language. Why do you want things on the heap? So that your data structures NEVER move. Ideally you even want to protect certain areas of your state code by using the MMU (e.g. your buffers... now why would you want to do that, huh?). Oh, wait, you are a programmer, you don't know what the MMU is... your operating system has already taken it away from you. :-)
I am rather confused in general... so we started off in computer programming technically doing data-oriented design, due to constraints... eventually down the road we switched to OOP... and now we are switching back to data-oriented design... the cyclical nature of computer programming is rather interesting to me...
Hardware changes over time probably plays a significant role. DoD suddenly becomes interesting for minimizing power consumption on mobile phones, for example.
Structured programming does not imply DOD , if that's where you are going
This was essentially a sales pitch for a programming paradigm, but it didn't really teach much. I have been programming in C++ for about 20 years, over 10 of them professionally, but this speech didn't really teach me anything about how _exactly_ I should perhaps start designing new projects to be more efficient in this manner. "Put all common data in a single array, handle that data with code that minimizes or even removes all branching" is a good idea, but this didn't really teach me at all how that should be done in practice, especially since it just alluded to an overall design without going into any details (most prominently, the speaker just mentions the output arrays and passing them to the next step in the "pipeline", but did not show nor say anything at all about any implementation details, even at a high conceptual level, so I still have no idea what he's talking about.)
I suppose that wasn't even the intent of the talk, and it was indeed to be just a quick summary of what DOD is, and an incentive for people to learn more about it. But, in the end, it makes this speech more or less useless because it doesn't really teach anything. (I already knew about the principle cache optimality, and how handling data linearly in an array is more cache-optimal than jumping randomly in memory, and how conditionals can make the code less efficient because of failed branch predictions and code cache misses. In this sense this speech didn't really teach me anything I didn't already know. It also didn't teach anything about how I should change my approach to program design in order to achieve that optimal strategy.)
It means that you did not see any red flags in the complexity that arises when you have to make all these classes interact with each other in complicated ways to make it run the way you intend to. The examples from the chromium source code were pretty good. Some applications are well suited to OOP, for instance a GUI framework. If you are making an animation engine, it makes more sense to think about streams of data going through computation units.
As for their design, please keep in mind that making things simple enough, clear enough, is a challenge and requires effort and time, so I would cut the presenter some slack.
This was a show and tell.
I think there's a reason why all these "data oriented design" people are game developers (and, while he promised to show a non-game application, and to some extent succeeded, an animation system is still a pretty "gamey" thing). The reason is that games _do_ end up misdesigned if you try to do them in an object-oriented fashion, largely because there's so many things that interact with so many different systems. Consider: a character on the screen has a mesh, some textures, some animations, some actions, possibly some ai, possibly some player input, some game state (health/mp/ammo/whatever), has to interact with weapons, with its environment, and so on. What's more, all this stuff has to run every frame, so almost none of it is "cold".
The vast majority of other applications are _not_ like that. They do not end up mixing so many different concerns in the same objects. When relationships are expressible as graphs, object orientation is a natural fit. The problem is that here they're hypergraphs and untangling that requires introducing intermediate layers of abstraction that, understandably, they don't want to build because they are not performant.
I thought Richard Fabian's book, while rather poorly written, sometimes inscrutably so, does a decent job of actually explaining what this paradigm is like and how systems with it are actually designed. It doesn't help that almost every advocate of DoD presents this adversarial relationship with OOP, as if they were incredibly different paradigms. They are not -- this is just a variant of OOP where you design a database that will contain the objects before you design the classes. You still "hide" data in this paradigm, not by making it private, but by making copies/not making it available at all!
48:43 - The guy asking the question very obviously has no idea what OOP is and why it is so bad. OOP is a mindset that _directly causes_ bad engineering. Without OOP dogma, most programmers would just naturally use better engineering practices of organizing the data separately and operating on the data in a manner that the CPU can process efficiently. It's the OOP that makes programmers have bad engineering ideas. The questioner sounds more like he was taking the talk personally, and was offended that someone would ever speak out against OOP.
So, is "Data-oriented Design" just a buzzword for what essentially boils down to "struct of arrays" or am I missing something big here?
You need to know how the data is used. For example, separating a point into arrays of x, y, and z will only kill performance, since the coordinates are almost always used together. I think the whole OOP-hatred thing is a bit misleading.
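In code, the point is to group by access pattern, not to split blindly (a made-up example):

```cpp
#include <vector>

// x, y, z are always consumed together, so keep them together.
struct Position { float x, y, z; };

struct Points {
    std::vector<Position> pos;          // the hot pass reads whole positions
    std::vector<float>    temperature;  // separate field, used by a separate pass
};
```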
+Lttlemoi
No, it is a response to the awful Java, which must die. I think that when people say OOP they really mean the Java paradigm, and a lot of languages, exotic or not, suffer because people want the language to look like Java.
Did you know that Java is not really OOP? Smalltalk is, but C++ and Java are inspired by Simula, not Smalltalk.
People seem lost if there are no public, private, protected keywords ;-) If you know what I mean.
The 3 lies of OOP versus the reality of the machine, processor, architecture, et cetera:
www.slideshare.net/cellperformance/data-oriented-design-and-c
1) Software is a platform
2) Code designed around model of the world
3) Code is more important than data
@@jumpman120 "Code is more important than data" - this is actually one of my top issues with many IT projects/human behaviour, often people put lot more effort into protecting their code (doing things like obfuscation and anti-piracy) while they didn't sort out yet the data backup and didn't document well their data structures. Then in a decade or two they look really surprised when they suddenly have hard time to move to new platform, because their code is now obsolete, and just copying data and write new code is so painful that often even emulation of old platform is cheaper for them.
While in reality the data is way more important than the code. Code can be rewritten... (especially if you've written it once, you can often pull off a second version with the same features in much less time).
resume at 19:27
Oh look, another case of treating OOP as synonymous with inheritance, when the two are completely different things.
"No true OOP."
Can somebody explain to me why he talks about 6k cache misses with the OOP design but only "2" with the data-oriented design?
Because in the OOP version there is a cache miss on each animated square, because they all inherit from the same type. So each of the 3k squares creates 2 new animation instances: 1 movement animation and 1 color-change animation.
In the DOD version he only gets a cache miss from the template instantiation, so each of the 3k squares references the same animation once and only once, since all squares are simply structs in a vector. So it should be 2.
@@Holysoldier000 I still don't understand what cache misses have to do with it. Afaik, a cache miss simply means that an instruction requires data that is not located in the cache, right?
So we either check what the compiler does or we make a few assumptions. I don't see how we can tell whether or not there will be a miss.
I see that the OOP version has issues with vtable lookups, but whether or not we get cache misses depends on where the stuff is allocated.
@@CrazyHorse151 The way he is determining the cache misses is from the inheritance hierarchy.
In the OOP model each instance of an animation will have to defer to a virtual function that it inherited.
If you look at the DOD version, the function reference is now a template and not coupled to the "animation instance", as there are no instances. So if there are 2 types that get instantiated from the template, you only have 2 virtual functions.
Think of it this way: 1 array that runs through 100 objects that each call 2 virtual functions is at LEAST 200 virtual calls.
2 virtual functions that run through 2 arrays of 50 objects each is ONLY 2 virtual calls.
The idea is to separate the function from the data if that makes sense.
@@Holysoldier000 Ah, I think I just got it. We're talking about the instruction cache, right? So we not only have to check for the virtual function, but we also have to jump to instructions that most likely are not in the instruction cache, right? So this is where the cache miss comes from: not from the data but from the instructions?
Btw: Thanks for helping me understand this!
@@CrazyHorse151 Start here:
Pitfalls of Object Oriented Programming
harmful.cat-v.org/software/OO_programming/_pdf/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
Battlefield3: Culling the Battlefield
www.gamedevs.org/uploads/culling-the-battlefield-battlefield3.pdf
The only mistake he made was giving this nice, cool talk that title. We are not stupid people, guys... seriously! It's the typical line when a new king sits on the throne. Let me make the analogy: in Coherent's case, rendering HTML5, OOP practices as used for developing apps simply don't work. Let me say more: in any render engine you shouldn't use OOP the way you use it for apps, libs, etc. Every country has its own king ;) That said, he should change the title of the talk to avoid some bad reactions, because Data-Oriented DESIGN doesn't even mean you don't use OOP, unless polymorphism and inheritance alone are OOP. We know we pay for their use, but the performance problem is not there. He used templates and composition, avoided if statements in the Tick call, etc.; it's a design, a strategy, a way of coding to maximize performance. Anyway, good talk, and it's great that we have a discussion here about it; that's how we enjoy it 😉
One of the main principles of OOP is to keep data & behavior together, and to work with collections of such objects. Both lead to an inefficient scenario for your computer.
Could you please share the code of the examples? A simpler case with only C++ might have been easier to understand; it's difficult to assess whether the problems are related to HTML5/JS/CSS. And while Chromium is object-oriented, it is not a synonym for OOP.
For example, when you say the OOP version needs to build a complete mock DOM tree, you mean Chromium does. Another OOP system could be designed to mock a list of nodes, just like you did.
6x is not even an order of magnitude better. What could we get by optimizing the 'OOP' code for animations, especially optimizing for cache misses?
The title is a bit disorienting. IMHO "Designing your codebase with the cache in mind", "Cache-friendly design" or "Know your machine" would have been more appropriate. OOD is a tool like any other tool in the language; how well it's used depends on the context, a thing the speaker also points out at the end. Besides that, the talk is good and informative. Another downside is that the speaker moves around a lot; I can only imagine what a hard time the cameraman had... :)
I need to watch the video first, but too many code-related videos have stupid camera work where they focus on the presenter instead of the code and other important information: other than identifying the presenter at the start and end, video recording of the presenter is throwing away the most useful information, which is stupid. People go to tech presentations to get technical information, not see someone dance!
Totally agree.
Well no, OOP has a particular design that goes against performance and efficiency.
He moves around a lot because he's data-oriented but the data is scattered everywhere. At the end he's a lot more relaxed.
I really like the idea of data-oriented design, and as a student who worries about performance and is only learning OOP, I am quite annoyed that these speakers think what you learn in school is wrong and then do a really poor job of explaining the thought process in simple terms and examples. I find the examples way too complex, maybe also because I don't program in C++, but still.
A lot of this stuff is trivial to grasp if you learn some basics of assembly, the theory of computer architecture, and how precisely data is encoded and processed in computers; after that, most of these examples will probably ring a bell even if you don't fully grasp C++ (which takes years of experience to master, so you shouldn't feel bad if you "just try it" and it feels like everything goes wrong and your source looks worse than a similar thing written in any other language... actually it's so bad that most of the "C++" projects out there in production have rather average or bad source and could be rewritten in a much better way by somebody experienced enough... you should probably do some C++ for a few weeks every year and read into some open-source projects from time to time, even if you prefer a different language for your actual production work).
@@ped7g Wow @ped7g you are exactly what is wrong with this field. "Lol its trivial" you guys dont realize how much harm you do to the community with that elitist bs behaviour.
@@RodriTheMighty I think it at least is, relatively speaking. It's more difficult to become a master of C++ without any knowledge of assembly, profilers, and computer architecture, as much as people want to market the language otherwise. Even if the goal is OOP, it's difficult to design in OO effectively absent such knowledge. For example, someone who tries to learn how to design an effective multithreaded software architecture without understanding underlying computer architecture concepts like atomic instructions, CPU caches, and false sharing will probably have an even harder time than if they had learned those fundamentals first. "Trivial" might be the wrong word, but I think it's at least an easier path, and not as daunting as it sounds.
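(For anyone wondering what false sharing actually looks like, here is a minimal sketch; the struct and function names are invented for illustration, and it assumes a typical 64-byte cache line. Two threads increment "independent" counters that happen to share a cache line, and the usual fix is padding each counter onto its own line.)

```cpp
#include <atomic>
#include <thread>

// Two counters packed into one cache line: writes from two threads keep
// invalidating each other's cached copy even though the data is "independent".
struct Packed {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// The usual fix: align each counter to its own 64-byte cache line.
struct Padded {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <class Counters>
void hammer(Counters& c) {
    std::thread t1([&] { for (int i = 0; i < 10'000'000; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 10'000'000; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
}

int main() {
    Packed p; hammer(p); // typically measurably slower...
    Padded q; hammer(q); // ...than this, despite identical work
}
```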
One of the first things I recommend is picking up a profiler and starting to measure code. Then learn on the fly with each hotspot you encounter: why it exists, whether it's the result of cache misses, branch misprediction, poor instruction selection by the optimizer, false sharing, or even something much higher-level like an algorithm whose complexity is a poor fit for the size of the inputs it handles.
We might have a problem with some elitism, but I think a huge part of the problem is that we are trying to get C++ programmers to run before they learn to walk: using a language as complex and advanced as C++ before they understand the basics of computer architecture, and optimizing their code before they learn how to use a profiler, all while misquoting Knuth on premature optimization. Knuth's original point was that all optimization is premature until the code has been profiled, and after that it isn't premature; the proposal his famous quote was drawn from, and is incessantly misquoted out of, was to use gotos instead of loops to generate more optimized branch instructions.
@@RodriTheMighty I'm not sure if that's the case here, but the word "trivial" can be used simply to mean the opposite of "nontrivial". A professional C++ programmer is fully expected to consider the basics of C++ trivial, and an OOP student isn't really expected to pick up C++ as their first OOP language, so yeah, you get an elitism clash here.
C++ isn't really a "community" language. It is a language for programmers. Maybe that's wrong, but if you look at the other side, it would be really bad too if there was no talk meant to be by programmers for programmers. You'd never talk about things without having to revisit the basics.
Caches killed OOP.
Chrome is a very specific use case; not everybody is doing in-memory management of their objects. Most programmers out there are using databases, not in-memory storage. How do you do this with databases? I guess you can address each table as a long collection of objects, you can have indexes and composable queries to filter the collection down, and then do the mapping at the bottom. I think it would be something like that, more or less...
Databases are actually a good example of data-oriented design. You fetch only the data you need, iterate over that data and do your thing, then update the database. That's pretty much data-oriented, just with an extra step of communicating with your database.
Think of it as storing the columns instead of the rows, and if a certain object doesn't need a column, you don't add its id (think primary key) to that column.
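(A minimal C++ sketch of that columns-instead-of-rows idea, usually called struct-of-arrays; all the names here are invented for illustration. Each field lives in its own contiguous array, so a loop that only touches positions never drags the other "columns" through the cache.)

```cpp
#include <cstddef>
#include <vector>

// Row-oriented ("array of structs"): every iteration loads the whole row.
struct ParticleAoS {
    float x, y;      // position
    float vx, vy;    // velocity
    float r, g, b;   // color, unused by the physics step
};

// Column-oriented ("struct of arrays"): each field is its own column.
struct ParticlesSoA {
    std::vector<float> x, y;
    std::vector<float> vx, vy;
    std::vector<float> r, g, b;
};

// The physics step streams through exactly the four columns it needs;
// the color columns never enter the cache.
void integrate(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```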
Where can I download this fast browser?
You can't. It's not a standalone product. It's more like Electron/CEF: you can use it to build your own app/game, but it is not free.
Have you ever seen any game engine design? Data-oriented design is used there a lot.
At the beginning of the talk, if you followed and listened carefully, he emphasized not having seen this type of software design adopted in projects other than game-related technologies.
So rather than profiling what is making Chromium slower, just attribute it all to cache misses rather than to Chromium simply doing more stuff with fewer assumptions than your implementation. The exact quote was "doing pretty much the same thing". When specifically asked about this, he admitted it, and even mentioned "simpler call stacks", as if that means anything after building C++ code with optimizations on. He didn't test it on ARM because he couldn't be bothered to build for it, but yeah, of course it will be an even bigger improvement there! So you admit your reasoning is not evidence-based. Anyone can get a 6x improvement easily with lazy measurements. Why not link us to your pull request to Chromium, then?
array of structs versus struct of arrays
Is it necessary to master OOP before moving on to DoD?
This is still a kind of OOP. It's just making different choices about how to divide the objects -- grouping things based on their usage and making duplicate copies of "shared" data where necessary rather than actually sharing it -- vs. grouping things by related logical concept irrespective of usage.
"Classic OOP" just works, but it doesn't scale as well to large datasets or complex multithreaded usage. This sort of design is better suited for that, but requires more external infrastructure to ensure that inputs and outputs are correctly propagated between the systems and that all the multiple copies of the data are updated as needed.
If the speaker is the standard: evidently not.
This is essentially the same as database design. What he calls "existence-based predication" the database folks call "first normal form". Since writing to and interfacing with databases is one of the key things OOP was designed _for_, it seems hard to see these concepts as adversarial.
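(A small sketch of what existence-based predication looks like in code; the type and field names below are invented. Instead of every node carrying a nullable animation that has to be branch-checked each tick, the animations that exist live in their own dense array, so the hot loop has no per-element presence test.)

```cpp
#include <cstdint>
#include <vector>

struct NodeId { std::uint32_t value; };

// Branchy version: every node owns an optional animation, and the hot
// loop must test for its presence on every element:
//   for (Node& n : nodes) if (n.animation) n.animation->tick(dt);

// Existence-based version: having an entry in this array *is* the
// predicate "this node is animated". No null checks in the hot loop.
struct Animation {
    NodeId target;   // which node this animation drives
    float  t;        // normalized progress, 0..1
    float  speed;
};

void tick_all(std::vector<Animation>& active, float dt) {
    for (Animation& a : active) {
        a.t += a.speed * dt;          // straight-line, prefetch-friendly
        if (a.t > 1.0f) a.t = 1.0f;   // clamp; finished ones get swap-removed elsewhere
    }
}
```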
"Chromium made by the best engineers"
"They have const methods that do const casts and are not read only"
Idk if that was a joke 😂
Since he gave this talk, the Chrome devs have written a bunch of blog posts about how it's "impossible" to prevent use-after-frees, so they'll just never free anything, and about how they can't fix their flaky tests, so why bother.
Really not impressed with chrome devs right now.
It's a really great talk but his constant walking is super distracting.
Object-oriented programming isn't about inheritance. In fact, in a sense, the templates here still perform OO (object-oriented) behavior; it's just also being performed in a DO (data-oriented) manner. DO and OO are not mutually exclusive, except in languages like C# and Java where everything is a virtual class with virtual objects attached to the runtime through a required type system. That is foremost a dumb thing for a language to do if it cares about performance, and secondly it restricts the programmer to a single paradigm. In C++ this isn't the case, but to act like it is would be quite foolish: you can use OO and DO at the same time. Inheritance has useful cases when you don't abuse it; it's convenient and scalable, and only in some simple composition cases should you prefer it in your OO systems. For DO, it can be great for the CPU cache, but that's wasted if the cache wasn't the bottleneck of the system, which is not always the case. Asynchronous code, for example, will almost never benefit from CPU cache behavior, and designing for it there is likely to cause more problems than it solves (which makes it problematic for something like UI development, which relies on async-like behavior that doesn't benefit from CPU caching); OO-specific behavior (disregarding inheritance) can solve this in some cases. The biggest point in favor of OOP is that it lays out an interface for using an object, i.e., the minimally required functionality. If the data within is also readable, then others can still treat it as a DO object, and that doesn't even violate DO design. Objects are not required to be comprehensive, nor to carry virtual tables; inheritance is not a requirement. (Not to mention that "static inheritance" removes the virtual-table problem, which is almost exclusively responsible for those cache misses, and starts to look much more data-oriented.)
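(For readers who haven't run into "static inheritance": it usually refers to the curiously recurring template pattern, CRTP. Here is a minimal sketch with invented class names; the base resolves the derived type at compile time, so objects carry no vtable pointer and the calls can be inlined.)

```cpp
#include <cstdio>

// CRTP: the base knows its concrete derived type at compile time,
// so `tick` dispatches with an ordinary (inlinable) call, not a vtable.
template <class Derived>
struct TickableBase {
    void tick(float dt) { static_cast<Derived*>(this)->do_tick(dt); }
};

struct Spinner : TickableBase<Spinner> {
    float angle = 0.0f;
    void do_tick(float dt) { angle += dt; }
};

struct Fader : TickableBase<Fader> {
    float alpha = 1.0f;
    void do_tick(float dt) { alpha -= 0.1f * dt; }
};

int main() {
    Spinner s;
    Fader f;
    s.tick(0.016f);   // direct calls; no hidden vptr member,
    f.tick(0.016f);   // so sizeof(Spinner) is just sizeof(float)
    std::printf("%f %f\n", s.angle, f.alpha);
}
```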
Yeah, inheritance is essential for expressing base functionality in a different manner. It's just that whenever internal side effects occur due to that base function, everything goes to hell... It is essential in some contexts, yes, but very volatile once you introduce mutable state into the base class...
System alpha, system beta => function alpha, function beta??
DoD talks reeeeeaaaaallllly provoke OO-friendly audiences.
I know this is an old comment, but I think that is a good thing. I've seen a lot of talks about the value of DOD and the evil of OOP, but not a whole lot that goes beyond classic college definitions that no one would actually use in the real world. If there can be more interaction between people that have a lot of experience in OOP and people that have a lot of experience in DOD to compare and contrast the pros and cons of each approach in real life situations, I think more programmers (myself included) would greatly benefit.
I've been working in OO my entire professional career and am pretty curious about DOD, but my barrier to entry right now is trying to picture what I currently do and what it would look like in DOD. If there were more conversations that bridge that divide, rather than extolling or condemning one side or the other, the transition to using DOD would be much easier.
The aim of data-oriented design seems to be to port the functional paradigm into OOP languages, reducing the state-hell problem and the lies of the Java language, which break the "cooking recipe" approach of the C language and make it difficult to understand how the process's state and memory evolve during execution.
The point of OOP isn’t the greatest efficiency, but to be effective with large teams of mediocre quality of devs. If everyone was a Mozart, they would writing in assembler and get the greatest performance. Long term functional programming requires talent in math. Most devs are incredibly untalented when it comes to anything above high school level. This isn’t because of the defective educational system. Humans are better with stories than abstract thinking.
Agreed
There is a serious question I have, tied to what happens at 5:10. He says "in my hotel room it was 70, but who knows what happens"; isn't this exactly the problem with strict DOD? When you write games, literally everything is streamlined: you are not browsing the web at the same time, you are not taking calls on Zoom or Skype, and so on; the list can be made long here. Browsers today are like small OSes, and while the rendering of moving things can obviously gain a lot from DOD, I have my doubts about rendering diverse things like regular websites, which aren't just moving objects that do things.
I believe the goal is to be able to integrate DOD for specific things that would greatly benefit from it, when needed. For instance, with Unity right now, you can do this with Unity ECS. Some things would remain OOP, others DOD.
Yes, a regular website is not going to benefit from focusing on primitives; only something like a 3D game in the browser would. Data-oriented design is still used in the web world; it just means separating code and data rather than avoiding objects.
lol because "Battlegrounds" is such a great example of optimization and performance... not.
+Bruno Xavier Leite When someone talks about Java and SOLID and design patterns, can I say that it is bullshit and that there is just as much illusion in OOP, or not? Because everybody uses OOP and Java and public, protected, private, and design patterns, and the programs are still always a ton of shit.
Enter the era of buzzwords!
Ah yes, the good old new blockchain-powered AI with advanced algorithms running in the cloud
It's great really
At this point, it's so easy for me to write and test code in an OOP way that I don't really care about trying the alternative just because some people think it's better. Anyone proficient with OOP knows what the pitfalls are and tries to avoid them (e.g., inheritance).
Data hiding is just ridiculous.
I think the author is unfair, and his belief that DOD wins from any point of view is an exaggeration. DOD is a good approach for optimization, but I highly doubt it could be an effective default method of development. The argument that code is easier when it's all in one place is very, very old, and it reminds me of the time when we were trying to move our paradigm away from "Spaghetti Code". We don't write such code anymore precisely because we learned to divide our code and use proper abstractions, separation, and isolation. With DOD we can end up with completely unmaintainable code. Once I had the chance to work with a particle system where struct Particle had more than 30 fields and a huge chunk of code used it. Fixing bugs was a disaster, adding new features was a disaster, and yet it was exactly DOD.
You're falling into the same traps as the video.
You can make garbage code in every paradigm. Ignorant people who program by accident and/or cargo cult are always going to end up with bad results. OOP isn't immune to this, or even very good at guaranteeing good results "by default". Taxonomies are entirely viewpoint dependent, and they are subject to change over time. Defining them statically tends to solidify them in your program. Designing all your abstraction in an ad-hoc fashion, before you even have a grasp on the entire program's data flow, will guarantee your code will be hard to follow and hard to maintain. Abstraction-first is absolutely the wrong way, and typically leads to encoding YAGNI-violating abstractions everywhere. Then there's the problem of refactoring them once your requirements (immediately) change...
On the other side of the coin, nothing about data-oriented programming precludes you from splitting up your code into logical units and separating unrelated code. Modules and namespaces and functions exist, and are isomorphic to classes and member functions. If you want a "this" pointer, just pass it as the first argument to the function and name the parameter "this" or "self". If you really, really want that "dot" method-call syntax, uhhhh... invent a language that isn't C/C++, or make a macro, lol. My point is that it isn't an inherent limitation of the paradigm.
But let's stop attacking stupid strawmen/anecdotes of the worst possible version of each other's paradigm, and get to the root problem instead.
Splitting that 30-field struct would almost assuredly be a good idea here. If most fields aren't used for every particle in every iteration, then you don't want to be loading all the unused junk into cache. Neither OOP nor data-oriented programming saves us from the refactoring pain, either. OOP gives us some tools (moving members around the class hierarchy, various composition/pointer-based abstraction patterns, this pointers), but they're of limited use, and there are large classes of data-structure shuffling that aren't made easy with only those tools.
A better tool would be some sort of "struct hierarchy typedef". The "this" pointer + inheritance is kind of like an anonymous-only version of this feature. Something like "using end_color = particle.end_pattern.color;" to alias all the fields of "particle.end_pattern.color.r ...color.g ...color.b" to "end_color.r end_color.g end_color.b". You could also support anonymous (unnamed) access, with just "using particle.end_pattern.color;", and then just use "r" "g" "b" directly, as if you had inherited from color. (anonymous access to r, g, b is not a great idea with this particular example, but the point is to show that it is a generalization of the inheritance+"this" concept). Let it be used in every possible scope - define it inside a module, namespace, struct, class, function, member function, for loop, if statement, anonymous curly-brace block, etc. Then you could arbitrarily compose structures the way you wanted them in memory, and be able to shuffle them to your heart's content, but minimize the number of changes you had to make to existing code in order to do that shuffling. You just add or edit the correctly scoped "using" declarations, and all your code would suddenly compile again.
Another very good thing would be unifying "->" and ".", so that you only had to use one or the other and it would work for both fields and pointers. The explicitness is slightly nice, sometimes, but it makes it so all our code breaks whenever we switch between fields and pointers. Array and pointer access is already unified in a similar way, so I think we have some proof that melding like this can have at least some merit.
Both of these changes could improve both OOP and data-oriented code.
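(Going back to that 30-field particle struct: a minimal sketch of the hot/cold split being suggested above, with all field names invented for illustration. The per-frame simulation loop only streams through the fields it actually reads.)

```cpp
#include <cstddef>
#include <vector>

// Hot data: touched by the simulation step every frame, kept dense.
struct ParticleSim {
    float x, y, z;
    float vx, vy, vz;
    float life;
};

// Cold data: only read when a particle is (re)spawned or rendered,
// so it stays out of the simulation loop's cache lines.
struct ParticleSpawnInfo {
    float start_color[4];
    float end_color[4];
    float size_curve[8];
};

struct ParticleSystem {
    std::vector<ParticleSim>       sim;    // iterated every frame
    std::vector<ParticleSpawnInfo> spawn;  // same index, rarely touched
};

void step(ParticleSystem& ps, float dt) {
    for (std::size_t i = 0; i < ps.sim.size(); ++i) {
        ParticleSim& p = ps.sim[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.life -= dt;   // spawn info never enters the cache here
    }
}
```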
I don't think so. He actually argues that their solution has fewer branches, which could also be described as producing less "Spaghetti Code". However, I think the talk wrongly presents DOD as an opponent of OOP; according to my understanding of DOD and OOP, it isn't one. I think the main point of DOD is that you should think about the data flow in your application before building your abstractions. You still get abstractions (maybe using OOP), but they might be different. As shown in the talk, they still have abstractions for the different parts of their animation component; however, they decided not to have an abstraction for the Animation itself, because it would not fit nicely. They only provide it as an interface for other components.
I am hearing the presenter make a lot of speculative statements about performance. As a general rule, improving performance increases complexity, and as such it should only be done when necessary. As a specific rule, you should never speculate about performance; only measure it.
It’s not speculative when you say something is a cache miss or not even though it wasn’t measured.
It sounds like the comparison is not between DoD and OOP so much as between the design patterns commonly used with each. Otherwise, what you call DoD is really just OOP with data-friendly design patterns.