Hey, thanks for taking the time to discuss and explain my post! Generally, what you said was all accurate, and it's great to see these things explained in detail for a more casual audience.
there also the case when main thread access to cpu0 which is most bound for windows os, while the ps force the os to use only cpu7. some dev ported to pc but didnt disable cpu0 affinity and cause main thread to be slow cos it need to wait for os doing shit thing first.
Me buying a 144 hz monitor 10 yrs ago: wow imagine how smooth and clear games will look in the future with even higher refresh rate and faster gpus. 10 yrs later publishers still targeting 60 fps but now it's upscaled and interpolated fizzling blurry mess.
I've been a PC gamer and built my own PCs for 30 years. It has always been like this. As soon as hardware improves game developers want something that looks better and has more complicated game logic. What I don't get is why people seem so puzzled about it. You don't have to run at Ultra with RT Nightmare enabled. If you happen to prefer extreme fluidity over extreme visuals... just dial down the settings. My XTX is 5.33x faster than my RX580 was. So I can chose to look at something nicer or I can choose to look at the same level of graphics 5.33x faster. Or I can chose something in between.
@@andersjjensenthe choice is the great thing about PC gaming but unfortunately human nature will rear its ugly head. Many people will psychologically feel like they aren’t getting there moneys worth if every setting isn’t slapped to “ultra” which includes RT.
@@BoYangTang You've been fooled by someone with no idea if you thought we'd get 120+ FPS optimization targets. 60 works great, if we aimed higher games would just end up looking dated.
@@andersjjensen I hear where you're coming from but I would say that it's not trivial to dial down the settings for higher framerates. e.g. going from high to medium for +10% fps, the best option after that is probably to lower render resolution. Also if a game is bottlenecked by a couple cpu threads all you can do is interpolation if its implemented. Imo it's a difference in priorities I just hoped all those years that as higher refresh displays became mainstream games would be designed with better fluidity in mind because I felt the benefits were worthwhile.
I think the biggest takeaway you can get from this video is “luckily the main developer had an AMD GPU”; explains a lot about the state of game development.
It makes you wonder how much more optimized games might be on average if they were developed on midrange hardware that matched their games' recommended specs. It's probably harder to see how unoptimized your game is and care if you're doing all of your work on a 4090. Not that I blame developers--I'd want the most beastly PC to develop games on, as well, because it'd be easier and faster--but there's something to be said for the creativity and ingenuity that comes from being forced to adapt and optimize as you go (ex. low budget movies that put every dollar to use versus bloated AAA productions that waste money because they can).
@@Osprey850game development engines these type of development need a beast machine to even start to develop so i disagree entirely . I tried to develop on my rx6600 for a collage project and it was pain in the ass
@@omargallo9636well yeah it’s a 6600xt lol 😂 you need the pro series RX to do all that , also they don’t use 4090s they use quadro if I remember correctly unless the name is different
@@omargallo9636 That's why I said that I would want a beast machine, too, if I were a game developer. I know that developing games on midrange computers isn't ideal, but developing on the highest end GPUs and CPUs probably isn't ideal for optimization, either.
Here's an example of a parallelism problem causing CPU bottleneck without CPU being 100% utilized: - Let's say a particular work can be parallelized into tasks A, B and C. - This mean A, B and C run on separate threads. - The work completes only when all tasks A, B, C completes. - Let's say A takes 1ms to complete, B: 2ms, C: 10ms. - Now the problem: the work has to wait until the slowest task (C) completes *while the two other threads IDLES*. - This is just an example but in reality there are many more tasks involved and works depend on one another. - This creates a complicated web of things waiting for other things. - Not to mention on PC, sometimes we find that a particular task takes way more time to complete only on a specific hardware configuration. This wreaks havoc to the finely-tuned task scheduling.
the easy way to read it is: if a single core of the CPU is stuck to 99-100% for more then one tenth a second that thing is CPU limited, and the same for the GPU. it could be the specific Tick/Update, or the specific combination to resolve application state. The leading contributor to the misunderstanding is that a lot of the frame-rate overlays only show "total CPU, and "total GPU". the GPU is more understandable, but even a 2-core/4-threadthe CPU should show 4 or 5 percentages not just 1.
Developers 20 years ago have done it with the nightmareish PS3 CELL CPU, PS2 Emotion Engine and its several cores and SEGA Saturn and its literally 8 separate processors, and achieved things unimaginable for the hardware. Y'all are just lazy. No, you can't make this about "uhm but processors are different and stuff on PCs". Yeah, the brand changes, but unless it is a processor from 15 years ago, it will always support the same features, it will run on the same instruction set and it will work in the same way as another processor, just faster or slower. An FX 9590 works the same as an i7 4790K, but the i7 is simply FASTER. Nothing else changes. If you develop something with FX 9590 in mind, but then try to make it work on a 4790K, it will work right off the bat, just FASTER. And if you try to make it run on a slower processor with the same features? It will run slower. Incredible! Now, when you go way back in time and actually start lacking features (For example the decrepit MMX), THEN you run into problems. That said, stop being lazy. I know you are underpaid by your bosses and they give you impossible deadlines, but you have the most power. If you are sick of how unjustly you developers are treated, QUIT. Make sure that your loss is FELT.
@@TheRealNightShot it isn't just "being lazy" if I need to go and calculate a number that requires these 5 other non-cached values, and I need that number NOW, then I can't async out those calculations. highly threaded workloads need to be designed specifically for that, and it turns out that many games are less optimizable, even the Dev in the story talked about said 'only some specific aspects could be threaded' Cinabench is highly threaded, but this is one frame being rendered on hardware that was never optimized for rendering. Games tend to have a "world thread" which cannot be asynced, and it turns out when these are heavily asynced you get race-conditions, and/or just incorrect results.
@ Again, it’s not unheard of, it has been done. If an approach doesn’t work, you make a new one. It’s been 40 years and developers still linger on the same fundamental approach to game code, which was meant for single threaded machines. Maybe it’s about the time to move on. To have such advanced and expensive hardware and no ability to use it. Games perform pitifully, and therefore It’s no longer possible to dwell on “this can’t become multi threaded”. Something must be done. Yes, it requires money, it requires time, but that’s what game development should be about. Have you seen how long GTA VI is taking? Have you got any idea how much money they’re spending on it? How much effort? And yet it will all be worth it in the end. I am read to bet that a GTX 1050 Ti and i5 6400 will be able to run that game.
Durante was indeed the guy who made DS Fix that turned the og DS1 PC port into something playable. He eventually work consulting on PC ports by Xseed for an old JRPG focused company called Nihon Falcom and then founded his own company with friends, which is now primarily porting JRPGs to PC for NIS America (mainly still games by Falcom, which develops the Ys and Trails series', both of which Durante id also a fan of).
I know a few devs and it's a little hard to point the blame at them for poorly optimized games. They don't want their game to run like crap with stuttering, crashes and overblown hardware requirements. They have budgets, timeframes, resource limitations etc that all come down from the 'higher ups'. They are essentially forced to release their games before they are totally happy with it due to publishers, investors etc having the final say. There is a bunch of tick boxes to get through before a game is released, and as long as the right amount of boxes are ticked, that's all that matters. The other half of it, is inexperienced devs that don't understand the engine and tools properly. They still don't want to release a bad game and still have the push from above, but lack of experience, and publishers and investors once again only caring if the minimum required number of boxes are checked can only lead to a terrible experience for the gamer. The worst part is that these inexperienced devs think this is normal to release games in these unoptimized states, which means they will never get better at their craft.
As a software engineer, I agree. This happens all the time, and I genuinely want to develop something flawless. However the high management pushes to deliver the way it is, and that's usually the commercial and marketing department's fault. They sell dreams to customers, take their big bonuses, and all the hard work falls on our shoulders.
Shit construct. The publisher/higher up don't want to lose money with long development time. The dev just does what higher up want, but also doesn't want to improve the game more than needed. Both playing checkboxes and blame shifting if there's something bad going to happen. Both neglecting the consumer. The dev have power to negotiate at risk of losing jobs, while publisher have power to make whatever they want at risk market not buy it. This blaming game already gone to its due, not only creating toxic environment where the cancer CANT be stopped but also people moving away from Covid-impact(in my place there's a ton of people selling PC SETUP second hand, probably 1/3 even 1/4 of its price, most of them come from 2020-2022). That's why many people praise Baldur Gate 3 devs, heck even PlayStation already try to hold Palworld hands to PS5 despite Nintendo monopoly. If dev really don't have enough courage to stand for the consumer, it's already fall into the PR game. More often than not, dev best interest is always on consumer. But what can we do, no one want to risk losing jobs right ? Don't worry, market will fix it itself, even with AI and Indie Game rule the tools. Well, I do feel it's related to the construct of some 'debt' country. But the market already shrinkin way before anyone can anticipate.
It certainly doesn't help when western game studios lay off thousands of talented developers, many of whom will choose to leave the games industry entirely and take their knowledge with them. Brain drain has a huge effect on future projects.
The actual decision makers at AAA’s who don’t actually play games, “if console targets 30/60 fps why spend budget on optimizing for higher fps on pc, they are getting the same experience”
@@PrimeYT10 nope, that's still not the main reason. The main reason has always been investors, managers, executives who all keep imposing deadlines on unfinished products. Funny how we keep forgetting that because we never learned the true reason about horse armor DLC.
@@BleedForTheWorld yes, and upscaling being the excuse to skip testing and optimizing the games “if it can run on 30fps on console, deliver it to pc and give them dlss and frame gen”,
I am not a game dev anymore, but I was looking quite deeply into game development during my university days and also worked as a game dev for a little while... The challenging part is, exactly as stated and as you understood correctly, that most of the stuff is going on on the main game thread and this one needs to be synchronous since it needs to be deterministic what happens after a certain interaction or input and is usually bound to some sort of update loop that is called once per frame. Engines usually already use separate threads for graphics and audio. From here on as the actual developer of the game, there are certain aspects that can be threaded. Such things are doing load tasks in the background or calcucating AI paths for example. But to stick with the pathing example, the only thing that can easily be threaded is the path calculation. Performing the movement along the calculated path is usually done on the main thread. Threading is challenging due to the nature of asynchronous programming. As soon as we use another thread to execute some logic, we cannot know when exactly that calculation is finished. The threaded calculation can report back to the main thread once it is done, but the main thread only knows that it receives the calculation results at some point in the future.
This is where something like an ECS comes in. And if your logic is too complex for an ECS, just perform it every few frames and let the ECS simulate the momentum of the game objects.
Hey, I work on optimization and would like to add some context to parallelization in video games. When most mainstream game engines were made, CPU core counts were very low - so back then it was totally okay or even optimal to have your game have a dedicated "game" thread for simulating all gameplay code and a "rendering" thread for rendering objects then a few extra threads for background work like loading assets. Nowadays most game engines still have a similar architecture with a dedicated game thread and render thread with maybe a few extra dedicated threads for audio etc. The problem is this doesn't scale well with core counts and your CPU bottleneck will always be your slowest task which, in this model, is inherently single-threaded and is usually either gameplay or rendering code so the majority of the cores on your CPU will be idle most of the time. To fix this a lot of engines have implemented the "fork/join" paradigm which is probably the simplest form of task-based parallelism. The way it works is you take an expensive task like animation and instead of running it on the game thread you spawn a "task" that will run on any of the available worker threads which runs in parallel to the game thread and await the result. This is good and increases parallelism but still won't result in total CPU core saturation and one of the dedicated threads will still likely be a bottleneck. This is where job-systems come into play. With a job system everything is a "task" and (for the most part) there is no concept of a "game thread" - everything runs in parallel against each other optimally on any available thread. Now that sounds good but getting this right is very difficult and ends up creating a lot of problems that tend to require new programming paradigms. If I have tasks A and B, how do I know if it's okay to run those tasks in parallel against each other? What if task A destroys an entity that task B is reading from? Resolving these task dependencies manually is the source of a lot of esoteric bugs and non-determinism in video games so that is why the general sentiment is only multithread if absolutely necessary. There are also very strange performance pitfalls with multi-threading that a lot of people are unaware of and can cause performance to get significantly worse! If you have a region of memory that represents a player character and one thread modifies the characters health - all other thread's CPU caches around that memory region will become invalidated, meaning if another thread goes and reads a nearby memory address to access the character's position for example, they will incur a big performance penalty and have to fetch that memory from L3 cache or RAM (this is an over-simplification since hardware differs). This is called false-sharing. So parallelism becomes very limited in this context, only one thread should be accessing an entity with this model. This is why new programming paradigms around multithreading are emerging such as the Entity Component System (Unity DOTS, Unreal Mass Entity, Bevy, etc). Basically in ECS you explicitly tell the job scheduler for your specific job what resources you are modifying / reading and can automatically run tasks in parallel for full CPU saturation (only if the jobs don't have too much contention over resources or scheduling conflicts). ECS usually also avoids false-sharing because of how it lays out entities in memory. So the question is, is it really a game engine's "lack of optimization" that causes poor CPU utilization? In my opinion, yes (depending on the engine *cough cough Unreal*). However, developers should still do more to try and parallelize their games, but it often becomes extremely difficult or the performance benefits are just not that significant due to the engine's architecture. Trying to write multi-threaded code on top of single-threaded code is no simple task.
@@ZeroUm_ So you're saying the whole project takes 180% of the time. Every project just whizzes past it's deadlines then? Nah. I'spose you typoed that first 10%.
@@BaalaaxaIt's a joke about how project planning underestimates time taken to develop and test. Project managers have an incentive to underestimate resources required to please the bean counters.
yeah I saw this making flash games back in the day, like half the battle is just learning how to make the thing the correct way and spinning wheels until then but once you're running on all cylinders it's crazy the amount of work you can get done and how much higher quality it is. I think general limitations are good for this, it forces people to slow down and learn/do things correctly instead of spending more time spinning wheels or making "cool" things that inevitably need to be remade or scrapped. I think a lot of people fall into making gimmicky stuff because of this rather than learning the codebase properly, which I totally get but it's always the boring ass lame projects I learned the most from and the most burnt out I've been has been making something work that wasn't done properly or that I didn't fully understand.
back in 2013 I requested a feature for HwInfo64 which was implemented and rolled out in v4.14 called "max cpu/thread usage" -- this reports the % usage for the highest used thread in the CPU. this is a great indicator of CPU bottleneck: typically if I see that the usage was above 90% in a 1 second poll period, the GPU usage was also less than 100%. where it helps is that you can see how close to that point you are (looking at GPU usage by itself merely tells if you're bottlenecked or not). may be worth adding that to your rivatuner OSD config.
I have only one finished simple VR game under my belt (on SideQuest). For me, it was the number of draw calls which the CPU was passing to the GPU. So, CPU bottleneck. I had to integrate GPU instancing to eliminate that bottleneck. Was very interesting experience
@@jose131991 GPU instancing is really, really cool! You pass one game object from the CPU to the GPU and tell the GPU to render it 1000 (or more) times in any number of places that you want, with any size. I did it on a Oculus Go hardware, so had very little to work with, yet I was able to procedurally create whole constellations (using Poisson disc sampling)
Ys 10 is made by Nihon Falcom, they are old school devs that REALLY are niche and knows what they are doing for gaming communities. This Japanese company is OLDER than Square and Enix, in fact ex-Nihon Falcom employees from the OG generations left that company to form their own Squaresoft and Enix. Modern Nihon Falcom is maintained and organized by it's fans who are now CEO and staff members.
yes, the porting work for pc however is done by PH3 Games, which is a studio that the western publisher nisa hired for their porting works of falcom games. Its an interesting situation, because for other Asia regions Falcom partners with a different publisher, Clouded Leopard Entertainment, who create their own port work. In regards to YS X, the original PC release by Clouded Leopard Entertainment (Asian Language only) was so bad, that they had to take it off Steam and rework it for several months. they have now actually re-released the game and its in a much better condition. Doesn't really matter to much for us western fans, as again NISA and CLE cater to different markets. At this point in time, no PC ports have been done by Falcom themselves in a long time, and tbh i doubt they would go as far as Durante would with the extra QoL and game optimization stuff.
i doubt falcom has the necessary in-house expertise to competently make pc ports. most of the heavy lifting was subcontracted out to third parties like durante's ph3 games
@@aliasonarcotics There was at least a few Ys titles developed for PC first (Felghana, 6 and Origin), so Falcom actually has quite a history developing for Windows. They just don't bother do in-house porting anymore since they need to focus on making new games every year.
The only contact I had with Nihon work was in Trails of Cold Steel 3 in..... Stadia. And let me tell you, very optimized stuff there, given the fact that stadia was a running on Linux with vulkan, the port did work 100%, which is actually surprising given that most ports that hit stadia were pretty bad, obviously.. Altough the YS X PC port seems to be developed by another team.
Haha yeah the word has lost most of its original meaning, from all the people complaining 😅 To me optimization means the amount of "visual output" / fps. Take Witcher 3 version 1.32 vs ver. 4. In patch 1.32 (about 2016 to 2022) there was no ray tracing or higher-than-ultra settings options. But the game looked damn good. And it ran well. Then along came ver. 4 and it upgraded the visuals. But the frame rate was cut to a third of previous versions. But did it look 3 times as great? To me it didn't. Which means I would say that ver 1.32 was more optimized than version 4.0. Even though 4.0 was nicer looking if you just compared screen shots. The pit fall though is that depending on the users system, the performance impact can vary from user to user. Like the visual upgrade that ray tracing brings also has a higher performance hit on current AMD GPUs than Nvidia's. And I think some of the ambiguity of the word optimized sneaks in here. Where users might not agree if the visual upgrade is worth the hit to performance.
Or just runs well on regular hardware especially when you know it should. This to me is a shitty game making problem. How do i know that you say? Look at some games that look amazing and use all the same graphics technology as game that look amazing but run like crap.
@@kxddan02 Some games don't take a lot of resources. look at hellblade 2 most beautiful game but its linear corridors, no AI for NPCs, its all precanned animations or physics everything is prebaked. Then some games will have a lot of physicss objects and NPCs and will have to turn down graphically for your PC to accomodate.
yes, durante made DSFix. he also made the generic downsampling tool to render games at insane resolutions before nvidia's DSR came out. impressive increase in this one 👍🏻
It's been very consistent generation to generation. 10-20 fps each generation on the same game. 30 series got 15 more frames on 'example' game over the 20 series. And 40 series got 15 more frames on the same game over 30 series. Of course because the total frame quantity is higher the _percentage_ looks smaller. But the uplift has been consistently the same.
at this point, it might make sense for reviewers to not make this kind of content until the first quarter of next year after the 50 series gpus have launched. those kinds of videos can take quite a while to make if the reviewer/content creator is actually running fresh benchmarks for everything and not just pulling data from other publications - something any of us could do.
@@Ghost-pb4tsTPU data scales pretty poorly across gens sometimes. Quite a few GPUs are also scaled linearly based on theoretical performance (basically GPU clock speed times core count), which tends to disfavor lower power parts.
"A lot of development time went into debugging problems that arose to this parallelization." That's really key. Not only can tasks be difficult to logically split up in the first place (see the uh... baby example) but after you do, you get completely new kinds of problems that non-parallelized code simply doesn't have, namely deadlocks and race conditions (and probably some others that I forget). What's even worse is that these are extremely hard to spot by looking at the code and compilers certainly won't catch them either. Then when you try to catch them during runtime using a debugger, using breakpoints or just the debugger itself can alter the timings so that the problems don't appear to happen (but re-appear when you don't use the debugger). Parallelization can be a questionable effort indeed.
I am not a game dev but I dipped my toes into high performance computing, software defined radios, and simulations. It is very very very hard to get 100% CPU, even with pure math tasks. Some problems cannot be parallelised, some can be partially parallelised and some of them can run 100% parallel. But sometimes all your threads and processes have to wait for something else to finish. For example, you cannot have multiple threads writing on the same file or network connection without synchronicity (a.k.a. They have to wait on each other). Sometimes you have branches in your program that are not symmetrical and sometimes you get stuff that depend on random numbers . I can imagine that a videogame is even more complicated just because you also have to manage user input. Edit: I wanted to add the massive headache of allocating memory, freeing unused memory, and copy stuff in memory, which you always try to avoid but sometimes it’s inevitable.
i spent a week optimizing rain for my doom mod once. i started with physically accurate, individual rain drops and were affected by forces around them, spawned in random positions and were constantly active. i ended up with "clumps" of rain that had a fixed speed (no acceleration), only spawned when you were near and actively looking at them and in a fixed, pseudo-random-looking pattern. the final product looked better (much denser rain possible) and ran at least a hundred times better (i mean, it cost a hundredth of the original implementation to run. the game still ran fine on my pc with the original implementation, i was just optimizing for weaker hardware and scalability)
This is a useful video. The word "optimisation" is getting thrown around way too much now. A lot of people misunderstand what optimisation actually means.
@@fireballs7346 The fact that the 4090 can get around 50 FPS at native 4K max settings with RT in the less demanding areas is impressive. RT is notoriously demanding, considering multiple rays are being shot per pixel, and the GPU has to do millions and millions of calculations when the light bounces from one surface to another. I would only consider this badly optimised due to terrible traversal stutters. Not the actual performance. Silent Hill 2 is a large open world AAA game that has a bunch of high-resolution multi layered textures and geometry, which further makes this more demanding. Patches should be coming soon to address any frame spike issues. The horrible stuttering issues would be the reason why I consider Silent Hill 2 badly optimised.
Wait, what.. Are you saying the internet is full of random people with little to no actual relevant experience and knowledge, speaking as if they are industry experts? Shocking development...
@@Gamer-q7v to kind of be devils advocate, compare silent hill 2's rt performance to cyberpunk, wouldnt you say that silent hill 2 is poorly optimized considering how much less fps it gets despite using all the same ray tracing features?
yeah this is best case scenario unfortunately a lot of high end devs don't bother with this kind of deep optimization, at least there are port studios that care about this stuff
In the 90ties I did a lot of coding and optimizing in Assembler on my (long gone) Amiga 1000 as a hobby. There was also some kind of parallelization going on with the custom chips, so I know what you are talking about also the OS had some quirks (documented and undocumented). I think you can get the most optimization by a good planning phase that means finding the fastest method of getting the desired result and already include parallelization (structure) in that phase. It will save you a lot of dev time. I disassmbled some graphics demos, learned methods and expanded from there. Code optimization can be fun and I ended up cleaning up highly frequented loops or getting rid of loops completely (like a number sort that turns out to be only dealing with 10 elements all the time, just write it out, don't call up a lengthy quicksort etc), converting floatpoint math to integer, creating lookup tables with precalculated values etc... I guess today coding is on a way different level than the stoneage stuff I wrote about.
You’re missing one crucial point here: yes, you can solve all of that with clever planning and knowing your hardware if you’re the sole developer who’s responsible for graphics code, gameplay code, and art all at the same time; with modern teams consisting of hundreds of people, it’s simply not possible to anticipate every possible thing a team member on the other side of the globe is going to try to do, and a huge amount of issues inevitably goes to the quality control side of things where you’re incredibly limited in what you can do to address issues.
@@sasha_chudesnov That is why I said that it is on a different level today however spreading the dev task around the world is a major fault IMHO and must lead to problems and I wonder if the initial cost saving really outweighs the additional debugging time.
Great video! Thanks for showcasing this article, it is enlightening. As a developer - but not a game developer - it's an interesting look at this process from another angle. And It does a fantastic job of disproving the idea that
What I hate with these types of comments is how many people blame the devs No dev wants a game to be unoptimzied, but publishers often view optimization as "a waste of time" (and from an economic standpoint it does tend to be) but devs would, if given the time, optimize till the game runs on a potato, but publishers just don't give that time
Ok how do modders fix and impelent so many quality-of-life stuff in less than a week Without access to the source code Doesnt that raise any red flags to you
I really enjoyed the demo for this game. It's running great. I'll buy it on Day 1. I also noticed the developers posting about game optimization which I regularly see on social media (for example Reddit). I hope some other technical focus UA-camrs take a look at this as well.
Durante is a god among men. You know the PC port is gonna be good whenever he's involved. He actually still seems passionate about his work even after all these years.
I wonder if Daniel would be willing to dive a little deeper on this channel technically. I would love to see the basics explained, on what happens on the screen vs what happens on the cpu / gpu level.
I try... but I'm not saying what people want to hear. So a 47 year old software engineer can go cram 34 years of programming experience up his ass because people want to be angry that the latest games at max settings stress the latest hardware available... like they have for the last 30 years.
@@jose131991 Nobody per se. Just a regular commenter who often elaborate or expand on what Daniel says. He's a math teacher who has a reasonable grasp on the fundamentals, and it shows. But it also shows that he doesn't understand operating system and coding theory much deeper than just conversational level. Not that he ever pretends anything else.
Optimal parallellisation is engine architecture problem. Its OOP inheritance vs agegration vs functional and data oriented . For optimal MT parallel compute you need to implement it at core of your engine. If then the game does not scale then the game problem is to light to split up. This is also genre and scope dependant. The first step is using DX12 or Vulkan to have multithreaded render subsystem where you feed gpu parallel. DX11 supports MT but not efficient. To me DX11 is red flag as render is single threaded feed it MT it doesn’t scale. If scale up to 3 threads it depend is it to smal scope game is inheritance oop. The game that scale best are those with have large amount of entities this is where you can go very wide. If your engine could handle different genres also RTS and if those scale well with MT then your engine is MT optimised. That means games where there ar little objects won’t scale with MT even if it higly optimised. But then are often light in load. Unless these are complex and graphics heayvy objects. I would avoid globals and singletons as what put wrench in parallel compute is not avoiding as much shared data. Also opt for function without side effect. Keep as much compute independat. It would be interesting what triple A dev do. But they use inhouse often shared game engine. Wich need to be highly data driven and support any genre . There is online ubisoft convention speak how the implement engine from large overview.
This is such a great video. I often faced the same problem where people would say "this game is no optimized", but what if the game simply has too much to process? I think this has become somewhat of a gamer term where bad performance means something is not optimized. This video is very insightful for the general public
The fact that The Finals can run how it does even with insane destruction and full ray tracing tells me UE5 isn't even a problem. It's clear that all these other UE5 games are poorly optimized rather than being engine limitations.
i tried The Finals first week of lunch, was pretty unplayable at that point, so i am giving it some time, remember how overwatch was shit for a year also. are you saying The Finals has improved a lot since then?
the finals only uses RTGI what is far less intensive than "fullRT" and also just plain looks allot worse than regular RT since its still just using probes like normal lighting but updating them every few seconds with RT instead of them just being prebaked
You should cover the article mentioned there about mouse polling. It shows that one of the "poorly optimized" case is actually "dev did not account for player using a high polling rate mouse which could drop FPS by 100". It is a performance problem but not necessarily because there is some particularly slow piece of code.
About the D3D11 “particular use of memory” case on AMD GPU, I think I know what happened there. Here’s a quote from Microsoft’s documentation: “When you pass D3D11_MAP_WRITE, D3D11_MAP_WRITE_DISCARD, or D3D11_MAP_WRITE_NO_OVERWRITE to the MapType parameter, you must ensure that your app does not read the subresource data to which the pData member of D3D11_MAPPED_SUBRESOURCE points because doing so can cause a significant performance penalty”
Engine can be completely fine, like UE, but game logic on top of it is complete clusterfuck built of antipatterns as mvp with "TODO: fixme" comment which never has been touched again because dev is busy making something else
I have been playing the demo for this game. I has been great. 144 locked on my PC and 45 locked on my OLED Steam Deck. Seeing that the port was handled so well just make me want to buy it more when it comes out on the 25th.
Thanks for the link and video. This is fascinating just from what goes into game development. And personally for me becuse I pay Falcom games so it's cool to get insight on how PC optimization went for this specific game.
Daniel, I'm sure you know this but when discussing how to track CPU bottlenecks you can use the "Max CPU/Thread Usage [%]" sensor in HWInfo64, which will show you how often any one of the cores hits high 90 - 100%. No doubt, when a specific core is at that level of busy, the next Frame will be held up.
The Falcom ports by Durante/PH3 have been nothing short of excellent, I even remember one of the Trails games I played somehow had a quick resume feature on PC. Now that I've been playing Metaphor with it's frame rate all over the place I really want Atlus to work with them too lol
Awesome , LOVED This insight , Really Appreciate it 👍 I hope we can have something like this on how GPU optimization works accross different gpu architectures
To me optimization is not just about improving the code and rendering pipelines, it's about the core game design. If a game is heavily CPU limited on a 7800X3D (the fastest gaming CPU), it means that the game was not designed with currently available hardware in mind. And this is the biggest problem with many newest titles, especially on Unreal Engine. And I would say the same for resolution. If a game has to upscale from 720p on consoles, the core design is flawed. No amount of optimization would move it up to acceptable image quality.
It certainly isn't intuitive and it's no surprise. This is an entertainment medium after all. Most people just want to relax and play games, not deal with this complex stuff.
for the portion on the "Profiler" this is really a tool that anyone can boot up on their system (Unreal Editor comes with a pretty good profiler in the install and it can be hooked into any application to profile it not just Unreal Engine executables) it shows you what function calls and instructions are being run within the sampling window. Without the "symbols" you might just get the op-code, but it will show like number of "calls to Draw", or "calls to comparison", and you can even get the raw data dump for that sampling window though that can be less helpful especially without the debug-symbols.
If you're interested in more optimization stories and especially more technical details, I highly recommend checking out Factorio's new posts. They have been posting once a week for a long time - while some of the posts are about new content, there's a lot of posts talking about how they optimize stuff, how to updating millions of entities every frame/couple of frames. And considering how big your base can become, it's mindblowing that it still runs well with that scale.
I am also hearing a design issue here. If the actors are sequentially coded from the start - nobody designed the sequence with parralellization in mind. For example: if a character walks through snow, it spawns footprints. Input processing, change char position, play animation, spawn footprint by animation result. You cant work on the footprints before finishing the animation thread. But you can work on multiple char animations of different characters - or calculate footprints for the 8 feet on the floor. If you wanna paralellize that, can pull a few tricks, that decouples parts or all of the animation thread, to spawn footprints. But that requires foreward thinking.
Any news about battlemage? im really looking forward to it Also This was an interesting take on why cpus arent fully utilized I thought it would be just "portion out the code to different cpus and merge them at the end" but turns out its more in depth then that
@@geoffreystraw5268 as long as the price is more affordable then a 4070, I'm willing to switch to team Blue, we need Intel to shake up gpu sales, they need to bring better price/performance cards so that the competitors lower prices to reasonable spots
It's an egregiously oversimplified word that roughly means it runs at MY preferred FPS at MY specific settings on MY system. Example: "Modern medicine is so unoptimized! Why can't I just take a panacea pill and be cured of all ailments?"
I understand a lot of people ask for impossibles but you gotta admit that some games have requirements that just don't make sense considering their visuals. Monsters hunter asking for a 6700xt or a 4060/ti to just run at 1080p, 60fps @medium using upscaling and framegen is ridiculous. That can't be classified as nothing but poor optimization.
@@enmanuel1950 that can be classified as nothing but corpo management heard about that new "upscale/FG magic" and decided to skip optimization steps of development altogether. "Why bother when we can save a lot of money by just using both upscale and FG as a default req?" Because fcuk gamers, corpo management only wants to gain maximum profit
pretty much, people are so obsessed with graphics presets these days that even if a new game at low settings looks better than the old game at ultra, people would still cry and scream that the game runs awful and its a joke how they have to run it at low settings to get the performance they want
In my current software architecture class our processor is specifically telling us to *not* parallelize our code 😅 Often times, unless you really know what you’re doing, parallel code runs *slower* than sequential code. It may seem counterintuitive, but just because a program is using all your cores doesn’t mean it’s faster than one just using one core. You need to do a ton of work splitting and coordinating your tasks properly so that multiple cores actually improve performance. It’s the same as cooking by yourself vs. coordinating a restaurant to make food.
Regarding the CPU bottleneck mentioned at 9:24, you can get an idea of what's going on in the game with Process Manager. You can look at how many threads the game spawned and how many cycles have been spent running those threads. Often times you'll see a majority of cycles the game runs is spent in a handful of threads despite spawning well over a few dozen. It's probably also a good indicator of how many cores/threads you can add before the game stops performing better on the CPU side.
If a game doesn't constantly stutter, hitch, lag, or compile shaders during gameplay with an RTX 4090 at 1440p resolution, then I would consider it optimized.
To be fair. Most games used to compile shaders at runtime and back when the games were smaller that wasn't a problem because materials had far less shaders required to compile.
@@TheMichaelparis Where they that bad? Im not interesting into playing them so im not familiar with how thay run outside some benchmark videos when i was choosing my new card(all for high end cards). But i remember when calisto came out(i was still running my 3060) and people where getting 20fps sections at 1440 with a 4080 and it literally killed any interest i had in the game because i couldn't stop thinking how much i would need to drop settings to get costant 60s.
Parallelizing CPU code is always difficult, but particularly so if you don't bake it in from the outset. Game design, game APIs and game programming all have a very long history of sequential operation, so there is a lot of stickiness in both mind-set and available tools and ideas. Some developers/studios have managed the transition better than others, but it also depends on what sorts of computation the game actually needs. If a particular requirement is sufficiently parallel, it probably ends up being calculated on the GPU, since in general everything can be run on either CPU or GPU. Ideally, any software running on arbitrary CPU/GPU combinations ( like, say, a random PC ) would profile the hardware capabilities and adjust the code architecture to best fit what is available; but this is shockingly hard to do.
Love the place he landed at. The technical blogs are actually really interesting and insightful, glad he's working on the ports because the quality shows. He's even got enough independence to push for pet features like local co-op on his own time. I was gonna wait on Ys X for a few months, still probably a good idea. But the game should actually be stable and perform out the gate with his meticulousness and track record. Sad contrast to me trusting Capcom with Dragon's Dogma 2 and assuming they should be good after playing monster hunter.
That link to that mouse polling rate article made me remember that early look / beta of a multiplayer game that is out and about now. In that early look, having a higher mouse polling rate resulted in the game stuttering like crazy when you were moving your camera around, in a 3rd person action shooter. I don't know how or why you would put a preview/demo/whatever out with this glaring issue present. My mouse was using a 500/s rate, which isn't even that crazy, but that already led to insane stutters so I reduced it to 125/s, which was better, but still not good. Did nobody ever try to play this with keyboard and mouse before releasing that? The experience successfully made me very wary of the game and ultimately led to me not picking the game up on release.
@@Winnetou17 I don't think either of us are qualified to say one way or the other. That's what I hate most about people on reddit whinging about unoptimized games, they don't even know what they're talking about. They're just complaining that [insert brand new game] doesn't run well on their years old mid range gaming PC, and "unoptimized" is the buzzword du jour to use. I don't think people complained nearly as much about Crysis in 2007 as people do now, it was just hard to run.
@@Eidolon2003 Much of the talk about Crysis was in reference to the "Can it Run Crysis" Epic/Ultra/Cinematic graphics setting for benchmarking, which no hardware on the market could run at a solid 30 FPS. Crysis needed a good CPU, and a lot of players were probably running the game on the "Low" graphics preset, but back then we still believed in Moore's Law getting us more powerful systems that could render Crysis' requirements a joke eventually. Now, in 2024, Moore's Law appears to be dead and is not expected to save us from poor optimization anymore. Video games are arbitrary in their content (until release), and so there is no reason for them to not run well. Therefore, video games that do not run well are defacto "unoptimized".
80% of the time when a modder “fixes” a game in 1 day after release, it means that someone on the dev team either had the same idea and couldn’t properly test and present it to the rest to the team in time before finishing the project and being reassigned to something else, or they did, but there were unintended consequences that forced them to revert the change (sometimes those don’t have anything to do with actual user experience, sometimes they do). 20% is just low hanging fruit that doesn’t fix anything outside of super rare outlier conditions like an ancient cpu/gpu combination no one on the dev team even thought about supporting because of how ancient it is.
While I am not a game dev I've played a lot around with code. One thing he skipped over and I think is worth bringing up is understanding how compilers work with different languages and how there are some differences between them when they translate the code from human readable to binary. Often the higher level code you use for example one that is very common in game dev is C# that relatively high level comparing to something that doom was written in anci-C. In unigine5 if you made a line cast with C# the compiler would compile instruction to run full vector calculation when in most cases if you were to write the code in anci-C like in doom or quake you could use so called "fast inverse square root" or more commonly known as quake algorithm.
They never did any of the steps discussed. That would be my explanation. Doesn't help that we now have folks making posts and articles recommending locking pc games to 30 fps to "get rid of traversal stutter".
Efficient parallel computing has and will always be a challenge. That's why CPUs with high single thread performance are usually better for gaming. There are only a few domain where parallelism gains is rather easy to achieve: when operation are always the same , with no interaction with 'neighbouring data’ and dataset is huge like in video encoding, rendering, and AI. It is exactly where GPUs and Tensor cores work... For the rest of the operations, there is a lot of interactions between many small datasets that make the game (like checking visibility, collisions, path planning, animation sequences, general game state update). This is all very complicated to optimize and parallelise.
I am not a game dev but I dipped my toes into high performance computing and simulations. It is very very very hard to get 100% CPU, even with pure math tasks. Some problems cannot be parallelised, some can be partially parallelised and some of them can run 100% parallel. But sometimes all your threads and processes have to wait for something else to finish. For example, you cannot have multiple threads writing on the same file or network connection without synchronicity (a.k.a. They have to wait on each other). Sometimes you have branches in your program that are not symmetrical and sometimes you get stuff that depend on random numbers . I can imagine that a videogame is even more complicated just because you also have to manage user input.
>always full of people flinging around the word "optimization" Imo, optimization is one of few words people on the internet actually use correctly most of the time. The article shown in this video even demonstrates that it is way more costly to make a decent port (weeks/months spent) instead of just bare minimum (merely days spent) which people are still going to buy. Because of that game companies (who are are just interested in maximizing profit) obviously don't invest into optimization. For the same reason they often include DRMs in their games which tank performance even further Incompetent devs using same old stock Unreal Engine which has tons of options, not all of which are suitable for real-time rendering do not help either
Basically what I got from this "It takes a lot of time to optimise games, so publishers shouldn't rush us to release the game by a certain date" Sounds about right, I have been saying for years that is the problem.
I think devs also stuck to using DX11 for a long time because they were happy to leave most of cpu to graphics related optimisations to the driver, which would do things like parallelise draw calls. Where as DX12 and Vulkan require the dev to do that work themselves, which can mean even better performance, but a lot of the time we are seeing the opposite, and was very common thing to see when their was an option for DX11 option it would just end up faster..
Thats what i do, frame cap to get a smooth fps line. So often I'm running near the 1% lows in order to never have a noticeable drop in frames no matter what or how much is going on. What's the point of chasing potential max fps, if the graph is a jittery mess.
It's wild how game optimization has evolved. Back in the C64 days, devs were spending like 90% of their time just optimizing because the hardware was so limited (C64 had only 64KB of RAM!). Every byte counted, so optimization wasn’t just part of the job-it was the job. By the time we got to the early 3D era, hardware got better, and the need for optimization dropped to around 70% (think early PlayStation and N64 games). But even then, they had to be really careful with 3D environments and frame rates. Nowadays, optimization effort is down to 30%, thanks to tools like Unreal Engine and powerful GPUs (you’ve got engines doing a lot of the work for you). That shift has let devs focus on game complexity instead.
The real story is that a ton of people on youtube claim "bad optimization" on everything that doesn't hit their internal idea's for how a game should preform. These people have no idea how it actually work but act like they understand completely. This is the real problem.
Thanks for the Explanation has gave me plenty of thought on the next PC Build been on 1080p 144hz for a long time but also many games still 60hz which is fine has gotten me interested in stopping at 180hz for shooters and anything else below 120hz, I use Vsync + Freesync at the same time for lower latency and also UA-cam only goes up to 60fps for videos so it seems the Industry for a lot of things haven't moved past 30/60 for a long time now well some games have reached 120 on consoles but their always gonna be the big limit factor regardless of W/E pc parts ya get. so might consider getting a 7700X over a 9800X3D and still wanna get the Highest RDNA4 Card. edit: why a 7700X as a consideration, because it would be cheaper and since their both AM5 could always upgrade to the last AM5 8core X3D Chip to come out so there's that. also recently got a 1440p 180hz IPS Monitor so looking forward to running a 2 PC setup with it early next year.
Also Nihon Falcom don't make 'AAAA' games or even 'AAA' grade, they stick to what they do best, show not tell. They make anime games so they don't need serious hardware to run.
gee something done on every software project ever in the 90's when we had one core 33 MHz and we needed to handle 2MBit traffic. Tools are the biggest problem call trace data and timing we used to pay $150k per seat for tools.
When you bounce 10 instruments in 2 channels that is optimizing.When playing all the tracks together it uses way more cpu and memory latency.But after it is mixxed together it uses less.Very much the same with graphics in many ways.
In conclusion, they put in a lot of work trying to optimize the game by utilizing more threads and they eventually saw massive improvements. So what you're saying is - if these huge companies with thousands of employees put more resources into working these issues out and getting more threads involved, the game would be more optimized. Which is essentially admitting that the comments were justified
this was a very interesting read and watch, I wonder what the process looks like In triple-A game studios, with the pressure from higher ups and etc...
I am the owner of rock life, when I first published it, I got like 100-120 fps on a 3080Ti. I knew this was trash because you stare at a rock and the grass sways around. I spent over a month optimizing with gpt and help from teachers and got closer to 250fps. I’ve learned more, redid some stuff and it’s closer to 300 but I’ve disabled the unlimited option and put the game to 120 for everyone because I got people complaining that their gpu was at 100% in the reviews.
11:30, Daniel is dead on, if a game dev has a problem that isn't complex/hard to multi thread they will. See all of computer graphics..... 16:44 Thanks for saying this Daniel, multi threading stuff is hard, and doing it afterwards is even harder. But the way production of games works this will often be the case. Make it work so we know if we even made the right game, then optimize it! 17:40 FPS is actually a bad way to measure game performance, 1 FPS worth of movement between 1fps and 2fps is a very different story than 1 fps between even 16 and 17 fps. Time per frame is much better. But players understand FPS, so that's what ends up on the graphs. The tools (profilers) that game programmers use to optimize games output time instead of FPS. If you want 30 fps you have 33.3ms to render the entire frame, 60fps 16.6ms, 120fps 8.3ms. Code takes time to run, so I want to know that. Also fun comment, so everyone talks about cpu vs gpu bottlenecks... Factorio players found out better RAM sticks(same capacity better timings, aka latency) could improve game simulation times. Most games this won't happen but it does for some. Also Optimization, easiest way to explain it... How efficiently are the cpu and gpu being used?
The last mile is the hardest! Optimising is the last mile (or quarter mile even) and doesnt always offer the best bang for the buck, in that, those devs could be developing other features or products with a higher ROI (return on investment). You have to draw the line somewhere and larger companies will naturally draw the line lower than dedicated PC enthusiasts.
Never thought I'd see or hear you talking about an Ys game! 😁 It's my favorite game series in the world and I'm waiting my physical copy of Ys X for my PS5 to arrive next week! (Hopefully next week because it's released on Friday and I never get anything that gets released on Friday the same week 😑) I buy most of my games digitally, but I collect physical copies of my favorites ones. 🙂 I checked the PC requirements for it and they really surprised me. According to steam I could easily play it 1440p 60fps on my current rig that's 4-5 years old 😅 But I'll play it on PS5 because I pre-ordered the game months ago. I may try it on PC later to support the guys!
That last step or two can take weeks or months and costs tens of thousands of dollars in some situations depending on how much staff you keep on so I don't fault devs most of the time. Its 100% a value proposition. That said, those staff should be kept on anyways whether or not a project finishes but thats a whole nother topic.
Hey, thanks for taking the time to discuss and explain my post!
Generally, what you said was all accurate, and it's great to see these things explained in detail for a more casual audience.
Any particularly favorite war stories from getting this working (or are all of the fun parts still under NDA)?
True, really great content!
This is sick thanks for the explanation... were you in game dev before DSfix, or was that after?
Hey, just curious, why do you only work on Falcom games?
there also the case when main thread access to cpu0 which is most bound for windows os, while the ps force the os to use only cpu7. some dev ported to pc but didnt disable cpu0 affinity and cause main thread to be slow cos it need to wait for os doing shit thing first.
Me buying a 144 hz monitor 10 yrs ago: wow imagine how smooth and clear games will look in the future with even higher refresh rate and faster gpus.
10 yrs later publishers still targeting 60 fps but now it's upscaled and interpolated fizzling blurry mess.
I've been a PC gamer and built my own PCs for 30 years. It has always been like this. As soon as hardware improves game developers want something that looks better and has more complicated game logic. What I don't get is why people seem so puzzled about it. You don't have to run at Ultra with RT Nightmare enabled. If you happen to prefer extreme fluidity over extreme visuals... just dial down the settings.
My XTX is 5.33x faster than my RX580 was. So I can chose to look at something nicer or I can choose to look at the same level of graphics 5.33x faster. Or I can chose something in between.
@@andersjjensenthe choice is the great thing about PC gaming but unfortunately human nature will rear its ugly head. Many people will psychologically feel like they aren’t getting there moneys worth if every setting isn’t slapped to “ultra” which includes RT.
@@BoYangTang You've been fooled by someone with no idea if you thought we'd get 120+ FPS optimization targets.
60 works great, if we aimed higher games would just end up looking dated.
@@andersjjensen I hear where you're coming from but I would say that it's not trivial to dial down the settings for higher framerates. e.g. going from high to medium for +10% fps, the best option after that is probably to lower render resolution. Also if a game is bottlenecked by a couple cpu threads all you can do is interpolation if its implemented.
Imo it's a difference in priorities I just hoped all those years that as higher refresh displays became mainstream games would be designed with better fluidity in mind because I felt the benefits were worthwhile.
some are even mindblowingly still targeting 30fps like psychopaths.
I think the biggest takeaway you can get from this video is “luckily the main developer had an AMD GPU”; explains a lot about the state of game development.
It makes you wonder how much more optimized games might be on average if they were developed on midrange hardware that matched their games' recommended specs. It's probably harder to see how unoptimized your game is and care if you're doing all of your work on a 4090. Not that I blame developers--I'd want the most beastly PC to develop games on, as well, because it'd be easier and faster--but there's something to be said for the creativity and ingenuity that comes from being forced to adapt and optimize as you go (ex. low budget movies that put every dollar to use versus bloated AAA productions that waste money because they can).
Certainly one of the takeaways of all time
@@Osprey850game development engines these type of development need a beast machine to even start to develop so i disagree entirely .
I tried to develop on my rx6600 for a collage project and it was pain in the ass
@@omargallo9636well yeah it’s a 6600xt lol 😂 you need the pro series RX to do all that , also they don’t use 4090s they use quadro if I remember correctly unless the name is different
@@omargallo9636 That's why I said that I would want a beast machine, too, if I were a game developer. I know that developing games on midrange computers isn't ideal, but developing on the highest end GPUs and CPUs probably isn't ideal for optimization, either.
I am a bit sad we didn't get the chopped baby analogy this time .... :(
💀💀
No babies were chopped in the making of this video.
Ran out of babies@@Osprey850
They found the coat hanger.
whats the reference im not caught up
The guy who wrote this post is a legend, thanks to him we had playable dark souls on pc for many years. Very nice insight in the post too.
Here's an example of a parallelism problem causing CPU bottleneck without CPU being 100% utilized:
- Let's say a particular work can be parallelized into tasks A, B and C.
- This mean A, B and C run on separate threads.
- The work completes only when all tasks A, B, C completes.
- Let's say A takes 1ms to complete, B: 2ms, C: 10ms.
- Now the problem: the work has to wait until the slowest task (C) completes *while the two other threads IDLES*.
- This is just an example but in reality there are many more tasks involved and works depend on one another.
- This creates a complicated web of things waiting for other things.
- Not to mention on PC, sometimes we find that a particular task takes way more time to complete only on a specific hardware configuration. This wreaks havoc to the finely-tuned task scheduling.
the easy way to read it is: if a single core of the CPU is stuck to 99-100% for more then one tenth a second that thing is CPU limited, and the same for the GPU. it could be the specific Tick/Update, or the specific combination to resolve application state. The leading contributor to the misunderstanding is that a lot of the frame-rate overlays only show "total CPU, and "total GPU". the GPU is more understandable, but even a 2-core/4-threadthe CPU should show 4 or 5 percentages not just 1.
This highlights the advantage of developing for consoles only.
Vanillaware).
Developers 20 years ago have done it with the nightmareish PS3 CELL CPU, PS2 Emotion Engine and its several cores and SEGA Saturn and its literally 8 separate processors, and achieved things unimaginable for the hardware. Y'all are just lazy.
No, you can't make this about "uhm but processors are different and stuff on PCs". Yeah, the brand changes, but unless it is a processor from 15 years ago, it will always support the same features, it will run on the same instruction set and it will work in the same way as another processor, just faster or slower. An FX 9590 works the same as an i7 4790K, but the i7 is simply FASTER. Nothing else changes. If you develop something with FX 9590 in mind, but then try to make it work on a 4790K, it will work right off the bat, just FASTER. And if you try to make it run on a slower processor with the same features? It will run slower. Incredible! Now, when you go way back in time and actually start lacking features (For example the decrepit MMX), THEN you run into problems.
That said, stop being lazy. I know you are underpaid by your bosses and they give you impossible deadlines, but you have the most power. If you are sick of how unjustly you developers are treated, QUIT. Make sure that your loss is FELT.
@@TheRealNightShot it isn't just "being lazy" if I need to go and calculate a number that requires these 5 other non-cached values, and I need that number NOW, then I can't async out those calculations. highly threaded workloads need to be designed specifically for that, and it turns out that many games are less optimizable, even the Dev in the story talked about said 'only some specific aspects could be threaded' Cinabench is highly threaded, but this is one frame being rendered on hardware that was never optimized for rendering. Games tend to have a "world thread" which cannot be asynced, and it turns out when these are heavily asynced you get race-conditions, and/or just incorrect results.
@ Again, it’s not unheard of, it has been done. If an approach doesn’t work, you make a new one. It’s been 40 years and developers still linger on the same fundamental approach to game code, which was meant for single threaded machines. Maybe it’s about the time to move on. To have such advanced and expensive hardware and no ability to use it. Games perform pitifully, and therefore It’s no longer possible to dwell on “this can’t become multi threaded”. Something must be done. Yes, it requires money, it requires time, but that’s what game development should be about. Have you seen how long GTA VI is taking? Have you got any idea how much money they’re spending on it? How much effort? And yet it will all be worth it in the end. I am read to bet that a GTX 1050 Ti and i5 6400 will be able to run that game.
Durante was indeed the guy who made DS Fix that turned the og DS1 PC port into something playable.
He eventually work consulting on PC ports by Xseed for an old JRPG focused company called Nihon Falcom and then founded his own company with friends, which is now primarily porting JRPGs to PC for NIS America (mainly still games by Falcom, which develops the Ys and Trails series', both of which Durante id also a fan of).
I know a few devs and it's a little hard to point the blame at them for poorly optimized games. They don't want their game to run like crap with stuttering, crashes and overblown hardware requirements. They have budgets, timeframes, resource limitations etc that all come down from the 'higher ups'. They are essentially forced to release their games before they are totally happy with it due to publishers, investors etc having the final say. There is a bunch of tick boxes to get through before a game is released, and as long as the right amount of boxes are ticked, that's all that matters. The other half of it, is inexperienced devs that don't understand the engine and tools properly. They still don't want to release a bad game and still have the push from above, but lack of experience, and publishers and investors once again only caring if the minimum required number of boxes are checked can only lead to a terrible experience for the gamer. The worst part is that these inexperienced devs think this is normal to release games in these unoptimized states, which means they will never get better at their craft.
I disagree with the last part .
Programmers always learn from past and always improves
@@omargallo9636good programmers do. I've worked with people that refused to learn and grow 😢
As a software engineer, I agree. This happens all the time, and I genuinely want to develop something flawless. However the high management pushes to deliver the way it is, and that's usually the commercial and marketing department's fault. They sell dreams to customers, take their big bonuses, and all the hard work falls on our shoulders.
Shit construct.
The publisher/higher up don't want to lose money with long development time.
The dev just does what higher up want, but also doesn't want to improve the game more than needed.
Both playing checkboxes and blame shifting if there's something bad going to happen.
Both neglecting the consumer. The dev have power to negotiate at risk of losing jobs, while publisher have power to make whatever they want at risk market not buy it.
This blaming game already gone to its due, not only creating toxic environment where the cancer CANT be stopped but also people moving away from Covid-impact(in my place there's a ton of people selling PC SETUP second hand, probably 1/3 even 1/4 of its price, most of them come from 2020-2022). That's why many people praise Baldur Gate 3 devs, heck even PlayStation already try to hold Palworld hands to PS5 despite Nintendo monopoly. If dev really don't have enough courage to stand for the consumer, it's already fall into the PR game. More often than not, dev best interest is always on consumer. But what can we do, no one want to risk losing jobs right ? Don't worry, market will fix it itself, even with AI and Indie Game rule the tools.
Well, I do feel it's related to the construct of some 'debt' country. But the market already shrinkin way before anyone can anticipate.
It certainly doesn't help when western game studios lay off thousands of talented developers, many of whom will choose to leave the games industry entirely and take their knowledge with them.
Brain drain has a huge effect on future projects.
The actual decision makers at AAA’s who don’t actually play games,
“if console targets 30/60 fps why spend budget on optimizing for higher fps on pc, they are getting the same experience”
Bro that would only be true if the consoles could even hit those frame rates consistently let alone without upscaling
@@yourlocalhuman3526upscaling technologies is the main reason why optimization sucks ass these days,
@@PrimeYT10 nope, that's still not the main reason. The main reason has always been investors, managers, executives who all keep imposing deadlines on unfinished products. Funny how we keep forgetting that because we never learned the true reason about horse armor DLC.
@@BleedForTheWorld yes, and upscaling being the excuse to skip testing and optimizing the games “if it can run on 30fps on console, deliver it to pc and give them dlss and frame gen”,
@@PrimeYT10False it sucked before this.
I am not a game dev anymore, but I was looking quite deeply into game development during my university days and also worked as a game dev for a little while...
The challenging part is, exactly as stated and as you understood correctly, that most of the stuff is going on on the main game thread and this one needs to be synchronous since it needs to be deterministic what happens after a certain interaction or input and is usually bound to some sort of update loop that is called once per frame. Engines usually already use separate threads for graphics and audio. From here on as the actual developer of the game, there are certain aspects that can be threaded. Such things are doing load tasks in the background or calcucating AI paths for example. But to stick with the pathing example, the only thing that can easily be threaded is the path calculation. Performing the movement along the calculated path is usually done on the main thread. Threading is challenging due to the nature of asynchronous programming. As soon as we use another thread to execute some logic, we cannot know when exactly that calculation is finished. The threaded calculation can report back to the main thread once it is done, but the main thread only knows that it receives the calculation results at some point in the future.
This is where something like an ECS comes in. And if your logic is too complex for an ECS, just perform it every few frames and let the ECS simulate the momentum of the game objects.
Hey, I work on optimization and would like to add some context to parallelization in video games. When most mainstream game engines were made, CPU core counts were very low - so back then it was totally okay or even optimal to have your game have a dedicated "game" thread for simulating all gameplay code and a "rendering" thread for rendering objects then a few extra threads for background work like loading assets. Nowadays most game engines still have a similar architecture with a dedicated game thread and render thread with maybe a few extra dedicated threads for audio etc. The problem is this doesn't scale well with core counts and your CPU bottleneck will always be your slowest task which, in this model, is inherently single-threaded and is usually either gameplay or rendering code so the majority of the cores on your CPU will be idle most of the time. To fix this a lot of engines have implemented the "fork/join" paradigm which is probably the simplest form of task-based parallelism. The way it works is you take an expensive task like animation and instead of running it on the game thread you spawn a "task" that will run on any of the available worker threads which runs in parallel to the game thread and await the result. This is good and increases parallelism but still won't result in total CPU core saturation and one of the dedicated threads will still likely be a bottleneck. This is where job-systems come into play. With a job system everything is a "task" and (for the most part) there is no concept of a "game thread" - everything runs in parallel against each other optimally on any available thread. Now that sounds good but getting this right is very difficult and ends up creating a lot of problems that tend to require new programming paradigms. If I have tasks A and B, how do I know if it's okay to run those tasks in parallel against each other? What if task A destroys an entity that task B is reading from? Resolving these task dependencies manually is the source of a lot of esoteric bugs and non-determinism in video games so that is why the general sentiment is only multithread if absolutely necessary. There are also very strange performance pitfalls with multi-threading that a lot of people are unaware of and can cause performance to get significantly worse! If you have a region of memory that represents a player character and one thread modifies the characters health - all other thread's CPU caches around that memory region will become invalidated, meaning if another thread goes and reads a nearby memory address to access the character's position for example, they will incur a big performance penalty and have to fetch that memory from L3 cache or RAM (this is an over-simplification since hardware differs). This is called false-sharing. So parallelism becomes very limited in this context, only one thread should be accessing an entity with this model. This is why new programming paradigms around multithreading are emerging such as the Entity Component System (Unity DOTS, Unreal Mass Entity, Bevy, etc). Basically in ECS you explicitly tell the job scheduler for your specific job what resources you are modifying / reading and can automatically run tasks in parallel for full CPU saturation (only if the jobs don't have too much contention over resources or scheduling conflicts). ECS usually also avoids false-sharing because of how it lays out entities in memory. So the question is, is it really a game engine's "lack of optimization" that causes poor CPU utilization? In my opinion, yes (depending on the engine *cough cough Unreal*). However, developers should still do more to try and parallelize their games, but it often becomes extremely difficult or the performance benefits are just not that significant due to the engine's architecture. Trying to write multi-threaded code on top of single-threaded code is no simple task.
As a developer, it is widely known that 90% of coding takes 10% of time, figuratively. Worse than that for large projects.
90% of a project takes 90% of the time. The remaining 10% takes another 90% of the time.
@@ZeroUm_ So you're saying the whole project takes 180% of the time. Every project just whizzes past it's deadlines then? Nah. I'spose you typoed that first 10%.
@@BaalaaxaIt's a joke about how project planning underestimates time taken to develop and test. Project managers have an incentive to underestimate resources required to please the bean counters.
@@BaalaaxaFixing the bugs takes 3rd 90% of the time
yeah I saw this making flash games back in the day, like half the battle is just learning how to make the thing the correct way and spinning wheels until then but once you're running on all cylinders it's crazy the amount of work you can get done and how much higher quality it is. I think general limitations are good for this, it forces people to slow down and learn/do things correctly instead of spending more time spinning wheels or making "cool" things that inevitably need to be remade or scrapped. I think a lot of people fall into making gimmicky stuff because of this rather than learning the codebase properly, which I totally get but it's always the boring ass lame projects I learned the most from and the most burnt out I've been has been making something work that wasn't done properly or that I didn't fully understand.
I fucking love ys and legend of heroes series. Durante ports are so good.
Same! 😁
back in 2013 I requested a feature for HwInfo64 which was implemented and rolled out in v4.14 called "max cpu/thread usage" -- this reports the % usage for the highest used thread in the CPU. this is a great indicator of CPU bottleneck: typically if I see that the usage was above 90% in a 1 second poll period, the GPU usage was also less than 100%. where it helps is that you can see how close to that point you are (looking at GPU usage by itself merely tells if you're bottlenecked or not). may be worth adding that to your rivatuner OSD config.
In 7.63 GPU busy was added and it is a far better indicator of which component (cpu/gpu) is waiting for the other.
I have only one finished simple VR game under my belt (on SideQuest).
For me, it was the number of draw calls which the CPU was passing to the GPU. So, CPU bottleneck.
I had to integrate GPU instancing to eliminate that bottleneck.
Was very interesting experience
Interesting indeed
@@jose131991 GPU instancing is really, really cool! You pass one game object from the CPU to the GPU and tell the GPU to render it 1000 (or more) times in any number of places that you want, with any size.
I did it on a Oculus Go hardware, so had very little to work with, yet I was able to procedurally create whole constellations (using Poisson disc sampling)
Ys 10 is made by Nihon Falcom, they are old school devs that REALLY are niche and knows what they are doing for gaming communities.
This Japanese company is OLDER than Square and Enix, in fact ex-Nihon Falcom employees from the OG generations left that company to form their own Squaresoft and Enix. Modern Nihon Falcom is maintained and organized by it's fans who are now CEO and staff members.
yes, the porting work for pc however is done by PH3 Games, which is a studio that the western publisher nisa hired for their porting works of falcom games. Its an interesting situation, because for other Asia regions Falcom partners with a different publisher, Clouded Leopard Entertainment, who create their own port work. In regards to YS X, the original PC release by Clouded Leopard Entertainment (Asian Language only) was so bad, that they had to take it off Steam and rework it for several months. they have now actually re-released the game and its in a much better condition. Doesn't really matter to much for us western fans, as again NISA and CLE cater to different markets. At this point in time, no PC ports have been done by Falcom themselves in a long time, and tbh i doubt they would go as far as Durante would with the extra QoL and game optimization stuff.
i doubt falcom has the necessary in-house expertise to competently make pc ports. most of the heavy lifting was subcontracted out to third parties like durante's ph3 games
@@aliasonarcotics The consoles are their most optimized versions, often time a bit sluggish cuz of weaker console ports (switch).
@@aliasonarcotics There was at least a few Ys titles developed for PC first (Felghana, 6 and Origin), so Falcom actually has quite a history developing for Windows. They just don't bother do in-house porting anymore since they need to focus on making new games every year.
The only contact I had with Nihon work was in Trails of Cold Steel 3 in..... Stadia.
And let me tell you, very optimized stuff there, given the fact that stadia was a running on Linux with vulkan, the port did work 100%, which is actually surprising given that most ports that hit stadia were pretty bad, obviously..
Altough the YS X PC port seems to be developed by another team.
Optimizing means naming whichever settings gets 120 fps on my machine as Maximum settings so my ego doesn't get hurt.
Haha yeah the word has lost most of its original meaning, from all the people complaining 😅
To me optimization means the amount of "visual output" / fps.
Take Witcher 3 version 1.32 vs ver. 4. In patch 1.32 (about 2016 to 2022) there was no ray tracing or higher-than-ultra settings options. But the game looked damn good. And it ran well.
Then along came ver. 4 and it upgraded the visuals. But the frame rate was cut to a third of previous versions. But did it look 3 times as great? To me it didn't. Which means I would say that ver 1.32 was more optimized than version 4.0. Even though 4.0 was nicer looking if you just compared screen shots.
The pit fall though is that depending on the users system, the performance impact can vary from user to user. Like the visual upgrade that ray tracing brings also has a higher performance hit on current AMD GPUs than Nvidia's. And I think some of the ambiguity of the word optimized sneaks in here. Where users might not agree if the visual upgrade is worth the hit to performance.
it is exactly that. always. people think their pc have infinite power and they used to think that 4, 8, 12, 20 years ago.
Or just runs well on regular hardware especially when you know it should. This to me is a shitty game making problem. How do i know that you say? Look at some games that look amazing and use all the same graphics technology as game that look amazing but run like crap.
@@kxddan02 Some games don't take a lot of resources. look at hellblade 2 most beautiful game but its linear corridors, no AI for NPCs, its all precanned animations or physics everything is prebaked. Then some games will have a lot of physicss objects and NPCs and will have to turn down graphically for your PC to accomodate.
Based
yes, durante made DSFix.
he also made the generic downsampling tool to render games at insane resolutions before nvidia's DSR came out.
impressive increase in this one 👍🏻
Once again here to request a look at the history of generational GPU price:performance improvement over the years 🙏
tech power up
there you go
bigger number better
It's been very consistent generation to generation. 10-20 fps each generation on the same game.
30 series got 15 more frames on 'example' game over the 20 series.
And 40 series got 15 more frames on the same game over 30 series.
Of course because the total frame quantity is higher the _percentage_ looks smaller. But the uplift has been consistently the same.
at this point, it might make sense for reviewers to not make this kind of content until the first quarter of next year after the 50 series gpus have launched. those kinds of videos can take quite a while to make if the reviewer/content creator is actually running fresh benchmarks for everything and not just pulling data from other publications - something any of us could do.
@@Ghost-pb4tsTPU data scales pretty poorly across gens sometimes. Quite a few GPUs are also scaled linearly based on theoretical performance (basically GPU clock speed times core count), which tends to disfavor lower power parts.
Hardware unboxed did that already not that long aog.
"A lot of development time went into debugging problems that arose to this parallelization."
That's really key. Not only can tasks be difficult to logically split up in the first place (see the uh... baby example) but after you do, you get completely new kinds of problems that non-parallelized code simply doesn't have, namely deadlocks and race conditions (and probably some others that I forget). What's even worse is that these are extremely hard to spot by looking at the code and compilers certainly won't catch them either. Then when you try to catch them during runtime using a debugger, using breakpoints or just the debugger itself can alter the timings so that the problems don't appear to happen (but re-appear when you don't use the debugger). Parallelization can be a questionable effort indeed.
I am not a game dev but I dipped my toes into high performance computing, software defined radios, and simulations. It is very very very hard to get 100% CPU, even with pure math tasks. Some problems cannot be parallelised, some can be partially parallelised and some of them can run 100% parallel. But sometimes all your threads and processes have to wait for something else to finish. For example, you cannot have multiple threads writing on the same file or network connection without synchronicity (a.k.a. They have to wait on each other). Sometimes you have branches in your program that are not symmetrical and sometimes you get stuff that depend on random numbers . I can imagine that a videogame is even more complicated just because you also have to manage user input.
Edit: I wanted to add the massive headache of allocating memory, freeing unused memory, and copy stuff in memory, which you always try to avoid but sometimes it’s inevitable.
i spent a week optimizing rain for my doom mod once. i started with physically accurate, individual rain drops and were affected by forces around them, spawned in random positions and were constantly active. i ended up with "clumps" of rain that had a fixed speed (no acceleration), only spawned when you were near and actively looking at them and in a fixed, pseudo-random-looking pattern. the final product looked better (much denser rain possible) and ran at least a hundred times better (i mean, it cost a hundredth of the original implementation to run. the game still ran fine on my pc with the original implementation, i was just optimizing for weaker hardware and scalability)
There should not have been any acceleration in the first place because rain should be coming down at terminal velocity anyway
This is a useful video. The word "optimisation" is getting thrown around way too much now. A lot of people misunderstand what optimisation actually means.
silent hill 2 remake have same problem stutterfest and devs are sleeping
@@fireballs7346 The fact that the 4090 can get around 50 FPS at native 4K max settings with RT in the less demanding areas is impressive. RT is notoriously demanding, considering multiple rays are being shot per pixel, and the GPU has to do millions and millions of calculations when the light bounces from one surface to another. I would only consider this badly optimised due to terrible traversal stutters. Not the actual performance. Silent Hill 2 is a large open world AAA game that has a bunch of high-resolution multi layered textures and geometry, which further makes this more demanding. Patches should be coming soon to address any frame spike issues. The horrible stuttering issues would be the reason why I consider Silent Hill 2 badly optimised.
Wait, what.. Are you saying the internet is full of random people with little to no actual relevant experience and knowledge, speaking as if they are industry experts? Shocking development...
@@Gamer-q7v to kind of be devils advocate, compare silent hill 2's rt performance to cyberpunk, wouldnt you say that silent hill 2 is poorly optimized considering how much less fps it gets despite using all the same ray tracing features?
Optimization is code for "I'm poor and I refuse to spend money on GPUs that will be 4 years old soon ie RX 6000 cards/ RTX 3000 series cards."
yeah this is best case scenario unfortunately a lot of high end devs don't bother with this kind of deep optimization, at least there are port studios that care about this stuff
It’s why Nixxes ports are so good tbh but the other half of that is that most AAA devs use unreal engine which is not great for granular recoding
More like this kinda thing takes time and investors don't allow you time. If the higher ups say the game needs to be released, you have to release it
In the 90ties I did a lot of coding and optimizing in Assembler on my (long gone) Amiga 1000 as a hobby. There was also some kind of parallelization going on with the custom chips, so I know what you are talking about also the OS had some quirks (documented and undocumented). I think you can get the most optimization by a good planning phase that means finding the fastest method of getting the desired result and already include parallelization (structure) in that phase. It will save you a lot of dev time. I disassmbled some graphics demos, learned methods and expanded from there. Code optimization can be fun and I ended up cleaning up highly frequented loops or getting rid of loops completely (like a number sort that turns out to be only dealing with 10 elements all the time, just write it out, don't call up a lengthy quicksort etc), converting floatpoint math to integer, creating lookup tables with precalculated values etc... I guess today coding is on a way different level than the stoneage stuff I wrote about.
Very interesting. Console optimization will always be far simpler than PC optimization
You’re missing one crucial point here: yes, you can solve all of that with clever planning and knowing your hardware if you’re the sole developer who’s responsible for graphics code, gameplay code, and art all at the same time; with modern teams consisting of hundreds of people, it’s simply not possible to anticipate every possible thing a team member on the other side of the globe is going to try to do, and a huge amount of issues inevitably goes to the quality control side of things where you’re incredibly limited in what you can do to address issues.
@@sasha_chudesnov That is why I said that it is on a different level today however spreading the dev task around the world is a major fault IMHO and must lead to problems and I wonder if the initial cost saving really outweighs the additional debugging time.
my life in embedded systems. great video.
Great video! Thanks for showcasing this article, it is enlightening. As a developer - but not a game developer - it's an interesting look at this process from another angle. And It does a fantastic job of disproving the idea that
What I hate with these types of comments is how many people blame the devs
No dev wants a game to be unoptimzied, but publishers often view optimization as "a waste of time" (and from an economic standpoint it does tend to be) but devs would, if given the time, optimize till the game runs on a potato, but publishers just don't give that time
Also, bad devs and not knowing the tools well also affect performance.
That's so true
@@AntiTako FinalFantasy 14 version 1.0 is a good example
Ok
how do modders fix and impelent so many quality-of-life stuff in less than a week
Without access to the source code
Doesnt that raise any red flags to you
The developers shouldn't have sold out to the publishers in the first place then if they wanted to keep control of the dev process.
I really enjoyed the demo for this game. It's running great. I'll buy it on Day 1. I also noticed the developers posting about game optimization which I regularly see on social media (for example Reddit). I hope some other technical focus UA-camrs take a look at this as well.
Durante is a god among men. You know the PC port is gonna be good whenever he's involved. He actually still seems passionate about his work even after all these years.
I wonder if Daniel would be willing to dive a little deeper on this channel technically. I would love to see the basics explained, on what happens on the screen vs what happens on the cpu / gpu level.
It's not an easy subject to simplify. Look up Acerola.
I try... but I'm not saying what people want to hear. So a 47 year old software engineer can go cram 34 years of programming experience up his ass because people want to be angry that the latest games at max settings stress the latest hardware available... like they have for the last 30 years.
@@andersjjensenwho are you exactly?
@@jose131991 Nobody per se. Just a regular commenter who often elaborate or expand on what Daniel says. He's a math teacher who has a reasonable grasp on the fundamentals, and it shows. But it also shows that he doesn't understand operating system and coding theory much deeper than just conversational level. Not that he ever pretends anything else.
Hey daniel, nice to see you today
Hi
Optimal parallellisation is engine architecture problem. Its OOP inheritance vs agegration vs functional and data oriented . For optimal MT parallel compute you need to implement it at core of your engine. If then the game does not scale then the game problem is to light to split up. This is also genre and scope dependant. The first step is using DX12 or Vulkan to have multithreaded render subsystem where you feed gpu parallel. DX11 supports MT but not efficient.
To me DX11 is red flag as render is single threaded feed it MT it doesn’t scale. If scale up to 3 threads it depend is it to smal scope game is inheritance oop. The game that scale best are those with have large amount of entities this is where you can go very wide. If your engine could handle different genres also RTS and if those scale well with MT then your engine is MT optimised. That means games where there ar little objects won’t scale with MT even if it higly optimised. But then are often light in load. Unless these are complex and graphics heayvy objects.
I would avoid globals and singletons as what put wrench in parallel compute is not avoiding as much shared data. Also opt for function without side effect. Keep as much compute independat.
It would be interesting what triple A dev do. But they use inhouse often shared game engine. Wich need to be highly data driven and support any genre . There is online ubisoft convention speak how the implement engine from large overview.
This is such a great video. I often faced the same problem where people would say "this game is no optimized", but what if the game simply has too much to process? I think this has become somewhat of a gamer term where bad performance means something is not optimized. This video is very insightful for the general public
Perfect pronunciation of Ys too btw. Optimization is an interesting iceburg.
The fact that The Finals can run how it does even with insane destruction and full ray tracing tells me UE5 isn't even a problem. It's clear that all these other UE5 games are poorly optimized rather than being engine limitations.
i tried The Finals first week of lunch, was pretty unplayable at that point, so i am giving it some time, remember how overwatch was shit for a year also.
are you saying The Finals has improved a lot since then?
the finals only uses RTGI what is far less intensive than "fullRT" and also just plain looks allot worse than regular RT since its still just using probes like normal lighting but updating them every few seconds with RT instead of them just being prebaked
Yeah UE5 has a lot of issues in that it optimizes inefficiently that then relies on upscaling/TAA to fix rendering issues and improve performance.
GSC developers of stalkers 2 have said that optimization is possible on UE5, it's just that it's very time-consuming.
You didn't watch the video did you?
Check this out - Unreal 5.5 on PS5 Standard. They talk about optimisation with the hardware and a different calculation of RT.
Unreal 5.4 and 5.5 both had focus on reducing stutter
You should cover the article mentioned there about mouse polling. It shows that one of the "poorly optimized" case is actually "dev did not account for player using a high polling rate mouse which could drop FPS by 100". It is a performance problem but not necessarily because there is some particularly slow piece of code.
About the D3D11 “particular use of memory” case on AMD GPU, I think I know what happened there. Here’s a quote from Microsoft’s documentation: “When you pass D3D11_MAP_WRITE, D3D11_MAP_WRITE_DISCARD, or D3D11_MAP_WRITE_NO_OVERWRITE to the MapType parameter, you must ensure that your app does not read the subresource data to which the pData member of D3D11_MAPPED_SUBRESOURCE points because doing so can cause a significant performance penalty”
YS MENTIONED
I KNOW! 🎉
Almost no one using this term actually understands what they are talking about, I lump Optimisation in with terms such as "Netcode" and "Engine"
Engine can be completely fine, like UE, but game logic on top of it is complete clusterfuck built of antipatterns as mvp with "TODO: fixme" comment which never has been touched again because dev is busy making something else
I have been playing the demo for this game. I has been great. 144 locked on my PC and 45 locked on my OLED Steam Deck. Seeing that the port was handled so well just make me want to buy it more when it comes out on the 25th.
Thanks for the link and video. This is fascinating just from what goes into game development. And personally for me becuse I pay Falcom games so it's cool to get insight on how PC optimization went for this specific game.
Daniel, I'm sure you know this but when discussing how to track CPU bottlenecks you can use the "Max CPU/Thread Usage [%]" sensor in HWInfo64, which will show you how often any one of the cores hits high 90 - 100%. No doubt, when a specific core is at that level of busy, the next Frame will be held up.
This was a very cool video! Kinda wish pc devs would go more often into these kinds of details. It's helpful for devs and customers alike after all!
The Falcom ports by Durante/PH3 have been nothing short of excellent, I even remember one of the Trails games I played somehow had a quick resume feature on PC. Now that I've been playing Metaphor with it's frame rate all over the place I really want Atlus to work with them too lol
Awesome , LOVED This insight , Really Appreciate it 👍
I hope we can have something like this on how GPU optimization works accross different gpu architectures
To me optimization is not just about improving the code and rendering pipelines, it's about the core game design. If a game is heavily CPU limited on a 7800X3D (the fastest gaming CPU), it means that the game was not designed with currently available hardware in mind. And this is the biggest problem with many newest titles, especially on Unreal Engine.
And I would say the same for resolution. If a game has to upscale from 720p on consoles, the core design is flawed. No amount of optimization would move it up to acceptable image quality.
It hurts to see how many gamers do not understand this subject.
It certainly isn't intuitive and it's no surprise. This is an entertainment medium after all. Most people just want to relax and play games, not deal with this complex stuff.
@@BleedForTheWorld I love learning about this stuff personally. It's as relaxing as gaming to me.
@@christophermullins7163 yeah we're the minority for sure
for the portion on the "Profiler" this is really a tool that anyone can boot up on their system (Unreal Editor comes with a pretty good profiler in the install and it can be hooked into any application to profile it not just Unreal Engine executables) it shows you what function calls and instructions are being run within the sampling window. Without the "symbols" you might just get the op-code, but it will show like number of "calls to Draw", or "calls to comparison", and you can even get the raw data dump for that sampling window though that can be less helpful especially without the debug-symbols.
awesome explanition for these kind of CPU problems
Love those kinds of videos. Ty
Do we know if the more popular engines (Unreal and Unity) make it easy for developers to optimize CPU usage based on parallelization?
If you're interested in more optimization stories and especially more technical details, I highly recommend checking out Factorio's new posts. They have been posting once a week for a long time - while some of the posts are about new content, there's a lot of posts talking about how they optimize stuff, how to updating millions of entities every frame/couple of frames. And considering how big your base can become, it's mindblowing that it still runs well with that scale.
I am also hearing a design issue here. If the actors are sequentially coded from the start - nobody designed the sequence with parralellization in mind.
For example: if a character walks through snow, it spawns footprints. Input processing, change char position, play animation, spawn footprint by animation result.
You cant work on the footprints before finishing the animation thread. But you can work on multiple char animations of different characters - or calculate footprints for the 8 feet on the floor.
If you wanna paralellize that, can pull a few tricks, that decouples parts or all of the animation thread, to spawn footprints.
But that requires foreward thinking.
Any news about battlemage? im really looking forward to it
Also This was an interesting take on why cpus arent fully utilized
I thought it would be just "portion out the code to different cpus and merge them at the end" but turns out its more in depth then that
On MLID he said the top card with be on par with a 4070.
@@geoffreystraw5268 as long as the price is more affordable then a 4070, I'm willing to switch to team Blue, we need Intel to shake up gpu sales, they need to bring better price/performance cards so that the competitors lower prices to reasonable spots
It's an egregiously oversimplified word that roughly means it runs at MY preferred FPS at MY specific settings on MY system. Example: "Modern medicine is so unoptimized! Why can't I just take a panacea pill and be cured of all ailments?"
I understand a lot of people ask for impossibles but you gotta admit that some games have requirements that just don't make sense considering their visuals. Monsters hunter asking for a 6700xt or a 4060/ti to just run at 1080p, 60fps @medium using upscaling and framegen is ridiculous. That can't be classified as nothing but poor optimization.
@@enmanuel1950 that can be classified as nothing but corpo management heard about that new "upscale/FG magic" and decided to skip optimization steps of development altogether.
"Why bother when we can save a lot of money by just using both upscale and FG as a default req?"
Because fcuk gamers, corpo management only wants to gain maximum profit
pretty much, people are so obsessed with graphics presets these days that even if a new game at low settings looks better than the old game at ultra, people would still cry and scream that the game runs awful and its a joke how they have to run it at low settings to get the performance they want
In my current software architecture class our processor is specifically telling us to *not* parallelize our code 😅
Often times, unless you really know what you’re doing, parallel code runs *slower* than sequential code. It may seem counterintuitive, but just because a program is using all your cores doesn’t mean it’s faster than one just using one core. You need to do a ton of work splitting and coordinating your tasks properly so that multiple cores actually improve performance.
It’s the same as cooking by yourself vs. coordinating a restaurant to make food.
Regarding the CPU bottleneck mentioned at 9:24, you can get an idea of what's going on in the game with Process Manager. You can look at how many threads the game spawned and how many cycles have been spent running those threads. Often times you'll see a majority of cycles the game runs is spent in a handful of threads despite spawning well over a few dozen. It's probably also a good indicator of how many cores/threads you can add before the game stops performing better on the CPU side.
Durante is amazing, his fixes/optimization mods are some of the best!
optimized is when not like Monster Hunter Wilds, where a 4060 apparently can't run the game at 1080p 60fps with no frame gen on mid graphics
If a game doesn't constantly stutter, hitch, lag, or compile shaders during gameplay with an RTX 4090 at 1440p resolution, then I would consider it optimized.
Yeahh calisto protocol set the bar really low
To be fair. Most games used to compile shaders at runtime and back when the games were smaller that wasn't a problem because materials had far less shaders required to compile.
@@fallencrow6718 jedi survivor and Hogwarts legacy just behind
at 1440p with a 4090? thats unoptimized as shit
@@TheMichaelparis Where they that bad? Im not interesting into playing them so im not familiar with how thay run outside some benchmark videos when i was choosing my new card(all for high end cards). But i remember when calisto came out(i was still running my 3060) and people where getting 20fps sections at 1440 with a 4080 and it literally killed any interest i had in the game because i couldn't stop thinking how much i would need to drop settings to get costant 60s.
Parallelizing CPU code is always difficult, but particularly so if you don't bake it in from the outset. Game design, game APIs and game programming all have a very long history of sequential operation, so there is a lot of stickiness in both mind-set and available tools and ideas.
Some developers/studios have managed the transition better than others, but it also depends on what sorts of computation the game actually needs. If a particular requirement is sufficiently parallel, it probably ends up being calculated on the GPU, since in general everything can be run on either CPU or GPU.
Ideally, any software running on arbitrary CPU/GPU combinations ( like, say, a random PC ) would profile the hardware capabilities and adjust the code architecture to best fit what is available; but this is shockingly hard to do.
Love the place he landed at. The technical blogs are actually really interesting and insightful, glad he's working on the ports because the quality shows. He's even got enough independence to push for pet features like local co-op on his own time. I was gonna wait on Ys X for a few months, still probably a good idea. But the game should actually be stable and perform out the gate with his meticulousness and track record. Sad contrast to me trusting Capcom with Dragon's Dogma 2 and assuming they should be good after playing monster hunter.
Wow learned a lot from the vid and the comments, thanks.
That link to that mouse polling rate article made me remember that early look / beta of a multiplayer game that is out and about now. In that early look, having a higher mouse polling rate resulted in the game stuttering like crazy when you were moving your camera around, in a 3rd person action shooter. I don't know how or why you would put a preview/demo/whatever out with this glaring issue present. My mouse was using a 500/s rate, which isn't even that crazy, but that already led to insane stutters so I reduced it to 125/s, which was better, but still not good. Did nobody ever try to play this with keyboard and mouse before releasing that? The experience successfully made me very wary of the game and ultimately led to me not picking the game up on release.
Does the one-frame-off synchronization introduce extra latency?
LOVE THIS!! I WANTED TO GOOGLE THIS BUT YOU ARE A STEP AHEAD!!
"unoptimised" is incorrect. "doesn't run" well is more accurate rather than claiming you know how the underlying code works.
You know that "optim / optimally" literally means "as best as possible" right ? Unoptimized 100% fits that description.
@@Winnetou17 Something can not run well, but still be running optimally. Some tasks just take a lot of processing power no matter what
@@Eidolon2003 Fair point. But in our case / context, I don't think something like this actually happens in practice.
@@Winnetou17 I don't think either of us are qualified to say one way or the other. That's what I hate most about people on reddit whinging about unoptimized games, they don't even know what they're talking about. They're just complaining that [insert brand new game] doesn't run well on their years old mid range gaming PC, and "unoptimized" is the buzzword du jour to use. I don't think people complained nearly as much about Crysis in 2007 as people do now, it was just hard to run.
@@Eidolon2003 Much of the talk about Crysis was in reference to the "Can it Run Crysis" Epic/Ultra/Cinematic graphics setting for benchmarking, which no hardware on the market could run at a solid 30 FPS. Crysis needed a good CPU, and a lot of players were probably running the game on the "Low" graphics preset, but back then we still believed in Moore's Law getting us more powerful systems that could render Crysis' requirements a joke eventually.
Now, in 2024, Moore's Law appears to be dead and is not expected to save us from poor optimization anymore. Video games are arbitrary in their content (until release), and so there is no reason for them to not run well. Therefore, video games that do not run well are defacto "unoptimized".
Appreciate all you do for the pc community Daniel you deserve way more subs !
i accept and understand the argument, but when a random modder can show up and fix a game the second day its out ...
80% of the time when a modder “fixes” a game in 1 day after release, it means that someone on the dev team either had the same idea and couldn’t properly test and present it to the rest to the team in time before finishing the project and being reassigned to something else, or they did, but there were unintended consequences that forced them to revert the change (sometimes those don’t have anything to do with actual user experience, sometimes they do).
20% is just low hanging fruit that doesn’t fix anything outside of super rare outlier conditions like an ancient cpu/gpu combination no one on the dev team even thought about supporting because of how ancient it is.
How can eastern european/indian implement this while working on game code from coding sweatshop?
While I am not a game dev I've played a lot around with code. One thing he skipped over and I think is worth bringing up is understanding how compilers work with different languages and how there are some differences between them when they translate the code from human readable to binary. Often the higher level code you use for example one that is very common in game dev is C# that relatively high level comparing to something that doom was written in anci-C. In unigine5 if you made a line cast with C# the compiler would compile instruction to run full vector calculation when in most cases if you were to write the code in anci-C like in doom or quake you could use so called "fast inverse square root" or more commonly known as quake algorithm.
Very good insight, thanks for the video
But how do we explain devs recommanding frame generation for 1080p60 now?
They never did any of the steps discussed. That would be my explanation. Doesn't help that we now have folks making posts and articles recommending locking pc games to 30 fps to "get rid of traversal stutter".
Efficient parallel computing has and will always be a challenge. That's why CPUs with high single thread performance are usually better for gaming. There are only a few domain where parallelism gains is rather easy to achieve: when operation are always the same , with no interaction with 'neighbouring data’ and dataset is huge like in video encoding, rendering, and AI. It is exactly where GPUs and Tensor cores work... For the rest of the operations, there is a lot of interactions between many small datasets that make the game (like checking visibility, collisions, path planning, animation sequences, general game state update). This is all very complicated to optimize and parallelise.
I am not a game dev but I dipped my toes into high performance computing and simulations. It is very very very hard to get 100% CPU, even with pure math tasks. Some problems cannot be parallelised, some can be partially parallelised and some of them can run 100% parallel. But sometimes all your threads and processes have to wait for something else to finish. For example, you cannot have multiple threads writing on the same file or network connection without synchronicity (a.k.a. They have to wait on each other). Sometimes you have branches in your program that are not symmetrical and sometimes you get stuff that depend on random numbers . I can imagine that a videogame is even more complicated just because you also have to manage user input.
One of the most interesting materials I have seen this week. Thanks.
>always full of people flinging around the word "optimization"
Imo, optimization is one of few words people on the internet actually use correctly most of the time.
The article shown in this video even demonstrates that it is way more costly to make a decent port (weeks/months spent) instead of just bare minimum (merely days spent) which people are still going to buy. Because of that game companies (who are are just interested in maximizing profit) obviously don't invest into optimization. For the same reason they often include DRMs in their games which tank performance even further
Incompetent devs using same old stock Unreal Engine which has tons of options, not all of which are suitable for real-time rendering do not help either
The takeaway is that Durante still is legendary, I'll be buying that game, I wasn't aware he was working porting games to PC, we need more Durantes
Basically what I got from this
"It takes a lot of time to optimise games, so publishers shouldn't rush us to release the game by a certain date"
Sounds about right, I have been saying for years that is the problem.
I think devs also stuck to using DX11 for a long time because they were happy to leave most of cpu to graphics related optimisations to the driver, which would do things like parallelise draw calls. Where as DX12 and Vulkan require the dev to do that work themselves, which can mean even better performance, but a lot of the time we are seeing the opposite, and was very common thing to see when their was an option for DX11 option it would just end up faster..
Thats what i do, frame cap to get a smooth fps line. So often I'm running near the 1% lows in order to never have a noticeable drop in frames no matter what or how much is going on. What's the point of chasing potential max fps, if the graph is a jittery mess.
I feel like most games get stuck at the "0.8" stage and don't go any further.
It's wild how game optimization has evolved. Back in the C64 days, devs were spending like 90% of their time just optimizing because the hardware was so limited (C64 had only 64KB of RAM!). Every byte counted, so optimization wasn’t just part of the job-it was the job. By the time we got to the early 3D era, hardware got better, and the need for optimization dropped to around 70% (think early PlayStation and N64 games). But even then, they had to be really careful with 3D environments and frame rates.
Nowadays, optimization effort is down to 30%, thanks to tools like Unreal Engine and powerful GPUs (you’ve got engines doing a lot of the work for you). That shift has let devs focus on game complexity instead.
The real story is that a ton of people on youtube claim "bad optimization" on everything that doesn't hit their internal idea's for how a game should preform.
These people have no idea how it actually work but act like they understand completely.
This is the real problem.
Thanks Daniel, this was such a good video
Thanks for the Explanation has gave me plenty of thought on the next PC Build been on 1080p 144hz for a long time but also many games still 60hz which is fine has gotten me interested in stopping at 180hz for shooters and anything else below 120hz, I use Vsync + Freesync at the same time for lower latency and also UA-cam only goes up to 60fps for videos so it seems the Industry for a lot of things haven't moved past 30/60 for a long time now well some games have reached 120 on consoles but their always gonna be the big limit factor regardless of W/E pc parts ya get.
so might consider getting a 7700X over a 9800X3D and still wanna get the Highest RDNA4 Card.
edit: why a 7700X as a consideration, because it would be cheaper and since their both AM5 could always upgrade to the last AM5 8core X3D Chip to come out so there's that. also recently got a 1440p 180hz IPS Monitor so looking forward to running a 2 PC setup with it early next year.
Is there a way to cap the framerate with AMD Fluid Motion enabled? Maybe cap the framerate in game to half my target framerate with FM2 engaged?
Also Nihon Falcom don't make 'AAAA' games or even 'AAA' grade, they stick to what they do best, show not tell.
They make anime games so they don't need serious hardware to run.
gee something done on every software project ever in the 90's when we had one core 33 MHz and we needed to handle 2MBit traffic.
Tools are the biggest problem call trace data and timing we used to pay $150k per seat for tools.
When you bounce 10 instruments in 2 channels that is optimizing.When playing all the tracks together it uses way more cpu and memory latency.But after it is mixxed together it uses less.Very much the same with graphics in many ways.
In conclusion, they put in a lot of work trying to optimize the game by utilizing more threads and they eventually saw massive improvements. So what you're saying is - if these huge companies with thousands of employees put more resources into working these issues out and getting more threads involved, the game would be more optimized. Which is essentially admitting that the comments were justified
this was a very interesting read and watch, I wonder what the process looks like In triple-A game studios, with the pressure from higher ups and etc...
Again, thanks for these Informations Daniel!
I am the owner of rock life, when I first published it, I got like 100-120 fps on a 3080Ti. I knew this was trash because you stare at a rock and the grass sways around. I spent over a month optimizing with gpt and help from teachers and got closer to 250fps. I’ve learned more, redid some stuff and it’s closer to 300 but I’ve disabled the unlimited option and put the game to 120 for everyone because I got people complaining that their gpu was at 100% in the reviews.
that was super informative, thank you!
11:30, Daniel is dead on, if a game dev has a problem that isn't complex/hard to multi thread they will. See all of computer graphics.....
16:44 Thanks for saying this Daniel, multi threading stuff is hard, and doing it afterwards is even harder. But the way production of games works this will often be the case. Make it work so we know if we even made the right game, then optimize it!
17:40 FPS is actually a bad way to measure game performance, 1 FPS worth of movement between 1fps and 2fps is a very different story than 1 fps between even 16 and 17 fps. Time per frame is much better. But players understand FPS, so that's what ends up on the graphs. The tools (profilers) that game programmers use to optimize games output time instead of FPS. If you want 30 fps you have 33.3ms to render the entire frame, 60fps 16.6ms, 120fps 8.3ms. Code takes time to run, so I want to know that.
Also fun comment, so everyone talks about cpu vs gpu bottlenecks... Factorio players found out better RAM sticks(same capacity better timings, aka latency) could improve game simulation times. Most games this won't happen but it does for some.
Also Optimization, easiest way to explain it... How efficiently are the cpu and gpu being used?
The last mile is the hardest!
Optimising is the last mile (or quarter mile even) and doesnt always offer the best bang for the buck, in that, those devs could be developing other features or products with a higher ROI (return on investment). You have to draw the line somewhere and larger companies will naturally draw the line lower than dedicated PC enthusiasts.
Never thought I'd see or hear you talking about an Ys game! 😁 It's my favorite game series in the world and I'm waiting my physical copy of Ys X for my PS5 to arrive next week! (Hopefully next week because it's released on Friday and I never get anything that gets released on Friday the same week 😑) I buy most of my games digitally, but I collect physical copies of my favorites ones. 🙂 I checked the PC requirements for it and they really surprised me. According to steam I could easily play it 1440p 60fps on my current rig that's 4-5 years old 😅 But I'll play it on PS5 because I pre-ordered the game months ago. I may try it on PC later to support the guys!
In the Y's X demo my XTX at 4k is not being fully utilized stays around 70% ughhh
That last step or two can take weeks or months and costs tens of thousands of dollars in some situations depending on how much staff you keep on so I don't fault devs most of the time. Its 100% a value proposition. That said, those staff should be kept on anyways whether or not a project finishes but thats a whole nother topic.