Can't wait for game engines like Unity and Unreal to implement these features - that way a bunch of developers will be able to add DirectStorage to their games without having to completely restructure them.
@@youtubekilledtrustedflaggi9274 you think implementing a feature automatically cuts out who has hardware that does not support it? You know, right, it's common practice in game development to probe the hardware you are running on to check for available features and then use slower methods that do not require what are you missing? For example some old video cards do not support Compute shaders. (GPU accelerated calculations for things not related to graphics), easy, check if you have access to Compute shaders at the beginning, if not, you do that on the cpu, at a lower quality even if it would be too slow. Doesn't look like to me that all the games that support RTX do not run on non RTX hardware either...
@@youtubekilledtrustedflaggi9274 Existing games that patch in support for DirectStorage won‘t become unplayable for setups that don‘t support it. And with new games it will quite a bit of time before it will be in any way „required“
A few key points were missing from this video... DirectStorage on Windows 11 is using bypass IO if supported by the attached storage device. This greatly reduces kernel overhead by skipping many stages of the IO stack allowing IO to start more quickly and results to be returned more quickly both with lower CPU overhead. This alone will have significant latency improvements as well as save CPU time in ways not previously possible using standard APIs on Windows 10 and earlier. DirectStorage itself is an asynchronous API. Asynchronous APIs have been the ruler of IO performance for a long time. Yet due to the complexities involved with using them their adoption has been quite poor with much slower stream based IO often being used instead due to its simplicity. This can be seen in how a lot of game archive formats are constructed. Anything that requires frequently reading a small amount of data to know what or how much more data to read is not optimised for asynchronous IO. DirectStorage offers batch reading support. Much like modern graphic API command queues you can submit multiple different read requests with a single kernel call. This allows high utilisation of NVMe read queues, such as using a single queue submission to read the meshes and all textures for multiple modules. In the video this would be similar to the shown texture atlas animation except when not using a texture atlas. DirectStorage does not bypass the CPU and RAM entirely. Custom decompression, a supported feature by the API, will still have to use the CPU in some way to process the data from memory. When moving directly to GPU the file data is still temporarily placed into RAM, even if directly into an upload heap staging buffer and not touched by CPU threads. Although the technology does already exist to bypass CPU threads and RAM entirely it is currently unclear why DirectStorage is not using it.
GPU upgrade would not matter, unless the current GPU lacks required features forcing the data into the CPU for processing as a work around. The main improvement over the current direct storage implementation would be a saving of some latency and memory (or cache) bandwidth. The data would be moved from the NVMe SSD straight to the GPU VRAM via the PCIe controller cutting out CPU caches and memory entirely. This is apparently what consoles like the PS4 use and does exist in the data centre with a proprietary API for Nvidia enterprise GPUs.
@@drsupergood8978 oh wow thats very interesting to hear… but wouldn't that crash the GPU market? I mean if it doesn't matter then how will companies compete? There wont be a need for buying new GPUs because they wouldn't speed up the process anyways. Maybe if they break due to aging or manufacturing issues. If thats the case, then that the only way i see companies profiting off of this is if they force us to buy new one by ruining the GPUs in software, hardware or maybe monthly payments in order to use the technology. Of course future upgrades that unlock new features would be a thing and maybe thats where we will want to upgrade. But im sure that im wrong somehere (im also just a child so for anyone reading this, im just saying whats on my mind and don't take anything from this, im sure there are solutions and answers to my questions)
@@matyaskatocz9323 Why would the GPU Market crash? We're only talking about storage en transferring assets here. It has nothing to do with rendering power and how much FPS/Details a GPU can render. Rendering power will still be something that will grow over time and this is where GPUs will continue to compete.
@@prateekpanwar646 They are so common now that I have 2. I have a regular SSD and an m.2, because why not. One is mainly for my OS, the other is dedicated to games I'm playing.
Yeah, there are so many bad takes in the comments section, so I'm glad someone gets it. Btw, this was all explained very clearly in Sony's "The Road to PS5" video two years ago.
HEY I'VE GOT AN IDEA: How about, we sell games on their own SSD? And since SSDs are fragile chips, we could enclose them in their own containers. And put labels on them for what game it is! We've come full circle!!!!!!!
It didn't just affect load times, some games that doesn't use all the gpu resources because of having too many assets to load and causes a cpu bottleneck will have a huge improvement in performance.
im guessing it will scale with bandwidth of the SSD. so m.2 will perform better than sata SSDs. i could be wrong though. it could also scale directly with RAM frequency and be less effected by latency?? i dunno. just brainstorming here
@@Cheeseypoofs85 From Microsoft's developer blog directly: DirectStorage is DirectX 12 API for Windows 10 and 11 PC with NVMe storage. Data is loaded to RAM and copied to VRAM without I/O calls involving CPU (GPU was able to read dedicated RAM for decades, now it will have instruments to make I/O call to NVMe storage itself). Purpose of this API is to shift data decompression duty from CPU to GPU to the limit allowed by the hardware and it's firmware (yes some firmware will lock you out on fully capable hardware).
@@Kholaslittlespot1 I guess the real question is will we need something like a PCIe Gen4/Gen5 SSD? Or will there be no notable difference past storing on a 2gb/s Gen3 NVMe drive?
Compression is not just important for reducing installation file sizes. It also speeds up the I/O since reading the data is a bottleneck in the system. If the data is compressed, you have less data to read, so it takes less time. The consoles have hardware decompression built-in so data can be stored compressed and read in that state. The level of compression achieved then becomes a multiplier to the effective I/O read speed.
But not automatically, like you said consoles have hardware compression available, but PCs have to use the CPU, there are a surprising number of people still running
@@rngQ Yep. That's the point. Compression is a huge part of I/O speed and this video relegated it only to reducing installation size. The fact that PCs will have to continue supporting CPUs that can't handle decompression in real time while running the game is the reason why PCs as a whole will still be playing catch up to consoles on this issue, contrary to the title of this video. In fact it will likely be the need of games to also run on a wide rage of PCs that will hold back what can fully be done with the much higher SSD I/O speeds for some time to come.
Same reason file system compression with a decent CPU can greatly improve read/loading speeds. Apparently that may be moving to the GPU sooner than later. Less and less data seems to get processed by the CPU as time goes on and eventually, GPU may just stand for "General Processing Unit"
@@KenMathis1 Kinda. It depends on Nvidia for their GPUs to make something. RTX IO would be the Solution you want here. And that should be out since End of 2021. Since Direct Storage just released and no Game is using it anyways until Forspoken releases… they probably said „Fuck that, we will wait to release it since no one could use that anyways“. I assume Nvidia will (re) announce it when they show the new 40 Series Cards. Also wonder what AMD will say to this. They are working with Xbox and PS together and know how they do it but they never mentioned it til now.
@@rngQ though some compression can be incredibly efficient even on low end cpus, for example windows's built in compact.exe with lzx has no perf impact even on a lowly i5 2400, and in some games it can go down to like 20% of its original size
There was a time in decades past where it was expected that RAM and storage would soon merge; your storage would be a non-volatile RAM that worked at high speed. Sort of like the old RAM-disk utilities. You wouldn't "load" a program into memory, just access the memory where you installed it. I would have hoped that this would have come to pass, but not quite yet.
There are a few SSDs that can match RAM speeds but it's usually an expensive proposition (Optane and/or certain raid setups with higher end SSDs) and with DDR5 just now coming out it seems that RAM will soon be much faster than storage again in the near future. So long as RAM has a significant speed advantage it's unlikely we'll ever stop using it. I remember hearing about some technologies that were in the works years ago that would supposedly give us Storage with equal speed to RAM using spintronics or carbon nano-tubes but I've not heard anything of those technologies in years suggesting they hit a roadblock they simply couldn't overcome. If we do ever get such hardware capabilities working and affordable enough then we'd still have do so some major work on the software front to take advantage of it or really just to use such a system at all (at first they'd probably have some section of storage emulating RAM for non-aware software but the better solution would be to reinterpret the code or perhaps recompile it to use the new tech).
@@syarifairlangga4608 The carbon nanotube storage I mentioned would theoretically solve the issue of durability. The basic idea as I understood and remember it (this was back when Optane was first announced) was to have cells composed of a tiny carbon particle inside a tube that would switch between 2 positions depending on the value. In theory such a system would be far more durable or at least far easier to repair since the memory modules themselves would be unlikely to fail. I presume the road block was either that it was too difficult to manufacturer or they could get them to be fast enough (if memory serves they could theoretically switch faster than Optane but theory and practice often differ).
DFM? I remember reading about a non-volatile RAM replacement years ago, but can't remember if it was DFM or something else. Just sad these technologies get announced then.. never seem to come out, or get released to the consumer market.
@@grn1 Optane's PCIe latency is still ~1000x slower (on the order of tens of microseconds) compared to low-end DDR4 (tens of nanoseconds). And SSDs with RAID are a joke compared to Optane, never mind RAM. Optane Persistent Memory modules are comparatively faster, but still 10 times slower than DDR4. With Optane's development now halted, it seems unlikely we'll get to the great merge anytime soon.
@@ceebee It is stated in the video that its still involved, and that it is an issue that may be solved in the future. The question is how much, and the only way to get a definitive answer is to test it.
It likely won't have much of an impact on load times in practice. It is already just an edge case that the cpu file copying/decompression during load tkmes is the speed limiter. And it should be even less difference in % gain between high and low end CPU's. Keep in mind that you are unlikely to e.g. combine a 3080 12GB with a intel celeron or somethin really low end. And lower end Graphics card usualy also come with less VRAM, and thus also less data loading during loading screens. The actuall rela benefit will be in data streaming, as that happens while the game is actively running and the CPU has plenty to do with that. If your CPU is only running at 20% saving 5% doesn't vhange much. If your CPU is already at 100% and starts having a backlog? Then those 5% make a hughe difference.
@@reappermen Smaller VRAM actually means you have to put more data through it in the modern games streaming an "infinite" world (less VRAM to cache existing data you may return to). E.g. it took me 7 hours to run through the starting island in Elden Ring (is there more islands? I didn't fight & level, just travelled to see about 2/3 of the world freely accesible to a beginner character). That happened without the old-school loading screens. Also there are already coming modern GPUs - much faster than what you can have in PC - equipped with "Celeron" class CPU. There's no need to pay for a beefy CPU when the GPU can load the data from NVMe itself. When you scale huge server farms, you save a lot of money on the CPU.
@@ladislavzima8382 You are correct on the smaller VRAm cards needing mroe data streaming. My reference for no load time imapcts there was specifically aimed at the OP's mention of loading screens, which do get faster with smaller VRAM because it can only fill the VRAM once during a loading screen, and the less VRAm there is the fast that is (o nthe same bus). As for there beeing modern GPU's paired with Celerons, that might exist for neural networking or specialised analytics and modelling software on some servers, but those already run their homecooked everything as OS and don't need Direct Storage. For everything gaming related 8which the Op semed to refer to with his post) you absolutely do not want to pair a high end GPU with a cleeron or similar crappy CPUI, as that will massively bottleneck your system no matter whta you play. And finally, there currently isn't really all that much in the GPU server space that is that much 'better' than a 3090/3090ti when it comes to the GPU alone. most of the gains there is in better Bandwith and MASSIVELY more VRAM. I forgot which card it was, but in fact one of Nvidias current highest end Server GPU runs on a slightly changed 3090 Chip, with like 4 times the VRAM and other fringe benefits attached. But for pure cycles, the servers space isn't mcuh ahead. Server software is usualy made to run on multiple GPU's at once if it's really demanding, so getting more cycles in one card has limtied benefits.
Do remember some of the very large size is because on HDD you'd have multiple copies of the same file so it could load it all as one block when you need it, rather than having to jump around (reducing load times), that should drastically reduce some game sizes if HDD support is dropped/ SSD becomes a requirement (as SSD's have way faster load times for lots of small files since no spindle). Note as well, the Nvidia GPUDirect storage is also Linux compatible, currently in beta on Ubuntu.
@@rajder656 except high capacity is only a huge worry because the games are so big now. if direct storage also means trimming down file size its not really an issue. also I don't know why people think HDDs would just not work when games are developed for direct storage. itll just become another thing that plays into your overall performance. hdds still see a speed improvement with direct storage. additionally methods to optimize performance on old hardware will be around until its irrelevant. like grabbing smaller assets to display the same thing off a hard drive. a low poly version of the models a hard drive might struggle with
@@demonz9065 i mean if you store only games good for you. And it also only applies to new new games. If i have idk movies or other stuff alongside games then HDDs are better even without those if you want just games HDDs are better right now and for many years after since we won't see widespread usage until 4-5+ years
Great video, but there are some inaccuracies. You don't have to reload an entire atlas to replace one of the sub-textures. Rather, render the "new" texture into a spot in the atlas to replace one of its components. It's not intuitive, but it's not complicated. Atlases are intended to minimize thrashing when swapping textures, and they work really well. Replacing an element comes with its own cost, but it's not nearly as bad as a round-trip copy. The "blurry lines" between adjacent textures in an atlas are compression and/or sampling artifacts, easily solvable by using lossless compression and properly-selected sampler settings in the shaders. When generating the atlas on the GPU, those artifacts do not exist.
that is one technique, that is not always used. Because, in the end, the GPU needs to wait for the new changes in the atlas to be made, creating idle time/bottleneck. That's why, in a coding standpoint, just sending a new atlas have the same result in the end.
I remember how much Playstation focused on this feature for their PS5 so it's nice to see that in a couple years this console selling feature will be implemented on a platform with a larger number of titles & greater control over system settings!
@@samfkt thats because consoles don't have the single thread performance that PC does. Traditional loading on PC with a fast CPU and a slower SSD will match the consoles with the fancy APIs. Even Direct Storage doesn't remove the CPU advantage, only helps lower it somewhat.
I'm curious whether this could be used for render engines. On of the bigger problems for small-medium sized VFX studios is that they'll use GPU rendering, so often (particular with larger or more complex renders, they're limited by VRAM, even with a 3090. Being able to chuck 1TB of super fast tertiary storage, could allow much larger renders on cards that don't have enough VRAM, although likely at a fairly significant performance overhead. However 'slow but available' beats unavailable 100% of the time when it's needed.
I think they just did a video about a Radeon GPU that had SSD slots on the card for this exact purpose...I don't think it was very good though, I can't remember...
Theres no reason a fetch over PCIe form the SSD would be faster than a fetch over PCIe from the (much faster) memory controller. Though it would certainly be cheaper, youre probably still better off caching it in system RAM.
@@mycosys I agree. Even super-fast NVMe SSDs usually top out at maybe 2-3000Mb/sec, which is around a tenth of the bandwidth of a single DDR4-3200 stick--and once you have dual channel RAM (and what gamer doesn't?) the discrepancy gets even greater.
Initially I thought it was weird that you brought up Street fighter because I would think that a fighting game would have one fixed set of assets for an entire match and not need to load anything during gameplay, but then I remember how nice and snappy it was to start matches on old arcade games where everything was in ROM, you could be fighting in under a few seconds. A very nice user experience.
@@ItsAkile It’s more about fast loading, as that’s been an issue with some recent fighting games. Tekken 7 in particular was awful about loading, even on PC.
I think one of the first games to stream in assets was 1999's Soul Reaver on the PS1, so the industry was moving towards that kind of game design even before the turn of the century.
The Crash games for PS1 streamed data off the disc, and Andy Gavin, the lead programmer behind the Crash games has a fantastic dev blog that goes into some of the details. It was groundbreaking stuff at the time, and Sony was worried because they never built the PS1 with that amount of constant disc reads in mind.
That's eventually how most big games began to function on DVD based consoles. You can't fit entire big levels in the extremely tiny amount of ram on a PS2, or even the PS3. Crash might have been one of the first to do it, but there is a reason why all PS2 and PS3 DVD based games had to constantly, literally non-stop, read the disc.
I'm always happy when there is some new tech that Anthony gets to do a video on. Just has a way of making things easy to digest and understand. So credit both to Anthony and the writing team.
I love this. I feel like I need to note though, just because hardware is faster shouldn't be an excuse for Devs to cheap out on optimisation. Maybe for demos as proof of concept yeah, but please not for final release.
Yea I also can see unity and unreal engine devs be like okay so we put uncompressed files to make load speed faster, right? Oh wait what is this... a 400 gb folder? Also imagine if you're not running SSD. You're IO performance will be killed with these giant uncompressed files that will just eat whole throughput.
That might be the reason that DirectStorageAPI will be a niche thing. Because consoles are very cool in a way that you can do all of the tricks with the hardware because you're sure that EVERYBODY will have same hardware.
Not even a minute in and you already know that Anthony will once again give insightful foresight presented in an entertaining manner, easy to follow even for people like me that don't have a clue.
This begs the question though, were the test results valid? You compared compressed vs uncompressed from the looks. This draws into question the validity of the results based on the data given. It is entirely possible it is still a mostly valid result if the compression saved enough time in disk reads that the decompressor was not adding time, but that was not shown.
You seem to be missing the point. PCs lack the built-in hardware decompression consoles have, so while compressed data doesn't hurt them and basically just increases their effective I/O, the decompression on the cpu has become a serious bottleneck for PC ports. So uncompressed vs. compressed is exactly what this is all about. GPU decompression is going to come but hasn't been implemented yet and is therefore unlikely a part of this demo. Games that use direct storage on PC will most likely have even larger installation sizes as a result, but I'd rather add another $200 ssd than buy an entire new system with higher ipc processor and a 12+GB gpu just to handle the compressed data streams optimized for consoles. Because that's what we've been forced to do until now. A game stuttering on PC while running perfectly on a console with much lower specs has never been anything else than the PC choking on a compressed data stream it wasn't designed to handle. Optimizing a PC port for the most part means changing compression algorithms, package sizes and the overall data streaming behavior of the engine ... so devs should have it a lot easier in the future if optimizing a title for our platform simple means storing assets uncompressed or with lighter compression the GPU can chew through in an instant. With the possible transfer rates of PCIe4.0, the already finalized PCIe5.0 and DDR5x/6/6x memory we are way better equipped to handle that than what we've had to deal with in the past.
@@protator Thanks for your insightful comment mate. So as of know we are waiting for MS to release the second part (gpu decompression). How is this on the Linus side for the steam Deck etc.
@@1989ElLoco Haha, thx for the compliment, but though I have been doing some 3d modeling for games in the past I'm basically just another enthusiast who pieces together bits of info from multiple sources. So take my comments should you run across them in the future with the obligatory grain of salt. Either the final or future implementations of this tech will use some form of data compression as well, just in a form that makes it far quicker on a gpu than it used to be using the cpu, bringing consoles and PCs very close together in terms of how they handle asset streaming. What I said before is just my takeaway from what I've read so far and what I assume is showcased in this demo. Well, Linus will probably go all out testing this, but you obviously meant Linux^^. Unfortunately I'm a windows-only guy - although that could always change. The steam deck uses a moderately clocked APU with 16GB shared memory iirc, so the concept doesn't apply 1:1 since you need to shove that data through the cpu and into system memory anyway, even if you can skip the compute intensive decompression. But if there's still some decompression work to be done then at least the cpu will be spared from it and it'll be done on iGPU cores which are still more efficient in that case. With DirectStorage the CPU acts mostly as a bus controller while the gpu exchanges data sets with vram and nvme storage directly, so memory management might be the deciding factor as to how much of the benefits you get with and APU that has tighter memory constraints compared to a desktop pc with dedicated graphics where you simply have more headroom to brute-force things. On the other hand Linux has always been very good in terms of overhead and latency compared to the windows kernel, so I am convinced that steam deck users will see meaningful performance improvements once this gets implemented.
The results still seem valid in a sense. On consoles the difference is quite huge, but the real world performance we have seen so far has been pretty much useless, but hopefully fixing the compression issues gets us closer to those console loading times.
PCs have had the capability to do PCI bus master transfers like this since as long as PCI has existed. And with DMA transfers havent been going thru the CPU since the original PC, cos DMA
Look up io_uring - many of DirectStorage’s ideas come from there, and the main concept of the API (reducing the cost of IO calls) translates pretty much the same into DirectStorage calls. Decompression (when it arrives) will be more complicated, but the base API is already there.
I'm pretty hyped for DirectStorage moving forward. I don't think anyone will miss the days of needing loading screens, so I hope and think that it will just be integrated into all modern engines eventually. It will also be very interesting to see how this can be leveraged for non-game apps that need a ton of data-throughput, like video editing for example. That could be a serious game-changer.
Combine tech like this with stuff like UE5s Nanite mesh handler and we're going to see a huge jump in visual fidelity, too. Closer to photo realism than ever.
08:09 Also, Consoles usually have shared memory, ie.. Cpu and Gpu share the same memory space, so the data is only loaded once So it's already easier to implement there
It could be a case of simply passing a pointer rather than moving data. The massive separate VRAM in a GPU is a bit of an anomaly. In home computers of the 1980's the video chip was connected to the same RAM space as the CPU. If the VRAM could be directly addressed by the CPU that could be a huge help. Maybe PCIe gen5 will do this. That's another thing, in 1980's microprocessors everything was on the same bus. IO space and memory space were all the same, just different addresses. Very Alan Turin.
Sadfly the vid is really premature and none of what he talked about has even been implemented. The initial API just provides vastly lower IO overhead on these streaming calls
While compression streeming have been around pretty much since the 90tys. There have really been very few games that used it. Granted, most game have suported them.... but a lot of them, i would say most, have had it turned of as default and hidden it somewhere in the menues.
So have Direct Memory Access and Bus Mastering - which he completely failed to mention. The transfers havent gone through the CPU cores literally since the first IBM PC. This adds a layer of SOFTWARE primarily, an API to let software tell the OS to use a bus master transfer from the storage to the GPU. It has been possible (just not practical) since the dawn of PCs.
I literally asked this question about 5 years ago when I was first introduced to m.2 slots... I asked when are they going to allow GPU to directly connect to the PCIe lanes of the m.2 slots instead of going out of the way and then stuff like compression to help.... thanks for the content!
If game's increasingly STREAM assets from an SSD in the future does that mean VIDEO memory requirements will drastically slow down? I've seen demos for DirectStorage that used FAR less video memory to produce the same scene as normal methods.
Yes, absolutely! Being able to load assets as needed over the PCIe bus instead of prefetching assets will definitely reduce how much data needs to be sitting in VRAM for a scene to be rendered quickly
Idk about that but at most it'll probably be stagnant. But maybe they'll focus on speed more in the future. Nvidia did prove that speed is extremely important. Who knows we might even go back to HBM2 in the future
not enormously - they still need to hold all the textures for the scene being rendered - it would save main system RAM (because you dont need to copy all the stuff there first and probably cache it there) though
ps2 had a 4MB frame buffer btw, which is like 4MB of Vram. It was still able to beautifully render games like Silent Hill 3 (Within its resolution limitations.) Let that sink in.
i am truly impressed by some of the games comeing out recently, i remember the days when we woundered if they would look as good as the movies CGI and now they look as good and in some cases better. gameing or vertual enviroments have so much potential as long as the devs are willing to think outside the box. i think we are just getting started with where games can transport us to and its only being held back by the industry playing it safe.
FYI: SATA SSDs will also see a decent improvement in speed, at least, according to forspoken's benchmarks with and without directstorage. NVMe SSDs will see a much bigger improvement because the overall drive speed itself is much higher. This is pretty surprising as directstorage was always something the was said to _require_ an NVMe SSD to function at all, so not sure why they were saying that when SATA SSDs also see noticeable improvements (Around 20% in forspoken)
yeah idk why people would think an SSD was a necessity. the benefit comes in removing steps required to get an asset from the drive to the GPU. those steps arent something that only exist in an SSD so why would other storage media have no benefit?
@@demonz9065 I think the reasoning I heard most was that NVMe drives use PCIe lanes instead of the SATA interface, which means they have a direct connection with GPU and RAM whereas SATA has to go through the CPU. Despite that SATA SSDs still see improvements, so there's definitely more to directstorage than utilizing the NVMe interface. Still not sure why they _required_ NVMe drives when they announced directstorage, when forspoken devs were able to run it from a SATA drive and still see decent improvements.
@@TexelGuy that is all false. DS takes data from NVME to system RAM and then again to GPU VRAM. It's not direct over PCI to PCI. Watch the microsoft video, it has 4k views cleary nobody wanted to be educated.
@@lexsanderz 1. NVMe drives were always implied to be the _only_ way to use directstorage in ALL of the information made available by microsoft online. 2. You seem to be misunderstanding my comment, I never said data can go from NVMe to GPU directly, only that it provided a direct path for it through RAM without going through the CPU like on a SATA drive.
The first time i remember reading about asset streaming was Crash Bandicoot on PS1. Yes PS1, it's why the levels undulate up and down and you couldn't move the camera. Hiding asset loading. The opposite of that on PS1 was Ridge Racer. That game loaded the whole thing at the start and you could swap out the game for a music CD and keep playing. Im thinking Bethesda has to be in on this for Elder Scrolls 6. Since they are using photogrammetry for assets. Loading them in as you move between exterior cells wouldn't be a problem anymore. Anyone who loaded up the original Skyrim with a tone of mods back in the day can tell ya. It would stutter like crazy if you added to many large texture mods.
Decompression on the GPU sounds really cool. My question is will that require new hardware or can it be added later in software? For example could my GTX 1070 do decompression when that feature is available or will it require a new GPU with a specific chip onboard dedicated for decompression?
I feel that it could be done in Software on older GPUs, but include an instruction set for compression in future hardware revisions. Kind of like how RTX 20x0/30x0 cards can do real-time ray-tracing in hardware, and GTX 10x0 cards can only do it in software.
@@JayrosModShop I mean I doubt nvidia will push such a big driver change including the CUDA version of decompression. It will be a big change that can lead to potentially broken compatibility or just something else buggy. They just won't do it.
@@pixels_per_inch I mean that stuff still looks like rtx thing. Yea it's here and it's cool but not everybody is gonna have it neither it is so important.
That you to the editor who didn't put in annoying thumping background music, you are awesome unlike some of the other editors who seem to think every video needs to be inside a nightclub
The directstorage demo read a different file though, so it's kinda improbable that the SSD had that one cached *. Also I trust Anthony to be competent enough that he ran the demos in the opposite order at some point, just to make sure that the caching isn't massively skewing the results. * "improbable" because SSD manufacturers may implement heuristics and other magic to help caching and make their SSD look/feel faster. But since the file was like 52MB big, I'd feel that the SSD won't decide to pre-load and cache it, even if some magic software layer concluded "He always loads these two files one shortly after the other.".
@@Flameancerit's worth understanding the difference between "open source" and "proprietary". Teslas designs are open source and proprietary, which means they can go after anyone that makes anything like a Tesla because they can say you copied them. Linux is open source & GNU licensed which means you can modify & copy to your hearts content. It's worth reading up on Embrace, Extend, Extinguish directly, I will suck at explaining it because it's like the financial system; deliberately hard to look into The Extend, Extinguish aspect of what Microsoft does can include several strategies such as: a) Having an open source nonproprietary version that gets less love than the main one. This is seen in the Open Document Format that Microsoft got forced to use, technically you should be able to use any Microsoft-software made document on any other editor easily but they set about specifically adding functions that competitors couldn't use or replicate easily to make interoperability harder or impossible. You end up with two standards, the official one and the proprietary one at the same time. b) They bloat the standard directly. Assuming it becomes a nonproprietary standard they can have it get bigger, more demanding, harder to develop on until nobody wants to use it, it slows systems down, or slows development down. They can then have the good version they keep internally, rename it something like "Xbox Microsoft Ultimate Gold Windows Fast Loading System", and release it as proprietary in their products. c) Linked to b they can also either use industry links to have hardware that prefers their proprietary standard released or they can include code in their own systems that skips the issues. nVidia does this with their drivers; they include code their old hardware has to run in addition to its normal job that newer hardware will skip over
The groundwork to make this possible was laid all the way back in 2016. It is possible to export an OpenGL or Vulkan buffer object (with GL_EXT_memory_object_fd & VK_KHR_external_memory_fd respectively) as a dma_buf file descriptor. What is left to do is to allow syscalls like "copy_file_range" or "sendfile" to initiate DMA transfers to the GPU buffer from userspace; a thing they can't do yet (only fs-to-fs or fs-to-pipe/socket copies), but in theory is possible. Slap that in an io_uring OR poll the dma_buf object and you've got a DirectStorage equivalent.
This is literally the same thing as the sega genesis "blast processing", an DMA circuit that can copy data from the storage to the video memory a lot faster than the CPU can.
The IBM PC had DMA. Whats worse this video is REALLY premature and DrectStroage doesnt have DMA or GPU decompression yet. ATM its an API that improves IO call overhead.
@@mycosys Sadly they didn't evolved this side as much as they should, otherwise well, we wouldn't quite need direct storage. It would be pretty cool if PCs could do timed DMA transfers etc..
"Blast Processing" was 100% a marketing gimmick. Some marketer overheard discussion of a bug in the VDP, during conversation the words "blast" and "processing" came up, so they just put them together for marketing. It is a real technique, though very obscure, not used by any commercial games, and was only figured out to be viable a few years ago. The fact that it even works is pure luck. It allows using more colors than the color palette allows by abusing a bug, advanced manipulation of the CPU to sync with CRT, and consuming 100% of the CPU, so isn't really feasible in gameplay. The SNES also had DMA, and was used a lot. The genesis was faster CPU, though the SNES had an advantage with Mode 7.
@@xerideaOf course it's a marketing ploy. also i think, not 100% sure that the sega genesis can actually copy data outside of the vblank time, which allows it to move a lot more tile data per frame than the snes, which was abused by games like sonic etc..
This would be great for megatexturing. You could have a cache of texture tiles on the onboard SSD. So basically two levels of megatexturing. Basically, the GPU could use megatexturing and the SSD could use clip mapping. Megatexturing is using the exact tiles that are seen on-screen. Clip mapping is using loading the highest resolution tiles for those that are near the camera and gradually load coarser resolution tiles the further away they are. This used to be limited when using the GPU. But with an SSD, you could prefetch a lot more data and significantly reduce texture popping when using magatextures. In fact, only far away coarse textures would be prone to texture popping and since they're already coarse, the effect would be far less noticeable if it happens at all. You could do the same for 3D assets and load higher resolution meshes for nearby objects and lower resolution the further away they are with as much overlap as possible for multiple LODs.
"...is loading nearly THREE TIMES FASER." 0.33 / 0.08 = 4.125 I mean, in the grand sceme of real numbers, 3 and 4 are quite close but woud "OVER 4 TIMES FASTER" sot have sounded even better?
I'm curious to see how difficult this will be to implement in compatibility layers like Proton for Linux/SteamDeck. I don't trust MS to make things easy now that SteamDeck has proven to be so popular.
To quote Microsoft: "Microsoft Loves Linux" May 6, 2015 (Can't post the link, because it gets deleted, but you can find it easy) Guess it's time to stand by their word and take action with that statement. Especially because hardware-wise the SteamDeck already is the console-like platform that's also a PC, that gets mentioned in the video at 8:15.
I wonder if HMM (Heterogeneous Memory Management, wich does already exists in Linux for ages -2017- and permit, if I'm not wrong using GPU memory "as any RAM") can not help here, or if does solve only a part of the problem (because I don't understand everything but I -think- it will only bypass the RAM storage, but not the fact the CPU still needs to be involved)
@@potens1 Direct Memory Access or DMA has been used by network cards for decades. The device is configured to put the data directly into RAM and then ping the CPU when it's done. I don't know the specifics but the same principle should allow storage to put things in VRAM. The CPU could issue a command to the storage which would then get the task done and ping the CPU when done.
I really hope this also helps with frametimes. Im so tired of modern titles having horrible stuttering issues. Really breaks immersion and sometimes hinders gameplay. Maybe with faster asset streaming, we can not only see faster load times but also smoother gameplay.
i was just thinking the same and looking for such a comment. also like that his content is more informative and goes deeper, you actually learn something here.
One of the best ways to see the impact of direct storage imo is Ratchet and Clank Rift Apart when going through the rifts, especially the part towards the end of the game when you are quickly going through a sting of rifts bringing you to a bunch of drastically different looking areas. That has been one of the very few moments so far in next gen games that really shows the advantage of next gen hardware. This is why i think Microsoft's decision to keep supporting xbox one with all new games for the foreseeable future is a massive mistake regardless of them claiming last gen isn't holding next gen back, that line is a complete and utter lie, they know it as does anyone who has played a true next gen game. The amazing rift mechanic in the newest Ratchet and Clank game is simply impossible to do without the extremely fast ssd and direct storage, especially when looking as good as it does. Microsoft making all games also work on xbox one prevents them from adding core gameplay changing features like that. Fast/near instant load times are nice but things like the rift mechanic are where the new direct storage feature truly shines. It allows true open world games that don't require ANY tricks to hide loading screens like mini cutscenes when going inside buildings, you just go right in. I am sure it enable even more really cool things that couldn't be done before.
I am not sure everybody understood that compression part so I'll clarify it a bit. The thing is that there is DXTn compression. Which is just texture compression that can reduce amount of memory the texture takes on HDD(or SSD) AND in VRAM at the same time. So that means that the GPU can use that compressed texture too without decompressing it at all. On the other hand there are very complex compression algorithms that archives use(i.e winrar or 7z). But they are used solely to compress files on HDD, they need to be decompressed to make use of the files. This is what games use to pack their content so games won't take too much space. And that is in my opinion good. Yea you need to wait a bit of them to decompress but if the gpu will be able to handle stuff like lz4 lzma lzma2 bzip that are used as typical archival things to make the gpu load faster with directstorage. Or maybe they'll comeup with new algorithms like DXTn that will allow the GPU to use compressed assets directly(although it's probably not gonna happen for other types. Because usable compressed assets are only textures and it was a long time ago when that was inveted.) I hope that made it clearer to understand.
The last 2 generations of consoles (and actually phones) also have unified memory, the gpu uses the same memory as the cpu so there is no need to copy it to vram
@@rajder656 APUs do not share memory. They simply segment it. You literally lose chunks of system memory, which then has to be copied from that chunk of system RAM to the chunk allocated for the APU
There still has to be some CPU involvement to direct the video card to the blocks it needs from the drive. I doubt the video card understands filesystem structure, which could be FAT32, exFAT, or NTFS at this point.
Actually the question of how hardware-deep memory access DirectStorageAPI will use under the hood could actually be a very important performance factor. It'll make huge difference. From my experience from working with DirectAPI's(mostly d3d's) it will have numerous capabilities enumerations that will vary in hardware->software with hardware things being fastest but requiring hardware and software being garbage but working on any system(sorta HAL).
@@electrosssnake1036 dude, data fragmentation and read errors are too completely different issues, drive fragmentation just means the read heads of your hdd have to jump from place to place in order to gather the requested data instead of reading it in one go. Basically random I/O verses sequential read performance. But that's mostly a thing with spinning rust, and this tech is meant for current gen systems with NVMe storage, not 20 year old toasters.
@@protator dude not everybody has ssds. Question is what will happen if you do not have last gen tech. Such as hdd which could be even slower if fragmented.
Correction: smaller textures in an atlas can be tiled - it just takes some extra shader math to tile the specific region of the atlas rather than the whole atlas.
Machine learning will love this. Training on a big dataset can give you lumpy GPU utilisation graphs even with batch streaming, as the data is loaded to CPU/RAM between batches. This is especially bad due to how bad parallelism works in python, but with direct data loading that work can be shifted to a more parallelism friendly API and be faster.
Great job Anthony and gang, you explained this so very well, particularly the industry common practices and remaining challenges. Eager to see what it can do.
I hope you know that GPU and SSD can directly read/write system memory, so the CPU cores were never heavily involved (except decompression). This is DMA (well, it's modern version, where there is a request PCIe packet sent). Moreover, usually, there is no other path from the SSD to the GPU but through the PCIe root complex, which is basically the PCIe controller inside the CPU. The only real thing left out is the system memory; we now have a path SSD->PCIe controller->memory->PCIe controller->GPU, and GPU direct can do SSD->PCIe controller->GPU. (And note that for data centers, NICGPU copy is already a thing since at least 6 years; now we just have SSDGPU in Win.)
I made the same point - then discovered the vid is REALLY premature and the DMA and GPU decompression havent even been implemented yet. ATM its just an API that improves IO call overhead.
Seems like the next step could also be GPUs with onboard SSDs with the pipeline eventually becoming; fetch compressed textures, decompress on the GPU, store unused textures onboard, swap textures in and out from the onboard GPU storage. Like being able to decompress on the card like NVidia has technology to do is already huge...imagine also being able to only have to do the fetch from off card storage and decompression _once_ for _every_ texture because everything can be stored directly on the GPU. The next big step would be to have GPUs optimize draw calls by using some algorithm to make texture atlases of hot texture paths to really maximize any available on card storage to really fine tune what gets loaded into memory to reduce texture swaps.
That Unreal Tournament map brought back some of the best friend lan moments i have ever hade. We cranked up that AI difficulty to max and that was a difficulty that no other game have ever hade sense. For those who think that games can be difficult now try that and u quickly change your mind. That was the definition of Brutal.
The PS5 version is loading in around 20 seconds now iirc (still waaaay too long when you compare it to other first party Sony games that load in less than 3 sec)
@@terogamer345 Are you for real? When i open GTA online it takes between 5-10 mins to actually join a server. us PC users just really get shit on by rockstar xD
Not 3 times faster, in the tool Microsoft provided, but slightly more than 4 times faster. 8 x 3 = 24, 8 x 4 = 32. DirectStorage On = 0.08s, DirectStorage Off = 0.33. So you could load the same thing with Direct Storage 4 times, with time to spare.
@@FearInfected When you use "times" it implies multiplying, i.e. 3 times faster mean 3 times the baseline speed, not 3 times the baseline speed in addition to the baseline speed. If you talk to normal people and use this kind of language, they will naturally assume it's a simple multiplier, which is also why nobody uses "1 times faster" because it doesn't make any sense, you say twice as fast/two times faster, those two will be understood as the same thing by basically everybody. If you google "1 times faster" the top results are people trying to justify why "2 times faster" actually means 3 times by using "1 times faster", in real life "1 times faster" isn't used because it's obviously ambiguous. And besides, he clearly got the math wrong anyway, as he said "nearly 3 times" while in fact it was slightly greater than 4 times faster, I would guess the script may have been written with different numbers and wasn't updated or something like that, because the math is wrong no matter which way you slice it.
Let's hope that Proton on Linux will be able to translate DirectStorage calls into native equivalents or Linux gamers (Steam Deck anyone?) could be in for an increasingly rough time w.r.t. new game releases. I'm surprised Anthony didn't at least touch on this subject, especially since his boss is spending a month exclusively playing games on the Steam Deck!
Direct Storage : Comes onto PCs Rockstar Games : Casually re releasing GTA V for PC , saying "With faster loading Screens". (If you don't get it, it's cuz they said " Seamless character switching " when advertising Expanded and Enhanced, but it was all because of the console's hardware)
5:30 atlas textures "can't be tiled". No, they can be tiled, with the use of mip mapping. In fact if done correctly you can create mipmaps per tile in the atlas, thus if you have seamless textures in the atlas with the use of mipmapping. 2D and 3D games have been making use of it for ages. The problem is selecting the right LOD for the mipmapping, making sure the mipmap resolution is correct for the atlas.However the mipmap creates progressively lower resolution pixels around the edges of each tile in the atlas, which can result in a texture "bleed" but this can be compensated with using supersampling/multisampling which can help in creating sharper mipmaps. You can swap out tiles in a atlas texture in realtime. :) The drawback however is you can't set per tile mipmapping, so one mipmapping level across the entire atlast. Additionally if you don't make use of a tile based atlas, you can end up with a lot of unused space in the texture map. You guys have been delving into game development aspects a lot lately and been making a hash of it so far.
Texture atlases are still a thing for particual use cases, usually enviroments and props. Even today we game devs struggle with drawcalls on lower end platforms like switch and today even PS4. Khm..Cyberpunk..Khm
Nobody is sweating because after all PCs are superior and everyone knows that it’s just a fact and not even in terms of loading times the ps5 is superior because direct storage only reduces the loading time from 2,1 seconds to 1,9 seconds on a m2 ssd thats only 10% shorter PCs have a shorter loading time without direct storage than the ps5 does with direct storage
Can't wait for the 4 games that support it every year. So basically, it won't be common until 2030, when DX14 is out on Windows 12... because that seems to be the way all new technology goes, especially when it's exclusive to a single API on a single operating system (hi DX12). Doesn't matter if it's easy for a single developer to implement (like AMD's FSR), publishing companies (especially JP devs on PC) don't operate logically.
Gotta hand it to the PS5 for having thought this through and accelerated encode/decode into unified GPU space from its conception, DirectStorage still isn't matching every feature (GPU acceleration) but will get there.
And still takes 2 minutes to get into the game start screen because of stupid intros and advertising splash screens about different tech/companies used in the game. I feel most gamers would feel a real benefit if games would just play them once (or not at all!) and then turn them off. I don't need to know every single time I boot up the game that you made the frigging game, I know already I'm playing the stupid thing.
Another thing this could benefit is eGPUs -- build an eGPU box with a PCIe switch between a GPU, nvme, and the TB3/USB4 and you've just eliminated 90% of the traffic (textures, and most geometry) across the wire -- suddenly that 40gb/s is more than enough bandwidth for sending drawcalls *and* returning the framebuffer. No more eGPU bandwidth penalty, which would make putting top-end GPUs in them practical; plenty of bandwidth for framebuffer return (another small current-day penalty) makes eGPU a better fit for laptop users on-the go, too. Actually, that last part is just great across the board: it transforms the eGPU from something you plug into (like a dock) into something that you plug into your system when you need/want the capability (more like a flash drive.) An enterprising GPU manufacturer could build a PCIe switch and m.2 slot into a GPU today (like those Radeon SSG workstation cards), but no incentive until the software is there (and I agree that the compression situation needs to be resolved before that will happen).
Consoles right now, are better than just buying a Pc if you only want to game, due to how cheap they are Edit: To all the people in my replies, where I am I can manage to get a new gen console for MSRP sometimes, but not GPUs.
So infuriating that we had to wait for consoles to implement faster storage access so that we could make proper use of the potential speeds on our PCs.
That's always been the case in games in all but niche PC-exclusives. The lowest common denominator gets developed for. I'm personally really glad it's finally happened though, that and that console processing power has taken such a huge leap, we might get to see games doing CPU-intensive AI, physics & weather stuff again.
You can see it both ways. Consoles are also the reason why this kind of innovations become mainstream in the videogame industry. If the consoles weren't limited in some way in the first place, game devs would throw hardware at the problem without a second thought like they always do. In this very same video, Anthony explained data stream became popular mainly because of the limited VRAM in consoles and that untapped the possibility of creating really huge levels that weren't possible even on PC. Also, though DirectStorage is great, consoles still have the upper hand, being that they have shared memory and dedicated decompressors (Anthony also talked about that). Go see Mark Cerny's PS5 presentation. It really helps to be able to design both the hardware and the software instead of building things around a generic architecture.
Can't wait for game engines like Unity and Unreal to implement these features - that way a bunch of developers will be able to add DirectStorage to their games without having to completely restructure them.
Nice
@@youtubekilledtrustedflaggi9274 I'd love to hear who your source is
@@youtubekilledtrustedflaggi9274 you think implementing a feature automatically cuts out who has hardware that does not support it? You know, right, it's common practice in game development to probe the hardware you are running on to check for available features and then use slower methods that do not require what are you missing?
For example some old video cards do not support Compute shaders. (GPU accelerated calculations for things not related to graphics), easy, check if you have access to Compute shaders at the beginning, if not, you do that on the cpu, at a lower quality even if it would be too slow.
Doesn't look like to me that all the games that support RTX do not run on non RTX hardware either...
@@youtubekilledtrustedflaggi9274 Existing games that patch in support for DirectStorage won‘t become unplayable for setups that don‘t support it. And with new games it will quite a bit of time before it will be in any way „required“
And more headaches for games with their own engines
I'm a simple man, I see an Anthony thumbnail, I watch immediately
I simple man. I see funny comment, I reply 😅😂
You are so based
What else can a simple man hope for. Lol!
I am even simpler I can only count to 10.
Can’t imagine not watching an LTT video with my man Anthony as the thumbnail
A few key points were missing from this video...
DirectStorage on Windows 11 is using bypass IO if supported by the attached storage device. This greatly reduces kernel overhead by skipping many stages of the IO stack allowing IO to start more quickly and results to be returned more quickly both with lower CPU overhead. This alone will have significant latency improvements as well as save CPU time in ways not previously possible using standard APIs on Windows 10 and earlier.
DirectStorage itself is an asynchronous API. Asynchronous APIs have been the ruler of IO performance for a long time. Yet due to the complexities involved with using them their adoption has been quite poor with much slower stream based IO often being used instead due to its simplicity. This can be seen in how a lot of game archive formats are constructed. Anything that requires frequently reading a small amount of data to know what or how much more data to read is not optimised for asynchronous IO.
DirectStorage offers batch reading support. Much like modern graphic API command queues you can submit multiple different read requests with a single kernel call. This allows high utilisation of NVMe read queues, such as using a single queue submission to read the meshes and all textures for multiple modules. In the video this would be similar to the shown texture atlas animation except when not using a texture atlas.
DirectStorage does not bypass the CPU and RAM entirely. Custom decompression, a supported feature by the API, will still have to use the CPU in some way to process the data from memory. When moving directly to GPU the file data is still temporarily placed into RAM, even if directly into an upload heap staging buffer and not touched by CPU threads. Although the technology does already exist to bypass CPU threads and RAM entirely it is currently unclear why DirectStorage is not using it.
So if they were to use technology to bypass the CPU, would that mean that CPU bottleneck is eliminated when upgrading a GPU?
GPU upgrade would not matter, unless the current GPU lacks required features forcing the data into the CPU for processing as a work around.
The main improvement over the current direct storage implementation would be a saving of some latency and memory (or cache) bandwidth. The data would be moved from the NVMe SSD straight to the GPU VRAM via the PCIe controller cutting out CPU caches and memory entirely. This is apparently what consoles like the PS4 use and does exist in the data centre with a proprietary API for Nvidia enterprise GPUs.
@@drsupergood8978 oh wow thats very interesting to hear… but wouldn't that crash the GPU market? I mean if it doesn't matter then how will companies compete? There wont be a need for buying new GPUs because they wouldn't speed up the process anyways. Maybe if they break due to aging or manufacturing issues. If thats the case, then that the only way i see companies profiting off of this is if they force us to buy new one by ruining the GPUs in software, hardware or maybe monthly payments in order to use the technology. Of course future upgrades that unlock new features would be a thing and maybe thats where we will want to upgrade. But im sure that im wrong somehere (im also just a child so for anyone reading this, im just saying whats on my mind and don't take anything from this, im sure there are solutions and answers to my questions)
@@matyaskatocz9323 Why would the GPU Market crash? We're only talking about storage en transferring assets here. It has nothing to do with rendering power and how much FPS/Details a GPU can render. Rendering power will still be something that will grow over time and this is where GPUs will continue to compete.
@@rba42 oooooooooh right! I had it all confused
ssd is anything but humble, it's about damn time that other systems are using it to it's full advantage
TBH its value has been understated for a long time.
@@neattricks7678 In respect yeah but it's value in cost was a lot 4 years ago. They got so common only because price lowered.
@@prateekpanwar646 They are so common now that I have 2. I have a regular SSD and an m.2, because why not. One is mainly for my OS, the other is dedicated to games I'm playing.
other systems? pc doesnt utilize ssd in games to its full advantage while consoles can do that.
Hey guys, what did u think about get SSD SATA instead NVME M.2? (Considering DIRECT STORAGE in future)
This was really informative... It actually makes me understand why Sony has licensed Oodle Texture technolog (Oodle & Kraken Decoder for the PS5.)
Yeah, there are so many bad takes in the comments section, so I'm glad someone gets it. Btw, this was all explained very clearly in Sony's "The Road to PS5" video two years ago.
HEY I'VE GOT AN IDEA:
How about, we sell games on their own SSD?
And since SSDs are fragile chips, we could enclose them in their own containers. And put labels on them for what game it is!
We've come full circle!!!!!!!
and we'll have to blow on them sometimes if the connection is janky
Oh shit
In the form factor of a Sega Card.
Won't be possible considering the business model a lot of companies are following nowadays, selling incomplete games and promising future updates
With the silicon issue now no way
I want to see how DirectStorage will affect load times between various SSDs like comparing a TLC NVMe SSD and an Intel P5800X.
It didn't just affect load times, some games that doesn't use all the gpu resources because of having too many assets to load and causes a cpu bottleneck will have a huge improvement in performance.
im guessing it will scale with bandwidth of the SSD. so m.2 will perform better than sata SSDs. i could be wrong though. it could also scale directly with RAM frequency and be less effected by latency?? i dunno. just brainstorming here
I would think this kind of tech will only really be a thing with nvme tbh.
@@Cheeseypoofs85 From Microsoft's developer blog directly: DirectStorage is DirectX 12 API for Windows 10 and 11 PC with NVMe storage. Data is loaded to RAM and copied to VRAM without I/O calls involving CPU (GPU was able to read dedicated RAM for decades, now it will have instruments to make I/O call to NVMe storage itself). Purpose of this API is to shift data decompression duty from CPU to GPU to the limit allowed by the hardware and it's firmware (yes some firmware will lock you out on fully capable hardware).
@@Kholaslittlespot1 I guess the real question is will we need something like a PCIe Gen4/Gen5 SSD? Or will there be no notable difference past storing on a 2gb/s Gen3 NVMe drive?
Compression is not just important for reducing installation file sizes. It also speeds up the I/O since reading the data is a bottleneck in the system. If the data is compressed, you have less data to read, so it takes less time. The consoles have hardware decompression built-in so data can be stored compressed and read in that state. The level of compression achieved then becomes a multiplier to the effective I/O read speed.
But not automatically, like you said consoles have hardware compression available, but PCs have to use the CPU, there are a surprising number of people still running
@@rngQ Yep. That's the point. Compression is a huge part of I/O speed and this video relegated it only to reducing installation size. The fact that PCs will have to continue supporting CPUs that can't handle decompression in real time while running the game is the reason why PCs as a whole will still be playing catch up to consoles on this issue, contrary to the title of this video. In fact it will likely be the need of games to also run on a wide rage of PCs that will hold back what can fully be done with the much higher SSD I/O speeds for some time to come.
Same reason file system compression with a decent CPU can greatly improve read/loading speeds. Apparently that may be moving to the GPU sooner than later. Less and less data seems to get processed by the CPU as time goes on and eventually, GPU may just stand for "General Processing Unit"
@@KenMathis1 Kinda. It depends on Nvidia for their GPUs to make something. RTX IO would be the Solution you want here. And that should be out since End of 2021. Since Direct Storage just released and no Game is using it anyways until Forspoken releases… they probably said „Fuck that, we will wait to release it since no one could use that anyways“.
I assume Nvidia will (re) announce it when they show the new 40 Series Cards.
Also wonder what AMD will say to this. They are working with Xbox and PS together and know how they do it but they never mentioned it til now.
@@rngQ though some compression can be incredibly efficient even on low end cpus, for example windows's built in compact.exe with lzx has no perf impact even on a lowly i5 2400, and in some games it can go down to like 20% of its original size
There was a time in decades past where it was expected that RAM and storage would soon merge; your storage would be a non-volatile RAM that worked at high speed. Sort of like the old RAM-disk utilities. You wouldn't "load" a program into memory, just access the memory where you installed it. I would have hoped that this would have come to pass, but not quite yet.
There are a few SSDs that can match RAM speeds but it's usually an expensive proposition (Optane and/or certain raid setups with higher end SSDs) and with DDR5 just now coming out it seems that RAM will soon be much faster than storage again in the near future. So long as RAM has a significant speed advantage it's unlikely we'll ever stop using it. I remember hearing about some technologies that were in the works years ago that would supposedly give us Storage with equal speed to RAM using spintronics or carbon nano-tubes but I've not heard anything of those technologies in years suggesting they hit a roadblock they simply couldn't overcome. If we do ever get such hardware capabilities working and affordable enough then we'd still have do so some major work on the software front to take advantage of it or really just to use such a system at all (at first they'd probably have some section of storage emulating RAM for non-aware software but the better solution would be to reinterpret the code or perhaps recompile it to use the new tech).
they can build it but its impossible, because the number of write we can write in SSD, even SLC wont even close to the durability of Volatile Memory
@@syarifairlangga4608 The carbon nanotube storage I mentioned would theoretically solve the issue of durability. The basic idea as I understood and remember it (this was back when Optane was first announced) was to have cells composed of a tiny carbon particle inside a tube that would switch between 2 positions depending on the value. In theory such a system would be far more durable or at least far easier to repair since the memory modules themselves would be unlikely to fail. I presume the road block was either that it was too difficult to manufacturer or they could get them to be fast enough (if memory serves they could theoretically switch faster than Optane but theory and practice often differ).
DFM?
I remember reading about a non-volatile RAM replacement years ago, but can't remember if it was DFM or something else. Just sad these technologies get announced then.. never seem to come out, or get released to the consumer market.
@@grn1 Optane's PCIe latency is still ~1000x slower (on the order of tens of microseconds) compared to low-end DDR4 (tens of nanoseconds). And SSDs with RAID are a joke compared to Optane, never mind RAM. Optane Persistent Memory modules are comparatively faster, but still 10 times slower than DDR4.
With Optane's development now halted, it seems unlikely we'll get to the great merge anytime soon.
Just taking a moment to appreciate how good he is as a presenter.
Anthony is LInus medium group's best presenter, bar none. Bravo.
Yes, I hear it every fucking time!
He could get a better haircut though.
Yeah, he's come a long way since his first videos. His scripted stuff comes across as pretty natural
Yeah, he's by far the best on LTT. I just worry he'll die from cardiac arrest soon.
I really want to see how this affects CPU usage, as well as how much it improves loading times on lower end CPUs.
The CPU is still very much involved. The video is misrepresenting quite a bit.
@@ceebee It is stated in the video that its still involved, and that it is an issue that may be solved in the future. The question is how much, and the only way to get a definitive answer is to test it.
It likely won't have much of an impact on load times in practice. It is already just an edge case that the cpu file copying/decompression during load tkmes is the speed limiter.
And it should be even less difference in % gain between high and low end CPU's. Keep in mind that you are unlikely to e.g. combine a 3080 12GB with a intel celeron or somethin really low end. And lower end Graphics card usualy also come with less VRAM, and thus also less data loading during loading screens.
The actuall rela benefit will be in data streaming, as that happens while the game is actively running and the CPU has plenty to do with that. If your CPU is only running at 20% saving 5% doesn't vhange much. If your CPU is already at 100% and starts having a backlog? Then those 5% make a hughe difference.
@@reappermen Smaller VRAM actually means you have to put more data through it in the modern games streaming an "infinite" world (less VRAM to cache existing data you may return to). E.g. it took me 7 hours to run through the starting island in Elden Ring (is there more islands? I didn't fight & level, just travelled to see about 2/3 of the world freely accesible to a beginner character). That happened without the old-school loading screens.
Also there are already coming modern GPUs - much faster than what you can have in PC - equipped with "Celeron" class CPU. There's no need to pay for a beefy CPU when the GPU can load the data from NVMe itself. When you scale huge server farms, you save a lot of money on the CPU.
@@ladislavzima8382 You are correct on the smaller VRAm cards needing mroe data streaming. My reference for no load time imapcts there was specifically aimed at the OP's mention of loading screens, which do get faster with smaller VRAM because it can only fill the VRAM once during a loading screen, and the less VRAm there is the fast that is (o nthe same bus).
As for there beeing modern GPU's paired with Celerons, that might exist for neural networking or specialised analytics and modelling software on some servers, but those already run their homecooked everything as OS and don't need Direct Storage. For everything gaming related 8which the Op semed to refer to with his post) you absolutely do not want to pair a high end GPU with a cleeron or similar crappy CPUI, as that will massively bottleneck your system no matter whta you play.
And finally, there currently isn't really all that much in the GPU server space that is that much 'better' than a 3090/3090ti when it comes to the GPU alone. most of the gains there is in better Bandwith and MASSIVELY more VRAM. I forgot which card it was, but in fact one of Nvidias current highest end Server GPU runs on a slightly changed 3090 Chip, with like 4 times the VRAM and other fringe benefits attached. But for pure cycles, the servers space isn't mcuh ahead. Server software is usualy made to run on multiple GPU's at once if it's really demanding, so getting more cycles in one card has limtied benefits.
Do remember some of the very large size is because on HDD you'd have multiple copies of the same file so it could load it all as one block when you need it, rather than having to jump around (reducing load times), that should drastically reduce some game sizes if HDD support is dropped/ SSD becomes a requirement (as SSD's have way faster load times for lots of small files since no spindle).
Note as well, the Nvidia GPUDirect storage is also Linux compatible, currently in beta on Ubuntu.
i don't think the ssd requirement will come to fruition in near future. The high capacity ssds still cost a lot more than HDDs
@@rajder656 you can get a 1TB SSD for
@@rajder656 except high capacity is only a huge worry because the games are so big now. if direct storage also means trimming down file size its not really an issue. also I don't know why people think HDDs would just not work when games are developed for direct storage. itll just become another thing that plays into your overall performance. hdds still see a speed improvement with direct storage. additionally methods to optimize performance on old hardware will be around until its irrelevant. like grabbing smaller assets to display the same thing off a hard drive. a low poly version of the models a hard drive might struggle with
@@Masterrunescapeer and 1 tb hard drive is half of that price in fact you can get 8tb hdd for that price
@@demonz9065 i mean if you store only games good for you. And it also only applies to new new games. If i have idk movies or other stuff alongside games then HDDs are better even without those if you want just games HDDs are better right now and for many years after since we won't see widespread usage until 4-5+ years
Been waiting to see content on DirectStorage for a while. Cant wait to see see what it can really do when we finally a get some games that support it.
Great video, but there are some inaccuracies. You don't have to reload an entire atlas to replace one of the sub-textures. Rather, render the "new" texture into a spot in the atlas to replace one of its components. It's not intuitive, but it's not complicated. Atlases are intended to minimize thrashing when swapping textures, and they work really well. Replacing an element comes with its own cost, but it's not nearly as bad as a round-trip copy. The "blurry lines" between adjacent textures in an atlas are compression and/or sampling artifacts, easily solvable by using lossless compression and properly-selected sampler settings in the shaders. When generating the atlas on the GPU, those artifacts do not exist.
that is one technique, that is not always used. Because, in the end, the GPU needs to wait for the new changes in the atlas to be made, creating idle time/bottleneck. That's why, in a coding standpoint, just sending a new atlas have the same result in the end.
@@akiraic Copy and render happen simultaneously. Most asset streaming techniques rely on this.
I remember how much Playstation focused on this feature for their PS5 so it's nice to see that in a couple years this console selling feature will be implemented on a platform with a larger number of titles & greater control over system settings!
Still no real PS5 game that use its full potential. Loading times on pc with SSDs and no direct storage are similar to PS5 with it......
@@samfkt it's coming
And so the ps5 continue it's elusiveness till another year if scalper take a lot.... sigh.....
@@chillnspace777 yes it means PC will be more and more fasters than console.
@@samfkt thats because consoles don't have the single thread performance that PC does. Traditional loading on PC with a fast CPU and a slower SSD will match the consoles with the fancy APIs. Even Direct Storage doesn't remove the CPU advantage, only helps lower it somewhat.
I'm curious whether this could be used for render engines. On of the bigger problems for small-medium sized VFX studios is that they'll use GPU rendering, so often (particular with larger or more complex renders, they're limited by VRAM, even with a 3090.
Being able to chuck 1TB of super fast tertiary storage, could allow much larger renders on cards that don't have enough VRAM, although likely at a fairly significant performance overhead. However 'slow but available' beats unavailable 100% of the time when it's needed.
I think they just did a video about a Radeon GPU that had SSD slots on the card for this exact purpose...I don't think it was very good though, I can't remember...
@@brucepreston3927 iirc it worked but has a pretty specialized api that wasn't picked up by many major softwares
Theres no reason a fetch over PCIe form the SSD would be faster than a fetch over PCIe from the (much faster) memory controller. Though it would certainly be cheaper, youre probably still better off caching it in system RAM.
@@mycosys I agree. Even super-fast NVMe SSDs usually top out at maybe 2-3000Mb/sec, which is around a tenth of the bandwidth of a single DDR4-3200 stick--and once you have dual channel RAM (and what gamer doesn't?) the discrepancy gets even greater.
True
Love all the new videos with Anthony!
Heck yeah, give Anthony more screen time!
I’m curious to see if more games support DirectStorage in the future. I know Forspoken will, but I hope the likes of Street Fighter 6 follow suit.
Initially I thought it was weird that you brought up Street fighter because I would think that a fighting game would have one fixed set of assets for an entire match and not need to load anything during gameplay, but then I remember how nice and snappy it was to start matches on old arcade games where everything was in ROM, you could be fighting in under a few seconds. A very nice user experience.
SF6 of all games, I'm down though
@@AlRoderick let's be honest with ssds it would take seconds to load anyway
@@ItsAkile It’s more about fast loading, as that’s been an issue with some recent fighting games. Tekken 7 in particular was awful about loading, even on PC.
@@Neoxon619 early Unreal Engine 4 pains, recent titles much better though. but I get what youre saying, these games should load fast
I think one of the first games to stream in assets was 1999's Soul Reaver on the PS1, so the industry was moving towards that kind of game design even before the turn of the century.
Good game
The Crash games for PS1 streamed data off the disc, and Andy Gavin, the lead programmer behind the Crash games has a fantastic dev blog that goes into some of the details. It was groundbreaking stuff at the time, and Sony was worried because they never built the PS1 with that amount of constant disc reads in mind.
That's eventually how most big games began to function on DVD based consoles. You can't fit entire big levels in the extremely tiny amount of ram on a PS2, or even the PS3. Crash might have been one of the first to do it, but there is a reason why all PS2 and PS3 DVD based games had to constantly, literally non-stop, read the disc.
I pretty much like every video where Anthony is the presenter before I even watch the video now.
You win, Anthony.
I'm always happy when there is some new tech that Anthony gets to do a video on. Just has a way of making things easy to digest and understand. So credit both to Anthony and the writing team.
I love this. I feel like I need to note though, just because hardware is faster shouldn't be an excuse for Devs to cheap out on optimisation. Maybe for demos as proof of concept yeah, but please not for final release.
Yea I also can see unity and unreal engine devs be like okay so we put uncompressed files to make load speed faster, right? Oh wait what is this... a 400 gb folder?
Also imagine if you're not running SSD. You're IO performance will be killed with these giant uncompressed files that will just eat whole throughput.
That might be the reason that DirectStorageAPI will be a niche thing. Because consoles are very cool in a way that you can do all of the tricks with the hardware because you're sure that EVERYBODY will have same hardware.
they do because it gets the tech industry more money by forcing costumers to buy newer hardware
@@Ubreakable-lr2dk you know game devs don't own hardware companies right? They don't benefit from hardware sales
@@electrosssnake1036 It wont be a niche thing because its what developers want
Not even a minute in and you already know that Anthony will once again give insightful foresight presented in an entertaining manner, easy to follow even for people like me that don't have a clue.
This begs the question though, were the test results valid? You compared compressed vs uncompressed from the looks. This draws into question the validity of the results based on the data given. It is entirely possible it is still a mostly valid result if the compression saved enough time in disk reads that the decompressor was not adding time, but that was not shown.
the uncompressed is loading faster. thats the important thing. usually compressed assets would load faster
You seem to be missing the point. PCs lack the built-in hardware decompression consoles have, so while compressed data doesn't hurt them and basically just increases their effective I/O, the decompression on the cpu has become a serious bottleneck for PC ports. So uncompressed vs. compressed is exactly what this is all about.
GPU decompression is going to come but hasn't been implemented yet and is therefore unlikely a part of this demo.
Games that use direct storage on PC will most likely have even larger installation sizes as a result, but I'd rather add another $200 ssd than buy an entire new system with higher ipc processor and a 12+GB gpu just to handle the compressed data streams optimized for consoles. Because that's what we've been forced to do until now.
A game stuttering on PC while running perfectly on a console with much lower specs has never been anything else than the PC choking on a compressed data stream it wasn't designed to handle. Optimizing a PC port for the most part means changing compression algorithms, package sizes and the overall data streaming behavior of the engine ... so devs should have it a lot easier in the future if optimizing a title for our platform simple means storing assets uncompressed or with lighter compression the GPU can chew through in an instant. With the possible transfer rates of PCIe4.0, the already finalized PCIe5.0 and DDR5x/6/6x memory we are way better equipped to handle that than what we've had to deal with in the past.
@@protator Thanks for your insightful comment mate. So as of know we are waiting for MS to release the second part (gpu decompression). How is this on the Linus side for the steam Deck etc.
@@1989ElLoco Haha, thx for the compliment, but though I have been doing some 3d modeling for games in the past I'm basically just another enthusiast who pieces together bits of info from multiple sources. So take my comments should you run across them in the future with the obligatory grain of salt. Either the final or future implementations of this tech will use some form of data compression as well, just in a form that makes it far quicker on a gpu than it used to be using the cpu, bringing consoles and PCs very close together in terms of how they handle asset streaming. What I said before is just my takeaway from what I've read so far and what I assume is showcased in this demo. Well, Linus will probably go all out testing this, but you obviously meant Linux^^. Unfortunately I'm a windows-only guy - although that could always change. The steam deck uses a moderately clocked APU with 16GB shared memory iirc, so the concept doesn't apply 1:1 since you need to shove that data through the cpu and into system memory anyway, even if you can skip the compute intensive decompression. But if there's still some decompression work to be done then at least the cpu will be spared from it and it'll be done on iGPU cores which are still more efficient in that case. With DirectStorage the CPU acts mostly as a bus controller while the gpu exchanges data sets with vram and nvme storage directly, so memory management might be the deciding factor as to how much of the benefits you get with and APU that has tighter memory constraints compared to a desktop pc with dedicated graphics where you simply have more headroom to brute-force things. On the other hand Linux has always been very good in terms of overhead and latency compared to the windows kernel, so I am convinced that steam deck users will see meaningful performance improvements once this gets implemented.
The results still seem valid in a sense. On consoles the difference is quite huge, but the real world performance we have seen so far has been pretty much useless, but hopefully fixing the compression issues gets us closer to those console loading times.
Linux has had the possibility for this for a long time. The question is just if Proton can translate the DirectStorage API in an effective way.
PCs have had the capability to do PCI bus master transfers like this since as long as PCI has existed. And with DMA transfers havent been going thru the CPU since the original PC, cos DMA
Look up io_uring - many of DirectStorage’s ideas come from there, and the main concept of the API (reducing the cost of IO calls) translates pretty much the same into DirectStorage calls. Decompression (when it arrives) will be more complicated, but the base API is already there.
Linux has non functional Nvidia drivers
@@timothygibney159 what's that got to do with any of this?
@@timothygibney159 damn i've been doing deep learning, media streaming and gaming for years with non functional nvidia drivers, thanks for info.
I'm pretty hyped for DirectStorage moving forward. I don't think anyone will miss the days of needing loading screens, so I hope and think that it will just be integrated into all modern engines eventually.
It will also be very interesting to see how this can be leveraged for non-game apps that need a ton of data-throughput, like video editing for example. That could be a serious game-changer.
Combine tech like this with stuff like UE5s Nanite mesh handler and we're going to see a huge jump in visual fidelity, too. Closer to photo realism than ever.
08:09 Also, Consoles usually have shared memory, ie.. Cpu and Gpu share the same memory space, so the data is only loaded once
So it's already easier to implement there
It could be a case of simply passing a pointer rather than moving data. The massive separate VRAM in a GPU is a bit of an anomaly. In home computers of the 1980's the video chip was connected to the same RAM space as the CPU. If the VRAM could be directly addressed by the CPU that could be a huge help. Maybe PCIe gen5 will do this. That's another thing, in 1980's microprocessors everything was on the same bus. IO space and memory space were all the same, just different addresses. Very Alan Turin.
Sadfly the vid is really premature and none of what he talked about has even been implemented. The initial API just provides vastly lower IO overhead on these streaming calls
OMG. That thumbnail. Anthony, I love you.
All of us love when Anthony is happy.
I don’t often watch LTT videos but Anthony is killing it and definitely seems much more comfortable.
While compression streeming have been around pretty much since the 90tys. There have really been very few games that used it. Granted, most game have suported them.... but a lot of them, i would say most, have had it turned of as default and hidden it somewhere in the menues.
It's not 90tys it's '90s
Nice English
@@shaunware1232 I find that strange to be honest, feels more natural to write 90's, even though I know '90s is the correct way.
@@shaunware1232 you noticed 90tys but missed every other mistake?
So have Direct Memory Access and Bus Mastering - which he completely failed to mention. The transfers havent gone through the CPU cores literally since the first IBM PC. This adds a layer of SOFTWARE primarily, an API to let software tell the OS to use a bus master transfer from the storage to the GPU. It has been possible (just not practical) since the dawn of PCs.
I literally asked this question about 5 years ago when I was first introduced to m.2 slots... I asked when are they going to allow GPU to directly connect to the PCIe lanes of the m.2 slots instead of going out of the way and then stuff like compression to help.... thanks for the content!
Turns out they still havent - everything teh video discusses is still future plans for DirectStorage :(
If game's increasingly STREAM assets from an SSD in the future does that mean VIDEO memory requirements will drastically slow down?
I've seen demos for DirectStorage that used FAR less video memory to produce the same scene as normal methods.
Probably
Yes, absolutely! Being able to load assets as needed over the PCIe bus instead of prefetching assets will definitely reduce how much data needs to be sitting in VRAM for a scene to be rendered quickly
Idk about that but at most it'll probably be stagnant. But maybe they'll focus on speed more in the future. Nvidia did prove that speed is extremely important. Who knows we might even go back to HBM2 in the future
not enormously - they still need to hold all the textures for the scene being rendered - it would save main system RAM (because you dont need to copy all the stuff there first and probably cache it there) though
8gbs should still be the bare minimum for 1080p gaming. 12gbs would be enough for 1440p and 16gbs for 4K.
ps2 had a 4MB frame buffer btw, which is like 4MB of Vram. It was still able to beautifully render games like Silent Hill 3 (Within its resolution limitations.) Let that sink in.
i am truly impressed by some of the games comeing out recently, i remember the days when we woundered if they would look as good as the movies CGI and now they look as good and in some cases better.
gameing or vertual enviroments have so much potential as long as the devs are willing to think outside the box.
i think we are just getting started with where games can transport us to and its only being held back by the industry playing it safe.
FYI: SATA SSDs will also see a decent improvement in speed, at least, according to forspoken's benchmarks with and without directstorage. NVMe SSDs will see a much bigger improvement because the overall drive speed itself is much higher.
This is pretty surprising as directstorage was always something the was said to _require_ an NVMe SSD to function at all, so not sure why they were saying that when SATA SSDs also see noticeable improvements (Around 20% in forspoken)
yeah idk why people would think an SSD was a necessity. the benefit comes in removing steps required to get an asset from the drive to the GPU. those steps arent something that only exist in an SSD so why would other storage media have no benefit?
@@demonz9065 I think the reasoning I heard most was that NVMe drives use PCIe lanes instead of the SATA interface, which means they have a direct connection with GPU and RAM whereas SATA has to go through the CPU.
Despite that SATA SSDs still see improvements, so there's definitely more to directstorage than utilizing the NVMe interface. Still not sure why they _required_ NVMe drives when they announced directstorage, when forspoken devs were able to run it from a SATA drive and still see decent improvements.
Nobody said that. Tech media said that. Tech media doesn't read dev blogs. DS works on floppy disk.
@@TexelGuy that is all false. DS takes data from NVME to system RAM and then again to GPU VRAM.
It's not direct over PCI to PCI.
Watch the microsoft video, it has 4k views cleary nobody wanted to be educated.
@@lexsanderz 1. NVMe drives were always implied to be the _only_ way to use directstorage in ALL of the information made available by microsoft online.
2. You seem to be misunderstanding my comment, I never said data can go from NVMe to GPU directly, only that it provided a direct path for it through RAM without going through the CPU like on a SATA drive.
Today, yesterday, every day ever...
Here we are more than a year later, and there is literally only one game that supports it, and it has very mixed reviews.
That picture in the bottom left though. 0:42 😂 meme material?
The first time i remember reading about asset streaming was Crash Bandicoot on PS1. Yes PS1, it's why the levels undulate up and down and you couldn't move the camera. Hiding asset loading. The opposite of that on PS1 was Ridge Racer. That game loaded the whole thing at the start and you could swap out the game for a music CD and keep playing. Im thinking Bethesda has to be in on this for Elder Scrolls 6. Since they are using photogrammetry for assets. Loading them in as you move between exterior cells wouldn't be a problem anymore. Anyone who loaded up the original Skyrim with a tone of mods back in the day can tell ya. It would stutter like crazy if you added to many large texture mods.
Love Anthony's videos! Love learning the "nitty gritty" of why something works, or works better.
Decompression on the GPU sounds really cool. My question is will that require new hardware or can it be added later in software? For example could my GTX 1070 do decompression when that feature is available or will it require a new GPU with a specific chip onboard dedicated for decompression?
I've always heard it called "hardware decompression."
Well, Nvidia allready announced RTX I/O which as you can guess requires a RTX GPU.
I feel that it could be done in Software on older GPUs, but include an instruction set for compression in future hardware revisions. Kind of like how RTX 20x0/30x0 cards can do real-time ray-tracing in hardware, and GTX 10x0 cards can only do it in software.
@@JayrosModShop I mean I doubt nvidia will push such a big driver change including the CUDA version of decompression. It will be a big change that can lead to potentially broken compatibility or just something else buggy. They just won't do it.
@@pixels_per_inch I mean that stuff still looks like rtx thing. Yea it's here and it's cool but not everybody is gonna have it neither it is so important.
such a feature but opensource would be nice and great for the steam deck
Not only for the steam deck, but for linux gamers in general.
The sad reality is that Microsoft does not give a damn about Linux unless it is serving there interests
Funfact: Nvidia had a similar feature (GPUdirect) for professional uses on Linux something like 5 years ago.
@@darin7553 yes like their game streaming thing
@@lennart3100 i am a Linux gamer since 2019 that is why i mentioned the steam deck as Linux gaming
That you to the editor who didn't put in annoying thumping background music, you are awesome unlike some of the other editors who seem to think every video needs to be inside a nightclub
Anthony, thank you for putting in the effort for technical stuff like this.
Wondering if the 2nd render (directstorage) took assets from the SSD's on-board cache (loaded during 1st render).
The directstorage demo read a different file though, so it's kinda improbable that the SSD had that one cached *. Also I trust Anthony to be competent enough that he ran the demos in the opposite order at some point, just to make sure that the caching isn't massively skewing the results.
* "improbable" because SSD manufacturers may implement heuristics and other magic to help caching and make their SSD look/feel faster. But since the file was like 52MB big, I'd feel that the SSD won't decide to pre-load and cache it, even if some magic software layer concluded "He always loads these two files one shortly after the other.".
@@carpenecopinum1665 I concur 🤝
Fully expecting Microsoft to try and crowbar this into the Linux kernel in order to pull an Embrace, Extend, Extinguish
But wouldn’t that be good for Linux? What if they made an open source version of direct storage?
@@Flameancerit's worth understanding the difference between "open source" and "proprietary". Teslas designs are open source and proprietary, which means they can go after anyone that makes anything like a Tesla because they can say you copied them.
Linux is open source & GNU licensed which means you can modify & copy to your hearts content.
It's worth reading up on Embrace, Extend, Extinguish directly, I will suck at explaining it because it's like the financial system; deliberately hard to look into
The Extend, Extinguish aspect of what Microsoft does can include several strategies such as:
a) Having an open source nonproprietary version that gets less love than the main one. This is seen in the Open Document Format that Microsoft got forced to use, technically you should be able to use any Microsoft-software made document on any other editor easily but they set about specifically adding functions that competitors couldn't use or replicate easily to make interoperability harder or impossible. You end up with two standards, the official one and the proprietary one at the same time.
b) They bloat the standard directly. Assuming it becomes a nonproprietary standard they can have it get bigger, more demanding, harder to develop on until nobody wants to use it, it slows systems down, or slows development down. They can then have the good version they keep internally, rename it something like "Xbox Microsoft Ultimate Gold Windows Fast Loading System", and release it as proprietary in their products.
c) Linked to b they can also either use industry links to have hardware that prefers their proprietary standard released or they can include code in their own systems that skips the issues. nVidia does this with their drivers; they include code their old hardware has to run in addition to its normal job that newer hardware will skip over
The groundwork to make this possible was laid all the way back in 2016. It is possible to export an OpenGL or Vulkan buffer object (with GL_EXT_memory_object_fd & VK_KHR_external_memory_fd respectively) as a dma_buf file descriptor.
What is left to do is to allow syscalls like "copy_file_range" or "sendfile" to initiate DMA transfers to the GPU buffer from userspace; a thing they can't do yet (only fs-to-fs or fs-to-pipe/socket copies), but in theory is possible. Slap that in an io_uring OR poll the dma_buf object and you've got a DirectStorage equivalent.
This is literally the same thing as the sega genesis "blast processing", an DMA circuit that can copy data from the storage to the video memory a lot faster than the CPU can.
The SNES had DMA too, but it just wasn't part of their advertising campaign.
The IBM PC had DMA. Whats worse this video is REALLY premature and DrectStroage doesnt have DMA or GPU decompression yet. ATM its an API that improves IO call overhead.
@@mycosys Sadly they didn't evolved this side as much as they should, otherwise well, we wouldn't quite need direct storage.
It would be pretty cool if PCs could do timed DMA transfers etc..
"Blast Processing" was 100% a marketing gimmick. Some marketer overheard discussion of a bug in the VDP, during conversation the words "blast" and "processing" came up, so they just put them together for marketing. It is a real technique, though very obscure, not used by any commercial games, and was only figured out to be viable a few years ago. The fact that it even works is pure luck. It allows using more colors than the color palette allows by abusing a bug, advanced manipulation of the CPU to sync with CRT, and consuming 100% of the CPU, so isn't really feasible in gameplay.
The SNES also had DMA, and was used a lot. The genesis was faster CPU, though the SNES had an advantage with Mode 7.
@@xerideaOf course it's a marketing ploy. also i think, not 100% sure that the sega genesis can actually copy data outside of the vblank time, which allows it to move a lot more tile data per frame than the snes, which was abused by games like sonic etc..
This would be great for megatexturing. You could have a cache of texture tiles on the onboard SSD. So basically two levels of megatexturing. Basically, the GPU could use megatexturing and the SSD could use clip mapping. Megatexturing is using the exact tiles that are seen on-screen. Clip mapping is using loading the highest resolution tiles for those that are near the camera and gradually load coarser resolution tiles the further away they are. This used to be limited when using the GPU. But with an SSD, you could prefetch a lot more data and significantly reduce texture popping when using magatextures. In fact, only far away coarse textures would be prone to texture popping and since they're already coarse, the effect would be far less noticeable if it happens at all.
You could do the same for 3D assets and load higher resolution meshes for nearby objects and lower resolution the further away they are with as much overlap as possible for multiple LODs.
Been banging on about this for a while. Can't wait to see it all come to fruition.
"...is loading nearly THREE TIMES FASER."
0.33 / 0.08 = 4.125
I mean, in the grand sceme of real numbers, 3 and 4 are quite close but woud "OVER 4 TIMES FASTER" sot have sounded even better?
Yeah, I was like "wait did my quick maths skill failed me?"
Congratulations 🎉🎈 you’re the 500th person to comment this!
I'm curious to see how difficult this will be to implement in compatibility layers like Proton for Linux/SteamDeck. I don't trust MS to make things easy now that SteamDeck has proven to be so popular.
To quote Microsoft: "Microsoft Loves Linux"
May 6, 2015 (Can't post the link, because it gets deleted, but you can find it easy)
Guess it's time to stand by their word and take action with that statement. Especially because hardware-wise the SteamDeck already is the console-like platform that's also a PC, that gets mentioned in the video at 8:15.
In all honesty I don't think Linux competes directly with Microsoft, it is just not convenient enough to be used by normal people for everyday use
Word on the street is that the DirectStorage .dll files already work in Wine, no additional effort required.
I wonder if HMM (Heterogeneous Memory Management, wich does already exists in Linux for ages -2017- and permit, if I'm not wrong using GPU memory "as any RAM") can not help here, or if does solve only a part of the problem (because I don't understand everything but I -think- it will only bypass the RAM storage, but not the fact the CPU still needs to be involved)
@@potens1 Direct Memory Access or DMA has been used by network cards for decades. The device is configured to put the data directly into RAM and then ping the CPU when it's done. I don't know the specifics but the same principle should allow storage to put things in VRAM. The CPU could issue a command to the storage which would then get the task done and ping the CPU when done.
I really hope this also helps with frametimes. Im so tired of modern titles having horrible stuttering issues. Really breaks immersion and sometimes hinders gameplay. Maybe with faster asset streaming, we can not only see faster load times but also smoother gameplay.
ekhem elden ring ekhem
this is something to do with dx12 not ssd. we literally had great frametimes all generation until that piece of shit came out
@@Optim121 It's problem in dx11 games too. It's an issue of shitty game engines that are designed for ease of development and not performance.
@@rtyzxc but MUCH less than dx12 ones, to be honest i can't remember any dx12 title that doesn't have heavy stuttering problems (other than fh5)
@@motorolah8107 FH5 doesn't have heavy stuttering? That's news to me. 80 hours in-game and I'm finally getting tired of how bad of a game it is.
It's probably going to take at least a few years before we start seeing this in all new release games on PC.
so glad you showed "unreal tournament" i love that game and was really good at it. i won office pro 1997(in 1999) playing that game.
Linus got such a good partner, I enjoy Anthony's energy every video.
Fellow Anthony enjoyer
i was just thinking the same and looking for such a comment. also like that his content is more informative and goes deeper, you actually learn something here.
Reminds me of old days linus is alot more busier now and I'm sure a bit more annoyed, nice breathe of fresh air that reminds me of older days.
This is going to be a game changer, especially on lower end CPU's.
One of the best ways to see the impact of direct storage imo is Ratchet and Clank Rift Apart when going through the rifts, especially the part towards the end of the game when you are quickly going through a sting of rifts bringing you to a bunch of drastically different looking areas. That has been one of the very few moments so far in next gen games that really shows the advantage of next gen hardware. This is why i think Microsoft's decision to keep supporting xbox one with all new games for the foreseeable future is a massive mistake regardless of them claiming last gen isn't holding next gen back, that line is a complete and utter lie, they know it as does anyone who has played a true next gen game. The amazing rift mechanic in the newest Ratchet and Clank game is simply impossible to do without the extremely fast ssd and direct storage, especially when looking as good as it does. Microsoft making all games also work on xbox one prevents them from adding core gameplay changing features like that.
Fast/near instant load times are nice but things like the rift mechanic are where the new direct storage feature truly shines. It allows true open world games that don't require ANY tricks to hide loading screens like mini cutscenes when going inside buildings, you just go right in. I am sure it enable even more really cool things that couldn't be done before.
👏🏻👏🏻👏🏻👏🏻
3:42 Damn that shot brought me back
I am not sure everybody understood that compression part so I'll clarify it a bit. The thing is that there is DXTn compression. Which is just texture compression that can reduce amount of memory the texture takes on HDD(or SSD) AND in VRAM at the same time. So that means that the GPU can use that compressed texture too without decompressing it at all. On the other hand there are very complex compression algorithms that archives use(i.e winrar or 7z). But they are used solely to compress files on HDD, they need to be decompressed to make use of the files. This is what games use to pack their content so games won't take too much space. And that is in my opinion good. Yea you need to wait a bit of them to decompress but if the gpu will be able to handle stuff like lz4 lzma lzma2 bzip that are used as typical archival things to make the gpu load faster with directstorage. Or maybe they'll comeup with new algorithms like DXTn that will allow the GPU to use compressed assets directly(although it's probably not gonna happen for other types. Because usable compressed assets are only textures and it was a long time ago when that was inveted.)
I hope that made it clearer to understand.
I just imagine video games in my head. It's free and the cops can't stop you.
The last 2 generations of consoles (and actually phones) also have unified memory, the gpu uses the same memory as the cpu so there is no need to copy it to vram
i mean apus do that too. I mean apple is using this in m1 macs too
@@rajder656 APUs do not share memory. They simply segment it. You literally lose chunks of system memory, which then has to be copied from that chunk of system RAM to the chunk allocated for the APU
There still has to be some CPU involvement to direct the video card to the blocks it needs from the drive. I doubt the video card understands filesystem structure, which could be FAT32, exFAT, or NTFS at this point.
Most likely, API will take care about that: it will determine blockaddress to read from and pass them to GPU, to initiate exchange.
@@yaroslavpanych2067 but what if the file is defragged? It will read garbage then.
Actually the question of how hardware-deep memory access DirectStorageAPI will use under the hood could actually be a very important performance factor. It'll make huge difference. From my experience from working with DirectAPI's(mostly d3d's) it will have numerous capabilities enumerations that will vary in hardware->software with hardware things being fastest but requiring hardware and software being garbage but working on any system(sorta HAL).
@@electrosssnake1036 dude, data fragmentation and read errors are too completely different issues, drive fragmentation just means the read heads of your hdd have to jump from place to place in order to gather the requested data instead of reading it in one go. Basically random I/O verses sequential read performance. But that's mostly a thing with spinning rust, and this tech is meant for current gen systems with NVMe storage, not 20 year old toasters.
@@protator dude not everybody has ssds. Question is what will happen if you do not have last gen tech. Such as hdd which could be even slower if fragmented.
Correction: smaller textures in an atlas can be tiled - it just takes some extra shader math to tile the specific region of the atlas rather than the whole atlas.
Machine learning will love this. Training on a big dataset can give you lumpy GPU utilisation graphs even with batch streaming, as the data is loaded to CPU/RAM between batches. This is especially bad due to how bad parallelism works in python, but with direct data loading that work can be shifted to a more parallelism friendly API and be faster.
Great job Anthony and gang, you explained this so very well, particularly the industry common practices and remaining challenges. Eager to see what it can do.
I hope you know that GPU and SSD can directly read/write system memory, so the CPU cores were never heavily involved (except decompression). This is DMA (well, it's modern version, where there is a request PCIe packet sent). Moreover, usually, there is no other path from the SSD to the GPU but through the PCIe root complex, which is basically the PCIe controller inside the CPU. The only real thing left out is the system memory; we now have a path SSD->PCIe controller->memory->PCIe controller->GPU, and GPU direct can do SSD->PCIe controller->GPU. (And note that for data centers, NICGPU copy is already a thing since at least 6 years; now we just have SSDGPU in Win.)
I made the same point - then discovered the vid is REALLY premature and the DMA and GPU decompression havent even been implemented yet. ATM its just an API that improves IO call overhead.
Now we just need something like DirectStorage but for Linux…
Lol
Haha no
Given it's a DirectX technology, it'll be the responsibility of Wine/Proton and DXVK... though it might need some kernel level support too?
@@seshpenguin might be!
yea
Seems like the next step could also be GPUs with onboard SSDs with the pipeline eventually becoming; fetch compressed textures, decompress on the GPU, store unused textures onboard, swap textures in and out from the onboard GPU storage.
Like being able to decompress on the card like NVidia has technology to do is already huge...imagine also being able to only have to do the fetch from off card storage and decompression _once_ for _every_ texture because everything can be stored directly on the GPU.
The next big step would be to have GPUs optimize draw calls by using some algorithm to make texture atlases of hot texture paths to really maximize any available on card storage to really fine tune what gets loaded into memory to reduce texture swaps.
That Unreal Tournament map brought back some of the best friend lan moments i have ever hade. We cranked up that AI difficulty to max and that was a difficulty that no other game have ever hade sense. For those who think that games can be difficult now try that and u quickly change your mind. That was the definition of Brutal.
Imagine GTA 5 without 10 min loading screens 😍
The PS5 version is loading in around 20 seconds now iirc (still waaaay too long when you compare it to other first party Sony games that load in less than 3 sec)
@@terogamer345 Are you for real? When i open GTA online it takes between 5-10 mins to actually join a server. us PC users just really get shit on by rockstar xD
Not 3 times faster, in the tool Microsoft provided, but slightly more than 4 times faster. 8 x 3 = 24, 8 x 4 = 32. DirectStorage On = 0.08s, DirectStorage Off = 0.33. So you could load the same thing with Direct Storage 4 times, with time to spare.
Indeed. I noticed the same mistake, and watched the video again to make sure. Anthony is great nonetheless 👏
Margin of error
Thats why he said, clearly, 3 TIMES FASTER, not 3 TIMES AS FAST. Can you spot the difference here?
@@FearInfected When you use "times" it implies multiplying, i.e. 3 times faster mean 3 times the baseline speed, not 3 times the baseline speed in addition to the baseline speed. If you talk to normal people and use this kind of language, they will naturally assume it's a simple multiplier, which is also why nobody uses "1 times faster" because it doesn't make any sense, you say twice as fast/two times faster, those two will be understood as the same thing by basically everybody. If you google "1 times faster" the top results are people trying to justify why "2 times faster" actually means 3 times by using "1 times faster", in real life "1 times faster" isn't used because it's obviously ambiguous.
And besides, he clearly got the math wrong anyway, as he said "nearly 3 times" while in fact it was slightly greater than 4 times faster, I would guess the script may have been written with different numbers and wasn't updated or something like that, because the math is wrong no matter which way you slice it.
@@joshuacheung6518 I would beg to differ. At least in gaming and performance-related tasks, it makes a big difference.
Let's hope that Proton on Linux will be able to translate DirectStorage calls into native equivalents or Linux gamers (Steam Deck anyone?) could be in for an increasingly rough time w.r.t. new game releases. I'm surprised Anthony didn't at least touch on this subject, especially since his boss is spending a month exclusively playing games on the Steam Deck!
I knew this already but I came for. Anthony Thanks for another great show!
Seeing Unreal Tournament 99 at 3:40 gives the greatest nostalgic feeling.
Direct Storage : Comes onto PCs
Rockstar Games : Casually re releasing GTA V for PC , saying "With faster loading Screens".
(If you don't get it, it's cuz they said " Seamless character switching " when advertising Expanded and Enhanced, but it was all because of the console's hardware)
GTA V had that slow loading issue on the PC for years until last year when someone hacked one of their DLLs and fixed it.
5:30 atlas textures "can't be tiled". No, they can be tiled, with the use of mip mapping. In fact if done correctly you can create mipmaps per tile in the atlas, thus if you have seamless textures in the atlas with the use of mipmapping. 2D and 3D games have been making use of it for ages. The problem is selecting the right LOD for the mipmapping, making sure the mipmap resolution is correct for the atlas.However the mipmap creates progressively lower resolution pixels around the edges of each tile in the atlas, which can result in a texture "bleed" but this can be compensated with using supersampling/multisampling which can help in creating sharper mipmaps. You can swap out tiles in a atlas texture in realtime. :)
The drawback however is you can't set per tile mipmapping, so one mipmapping level across the entire atlast. Additionally if you don't make use of a tile based atlas, you can end up with a lot of unused space in the texture map. You guys have been delving into game development aspects a lot lately and been making a hash of it so far.
well 8*4 is 32, so over 4 times faster right. or am I missing something? 2:50
I was thinking the same thing
My exact thought's too. What are we missing or is the maths broken?
Very well explained, i don't think anyone could have done it better 👍👍
Texture atlases are still a thing for particual use cases, usually enviroments and props. Even today we game devs struggle with drawcalls on lower end platforms like switch and today even PS4. Khm..Cyberpunk..Khm
*Today, PC Gaming catches up to Consoles*
r/pcmasterrace: *sweating*
Nobody is sweating because after all PCs are superior and everyone knows that it’s just a fact and not even in terms of loading times the ps5 is superior because direct storage only reduces the loading time from 2,1 seconds to 1,9 seconds on a m2 ssd thats only 10% shorter PCs have a shorter loading time without direct storage than the ps5 does with direct storage
@@Stiegelzeine tru,is hould say OP means Consoles that sweating instead
altough for gaming only console is enough while PC Can multitask
@Transistor Jump they actually do because the most used consoles are still the last gen and the most used gpu is the 1060 wich is comparable to a ps5
@Transistor Jump keep dreaming kid
Can't wait for the 4 games that support it every year.
So basically, it won't be common until 2030, when DX14 is out on Windows 12... because that seems to be the way all new technology goes, especially when it's exclusive to a single API on a single operating system (hi DX12). Doesn't matter if it's easy for a single developer to implement (like AMD's FSR), publishing companies (especially JP devs on PC) don't operate logically.
Source: tRuSt Me BrO
Gotta hand it to the PS5 for having thought this through and accelerated encode/decode into unified GPU space from its conception, DirectStorage still isn't matching every feature (GPU acceleration) but will get there.
Microsoft did too on series x I believe, they use a extra gpu cu for it rather than a custom chip.
Such an awesome and informative video!
I love tech pieces with Anthony lile this!
I don't understand Lol
Has PC gaming not always been ahead
Pc is always ahead when it comes to raw performance, but when it comes to price per performance console is undoubtedly better.
console has faster loading with direct storage and an chip on the motherboard to even speed it up even more, pc doesnt have that 1 sec load time
@@Ghost-hs6qqthat's true, the only reason I'm considering a console is because of the exclusives.
@@tomodaniel6471cool, and that's coming to pc now LFG
@@Ghost-hs6qq that's like saying piss is better for fuel than gasoline for fuel when it comes to price per performance
Keep up the great content
And still takes 2 minutes to get into the game start screen because of stupid intros and advertising splash screens about different tech/companies used in the game. I feel most gamers would feel a real benefit if games would just play them once (or not at all!) and then turn them off. I don't need to know every single time I boot up the game that you made the frigging game, I know already I'm playing the stupid thing.
Another thing this could benefit is eGPUs -- build an eGPU box with a PCIe switch between a GPU, nvme, and the TB3/USB4 and you've just eliminated 90% of the traffic (textures, and most geometry) across the wire -- suddenly that 40gb/s is more than enough bandwidth for sending drawcalls *and* returning the framebuffer.
No more eGPU bandwidth penalty, which would make putting top-end GPUs in them practical; plenty of bandwidth for framebuffer return (another small current-day penalty) makes eGPU a better fit for laptop users on-the go, too. Actually, that last part is just great across the board: it transforms the eGPU from something you plug into (like a dock) into something that you plug into your system when you need/want the capability (more like a flash drive.)
An enterprising GPU manufacturer could build a PCIe switch and m.2 slot into a GPU today (like those Radeon SSG workstation cards), but no incentive until the software is there (and I agree that the compression situation needs to be resolved before that will happen).
i realy like the calm way how anthony presents topics. even a noob like me can get the gist of it.
big-ups to anthony.
Is it time to upgrade to windows 11?
no
Consoles right now, are better than just buying a Pc if you only want to game, due to how cheap they are
Edit: To all the people in my replies, where I am I can manage to get a new gen console for MSRP sometimes, but not GPUs.
gaming laptops are also quite good and powerful today
honestly, just get a cheap gpu and build a ryzen pc and ure far better off then with a console lol
oh yeah? where am I able to snag a ps5 for cheap? last i saw they were still scalped as hell.
Cheap?
Maybe where you live, hope no group of foreigners rush there and wipe them all out then.
Cheaper doesn't equal better. No matter how expensive building a pc is, it's still always going to be better than console gaming.
So infuriating that we had to wait for consoles to implement faster storage access so that we could make proper use of the potential speeds on our PCs.
That's always been the case in games in all but niche PC-exclusives. The lowest common denominator gets developed for.
I'm personally really glad it's finally happened though, that and that console processing power has taken such a huge leap, we might get to see games doing CPU-intensive AI, physics & weather stuff again.
You can see it both ways. Consoles are also the reason why this kind of innovations become mainstream in the videogame industry. If the consoles weren't limited in some way in the first place, game devs would throw hardware at the problem without a second thought like they always do.
In this very same video, Anthony explained data stream became popular mainly because of the limited VRAM in consoles and that untapped the possibility of creating really huge levels that weren't possible even on PC.
Also, though DirectStorage is great, consoles still have the upper hand, being that they have shared memory and dedicated decompressors (Anthony also talked about that). Go see Mark Cerny's PS5 presentation. It really helps to be able to design both the hardware and the software instead of building things around a generic architecture.
2:58: Numbers: "it's 4 times faster"
Anthony: "Nearly 3 times faster"
been waiting for this for tensorflow
Those movements in the scene definitelly gave me some photo epilepsy headache. Nice
Love seeing videos with Anthony. He is wholesome and knowledgeable.
convinced that thumbnail text choice was purposeful ;)
I love Anthony. I asked innerly: But how does it scale? Anthony, 2 seconds later: But how does it scale?
Great video, learnt something new and got exicted about it, all at the same time! Thank you