+Ali Can Metan Eeeh. They have similarities in how they could be implemented in hardware, but their intent is too different for them to be considered the same thing. Caches can be addressed; registers can't. There is no instruction saying "register A contains the address of another register; copy the value of the register whose address is stored in A into register B." Technically it would be possible if the manufacturer provided such instructions, but you only have a few registers anyway, so that would hardly ever be useful, and registers are not used in that fashion. They hold temporary values before you do something with them or write them back to main memory. Also, I'm assuming that by "registers" you mean, for example, EAX, EBX, ESP, ... on an x86 processor, because the hardware used to store a bit can be called a "register" too, even if it is not such a register in the processor. As far as I know, EAX, EBX and so on are called "general purpose registers", and they are typically what you mean when you refer to the registers of a processor.
+Ali Can Metan "I have always thought I was utilizing CPU cache when I used them." That's not necessarily wrong, but most likely you don't. An example: int Cube(int a) { return a * a * a; } Here the argument a is most likely stored in a register (instead of on the stack, as it would be if there were many arguments). Say that's EBX, and the return value goes in ECX. You could do something like: Mult EAX EBX EBX; Mult ECX EAX EBX. You only access registers; RAM and cache are not involved. But sometimes you want to access data allocated on the heap: int Foo(int* a, int b) { return a[b]; } Assume a is in EBX, b is in ECX, and the return value goes in EDX: Mult EAX ECX 4 /*sizeof int*/; Mov EDX [EBX + EAX]. Here you have to access RAM at a + b * 4, and that access can be cached to save time the next time you touch that region. (I'll assume whatever is passed as argument a was allocated with some equivalent of malloc; if it was on the stack, caching it would make less sense because the stack changes so much.) In both cases the registers just hold temporary values. Register contents are never cached unless you explicitly write them somewhere in RAM; they don't need to be, because registers are faster than any cache anyway.
The point about modern CPU speeds vs RAM speeds forcing the use of more space on the silicon raises the question: what would a 3 GHz RAM stick look like? How much could a normal-sized (let's say DDR2) stick hold? How big would an 8 GB stick need to be?
If it helps, think of it like this: if you are doing DIY in your house, you don't fetch one tool at a time from the shed, you get the whole toolbox. You may not need every tool, but it's much faster than going back and forth.
+ComputerPhile Your animations are very good. What software do you use to make them?
I would love it if you did a video on whether computers were inevitable. Let's say all computing technology and everything that stemmed from it was erased: would humans develop a computer again? What happened in history for people to first have the idea of a computer, and further and further back, what led to what? All the collaborating technologies, infrastructure, and sciences that spawned the computer. I know it may take a really long episode, or a few episodes, but it would be really cool. What was square one? What came just before that and led to square one? Hope this makes sense.
Gotta love how cache did make a difference. Going back to the Pentium 4 and the Celeron: in many ways they were the same chip except for cache size. Pretty much, the not-so-successful P4s became Celerons. And for light office use it didn't matter that much, unless you were dealing with huge spreadsheets, so many office PCs ended up with Celerons. And that wasn't bad! It was slower than the P4, especially once certain jobs could really use a bit more cache than the Celeron offered. Of course, nowadays cache is also used to connect multiple cores together, as a bridge between them: you get a nice big swath of memory acting as shared cache for all cores, plus a much smaller local one for each core's own calculations. So does cache influence speed? Yes, but it also depends on the job. Word probably won't get much of a speed boost from double the cache, but for encoding video, running databases, or working on a huge spreadsheet, a big cache means more performance.
The memory used to make CPU cache must be very robust; I know that RAM can wear out over time. What makes CPU cache memory last so much longer than other memory types?
+MrGridStrom I would guess it has a lot to do with the fact that it's such a small amount of memory compared to DRAM. The CPU generates a lot of heat, but modern heat sinks can disperse most of it, whereas RAM relies on very small heat sinks, so over time heat may do more damage to the DRAM chips than to the CPU's cache.
It is because CPUs use SRAM (Static Random Access Memory) instead of DRAM (Dynamic Random Access Memory). SRAM is faster than DRAM and doesn't require refreshing, allowing higher speeds, but it takes more transistors per bit than DRAM, which is why it isn't used for main RAM. It also uses more power.
Simplify chips by getting rid of all general cache levels except level 1 and a fixed-purpose level 0 cache. Reduce instruction size to 16 bits, remove branch prediction and legacy instructions, combine GPU and CPU SIMD as an APU with many simple cores, and increase the level 1 cache to fill up the rest of the chip, with some redundancy to improve yields. This architecture makes more sense as we get closer to 1 nm chip fabrication. Imagine an APU with 1 GB of level 1 cache the size of a Xeon.
@@CeezGeez .. Intel has everything to gain. I'm one of these people who don't care about Intel's Arc discrete PCIe graphics doing well enough to be competition for AMD and nvidia because I'd rather Intel just pull the rug from under their feet with Intel APUs and an Intel-only mobo standard that has a standardised GPU die socket, like a CPU socket, and possibly a very high bandwidth VRAM socket to match. Faster CPU-GPU i/o, faster data access, with the VRAM usable by the CPU too, though at a slower speed than the GPU. PCIe wastes a lot of space and power as well as potential bandwidth.
+ElagabalusRex As Jan said No. However you can try and optimise compilers so they don't break the cache and stall the pipeline. That's a whole other discussion.
Apart from the cost problem, why don't we just use ONLY cache, and more of it, either as part of the CPU or externally, instead of regular RAM? Cache would get cheaper anyhow if it were used instead of regular DDR or ECC memory. It might even save motherboard space compared to regular RAM, but that's probably not a really severe problem.
+Neil Roy Depends. When you start to watch a video stream, some of it is put in a cache in RAM (and at various points along your connection) before you see anything, which increases the delay but makes the experience smoother (since you don't always have to wait for the next video frame to arrive from across the globe). Also, for a CPU or program, looking in a cache (whether software or hardware) takes time and might take longer than bypassing it; for example, on a high-speed internet connection it might be faster to request a webpage from the server than to have Chrome check a huge cache.
sundhaug92 I recently read an excellent article on optimizing the data in your programs (I'm a programmer) and it happened to cover CPU caches. They are quite remarkable. I learned that the L1, L2 and L3 caches are labeled by how far they are from the processor: L1 is the closest, with the fastest fetch times, L2 is farther away, and L3 the farthest. The time to fetch data from L3 is considerably slower; I was surprised.

I gather that when the CPU reads from memory, it stores that location in the L1 cache. If L1 becomes full, the entry that has been idle longest (not accessed for the longest time) is moved to L2; likewise, if L2 is full the oldest entry moves to L3, and eventually the oldest is dropped from L3 entirely. If an address is not found in any of those caches, it is fetched from RAM and the process of storing the location starts all over again. There is much more to it (memory fragmentation etc., which you can avoid by structuring your program properly to speed things up).

I recall that years ago, in the days of Netscape, website caches basically checked the date of the requested page against the copy (if any) you had stored in cache: if the site's page was newer it was fetched from the website, otherwise it was loaded from the cache. That is my understanding of it; I imagine something similar is done for images on the page. With video, the player just loads as much of the video into a buffer as possible and then starts playing; on YouTube you can usually see how much has been loaded ahead of time. Seeing as videos are sequential in nature, this is probably the simplest form of caching/buffering. The CPU cache is the most involved.
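The "oldest entry gets moved down" policy described above is least-recently-used (LRU) eviction. Here's a minimal sketch of it in Python, just to make the idea concrete; real CPU caches use cheaper approximations of LRU, and here an evicted entry is simply dropped rather than demoted to a next level.

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: on a miss with a full cache, evict the entry
    that has gone unused the longest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # ordered oldest -> newest

    def access(self, block):
        if block in self.entries:
            self.entries.move_to_end(block)   # hit: mark as most recent
            return "hit"
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[block] = True
        return "miss"

c = LRUCache(2)
results = [c.access(b) for b in ("A", "B", "A", "C", "B")]
print(results)  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

Note how accessing "A" again keeps it alive, so "B" (the stalest entry) is the one evicted when "C" arrives.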
Just a bit of curiosity mixed with some current events: with DDR4 now capable of speeds around 3000 MHz (3 GHz), and processor clock speeds largely staying the same over the last 7-10 years, is it possible that cache will become a less important component? Or is it that, once you factor in multi-core and hyper-threaded applications, the explanation for cache becomes more complex than simply the gap between the processor's clock speed and the RAM's speed, as it was in the past?
Joshua Cooper Well, it would need to do one fetch-and-return cycle per instruction. Instructions per clock cycle are nowadays often higher than 1, and all cores would have to do this in series because RAM is mostly not parallel. We would probably need 12 GHz RAM before we could do without cache, but by then CPUs will have a similar speed.
Wow... CPUs are far, far less stupid than I thought. I was under the impression that I knew roughly how they worked, but clearly they're far more complex than the theoretical thing I was thinking of...
This is a great explanation of caches, but it only scratches the surface of what a CPU does. I'm glad you found a new respect for them; they are amazing pieces of technology.
Look up technical press releases of Intel chips, there's a lot going on in there nowadays in terms of cache, circuitry, division of labor between/among cores etc.
Umbert Lapagoss I might do, I find this stuff interesting. I'm a C++ developer anyway, so knowing this stuff will help me write faster and more memory-efficient code.
I understand why we need cache, but why do we need a centralized cache? Wouldn't it be faster to attach smaller caches to individual cores for doing tasks in parallel?
This would make multi-threaded applications more of a nightmare to design. If Core 1 has some data Core 2 needs for an unrelated routine, it would have to stop what Core 1 is doing to request that item in the cache, or it would have to go out to RAM to get a piece of information the CPU technically already has.
So really, the answer isn't that memory didn't keep up, or that we technically can't do it, but simply that the cost of RAM as fast as the CPU, combined with the amount of memory modern tasks use, left room for a cheaper solution: a small amount of very fast memory, a moderate amount of medium-speed memory (main RAM), and slower mass storage (HDD, SSD). We could build a machine with 16 gigs of super-fast RAM, but the performance increase would be moderate compared to the extreme cost.

More abstractly, we've always had cache in the form of a multi-level memory scheme, and this is just further stratification of it. Even the original computers had online storage for the bits they were working on, with humans holding the rest of the data and a slow link between them. The 1980s computers shown had a few levels: the CPU with its registers, main memory, and cartridge/tape storage. There's a trade-off between storage size and the cost of fast access, and many tasks naturally use a small portion at a time, so you use the fast store as a cache for the larger one.

Looked at from the other direction, it wasn't cache that we added: the fastest memory kept up with CPUs, and its size has stayed fairly similar (even modern CPUs have an L1 cache of around 32K, like the fast main memory home computers had). Rather, we've *added* a huge intermediate bank of RAM to modern computers, between the fast memory the CPU uses and the slower mass storage.
I had a 73 gig cache made by Google Chrome. The way I found out was that Windows Media Player said I was out of memory; a 120 gig SSD can fill up quickly. I had to enable hidden folders to finally find the file. Why did Chrome think a 73 gig cache was necessary?
I think you started to explain it at the end... what are the L1 and L2 cache? I remember this was a structure they put in in the late 90s. Is this the CPU cache you speak of, and is the RAM cache the instruction memory? Chrome's cache just made me type this twice when the vid changed; is this a failure of the route-a-begga?
+Matthew Gore L1 and L2 are cache levels; L1 is closest to the core and is faster but has less storage than L2. At each level there's both an instruction cache and a data cache. If you have a CPU with, say, 3 levels of cache (quite usual), you can think of RAM as cache level 4 and your hard disk (through the page file) as cache level 5 (though with an SSHD, a hybrid SSD/HDD, the SSD is level 5 and the HDD level 6). Double-posting isn't necessarily an issue with routing or caches, just one of the hiccups that sometimes happen.
sundhaug92 Wow, thanks for the reply: quite descriptive, and very appreciated. IMO hardware architecture is SYSK; I mean, we use it every day, and we should know the technical differences between solid state and optical.
How does cache work when considering a multi processor architecture? Is the cache shared between cores? What happens when another core tries to access a memory location in RAM that was cached and changed by a different core?
That is like asking how gravity works and being told that there is a certain relation between mass, distance and the gravitational force... not that informative. In the gravity case I'd want to learn the function Fg = G*m1*m2/r^2. In the case of the cache: how and when does it synchronize? How can I be sure the data is synced when it needs to be?
+Rik Schaaf Some cache-levels are shared and programs can explicitly request that data is flushed to shared memory (L3 or RAM in this case) in addition to being able to signal to other cores "Hey, I just changed that piece of memory that we both might access"
Now that he has done a few videos, he is quite comfortable with the camera and talking to it. Congratulations, it's not something so easy to do. Now he just has to either look at you or look at the camera, try not to look around so much, and they will be perfect! ^^ Shouldn't have to say that the video was very interesting too. :)
+THERESAPARTYINMYHEAD Possibly a fair bit faster for some non-optimized edge-case scenarios, but your computer would be HUGE and probably cost millions. There's only a couple of megabytes of cache memory in a CPU, and the CPU already uses it well. However, cache memory doesn't work the same way as RAM so it might not even be possible, and because of that it also takes up much more physical space.
+THERESAPARTYINMYHEAD You actually get diminishing returns as the size of your cache increases. Think of it as searching for a book in a cart of books you recently read. If you have a huge cart, sure you can keep a lot of recently read books nearby but it will take you longer to find the specific book you need.
Is the gain coming from having to work with small addresses, or something else too? I would imagine there's never more than 24 bits needed to address cache content (not to mention special CPU instructions made to work with even smaller address sizes in exchange for greater speed)
It did seem that my computer got significantly faster after... well, doubling the frequency, and going from 1 MB of regular L2 cache to 3 MB of "Intel Smart Cache". What's the difference there? Simply that the regular L2 would be split in twain between the two cores, while the newer one lets both cores use the whole 3 megabytes at the same time?
+Zandonus Yip. That's all it appears to be. It allows you to not only share data between the two cores, but also have one core have any of the cache that the other core isn't using at the moment. It's like hyperthreading for cache.
I wonder: since RAM has actually caught up with the CPU again, where a current-generation Intel CPU can accept RAM clocked at up to 3 GHz without overclocking, pretty much the same clock speed as the CPU itself, why is the cache still faster even when the CPU is running at that same 3 GHz?
+megaspeed2v2 A huge part of the reason that the cache is faster is that it's physically closer to the CPU. It takes time to get your request all the way out to the normal memory and all the way back to the CPU. I think that is what +sundhaug92 is referring to when he mentions latency.
ZipplyZane Yes it was. I understand the latency aspect; it helps when accessing lots of little pieces of data. But when larger amounts of data are being accessed, wouldn't, for example, 3 GHz RAM coupled with a 3 GHz CPU no longer have a bottleneck, since the RAM can throw out data quickly enough to use up the entire CPU time in short bursts, rather than the CPU having to wait a few million clock cycles between transfers?
I have an i7 920 at 2.67 GHz. If I upgrade to an i7 5960X and clock it to 2.67 GHz, will there be a significant difference? And how much of it comes from the cache?
For starters, the i7 920 is a quad-core, 8-thread CPU, whereas the i7 5960X is an 8-core, 16-thread BEAST. The 5960X also has higher instructions per clock, higher stock and boost frequencies, and more cache (20 MB vs 8 MB). It'll just wreck the older CPU easily in every way.
+Antropovich A long time ago you did have motherboards with DIP sockets for cache chips, but those days are long gone! I remember a 286 motherboard with that option. I don't know what level of cache it actually was; probably the last level before main memory.
+desolator XT Higher-level caches are larger but slower. If the CPU misses a lower-level cache it tries the next level up, and if all caches miss it goes to main memory.
+33C0C3 The higher-level caches are also physically further away, so by the laws of physics they take longer to access. This is also one of the reasons caches are so important on modern computers: the processors are fast enough that the distance between processor and memory counts. There's also typically a separate L1 cache for each core of the CPU; L2 is often per-core as well, while L3 is usually shared between all cores.
Cache is faster, so it's better for gaming; we need more cache in CPUs to play games faster. Why not make motherboards with integrated cache beside the CPU, to add additional cache to the processor?
We don't necessarily need cache; it all depends on how the computer works. A college I won't name (as I may get in trouble for it) is working on a computer that uses a single chip for memory: that one chip is your hard drive, your RAM, and your system and CPU cache, a single-chip memory system. It's really cool how it works. Yes, you'd think it would slow things down, and it can, but the neat part is that if, down the line, a faster memory chip comes out while your CPU is still fast enough, you can replace that chip with a faster, bigger one, saving money and gaining speed.
+Roflcopter4b Registers are where the CPU stores its working variables. It is the fastest memory to be accessed, but typically only contains a handful of bytes per register, and not many registers. A cache is much larger, up to some megabytes, and stores frequently accessed memory. It is much slower than registers, but much faster than RAM.
+Roflcopter4b Registers hold the actual state of the CPU at a given time. To analogise this to a human being: a register could be the position of your arm, or the smell your nose just picked up; the cache is like a human's sensory/short-term memory, remembering where your arm was a second ago and what that smell was; RAM is like medium-term memory, what you expect to be doing in the not-too-distant future; and the hard drive is long-term memory, with all the info you have in total.
+Roflcopter4b Cache is usually not addressed explicitly; it's not something the programmer can manage (though there are exceptions, such as the PS3). One could, though, think of registers as explicitly addressed "L0" cache.
Why RAM access is so slow? Because it has to take a bus.
+QVear I love this joke.
+Zark Bit That's because I've made that up.
+QVear I don't get it.
+QVear At least now it's all interconnected on a quick path instead of having to go over the bridge in the north... such a nice shortcut
+CatnamedMittens Michael Bialas The memory bus is what connects the RAM to the CPU. The bus in general is what connects the different parts of your computer; it's those lines on the motherboard.
The cache is the kitchen cupboard, main memory is the corner shop, and disk storage is buying online from a supermarket. It's a lot quicker to get the cornflakes out of the cupboard than to walk to the corner shop for the same item, let alone wait a couple of days for them to be delivered.
And retrieving something over the internet is like sending a rocket to mars to return some rocks
2:42 all my years-long confusion about cache wiped out with the 20 second animation. It's fantastic Computerphile!
4:58 "You only need a relatively small amount of cash to make a significant difference". #gangsta #hollahollagetdolla
Good use of animations to help visualize the explanations. Kudos to the animator(s).
Modern CPUs often have three caches. The one described in this video (separate for data and instructions) is the L1 cache, the fastest; every core has its own. L2 is larger and often shared between cores (2 on my system). L3 is the biggest, and there is only one for the whole chip the cores sit on.
ty
And we must structure our data access in such a way as to use the caches as much as possible and delay the need to eventually go out to main memory. That's the "fun" part.
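A classic instance of structuring data access this way is loop order over a 2D array. Both traversals below compute the same sum, but the row-by-row order touches memory sequentially and so reuses each fetched cache line, while the column order jumps a whole row's worth of memory between accesses and misses far more often. (Pure-Python lists blunt the effect; the difference shows clearly in C or with numpy arrays, but the access pattern is the point.)

```python
N = 256
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for row in m:            # consecutive elements: cache-friendly
        for x in row:
            total += x
    return total

def sum_col_major(m):
    total = 0
    for j in range(N):       # strided access: cache-hostile on real hardware
        for i in range(N):
            total += m[i][j]
    return total

assert sum_row_major(matrix) == sum_col_major(matrix)
```

Same answer either way; only the memory traffic differs, which is exactly what the cache cares about.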
Just a bit of clarification: nowadays RAM chips can be clocked faster than the CPU, but the latency remains quite high (relatively), so cache memory is here to stay indeed.
Cache hits and misses, cache coherence, cache locality, write-through vs. write-back, cache segmentation (as mentioned, instruction and data), multiple cache levels... cache is a moderately complex and very heavily researched topic. This, however, is a great introduction to the basics.
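To put numbers on "hits and misses": here's a toy direct-mapped cache, one of the simplest organizations, where each memory block can live in exactly one cache line. The sizes and the mapping are illustrative only; real caches add sets, ways, tags of the right width, and dirty bits for write-back.

```python
NUM_LINES = 8     # assumed toy size
BLOCK_SIZE = 16   # bytes per cache line

class ToyCache:
    """Direct-mapped cache that just counts hits and misses."""
    def __init__(self):
        self.tags = [None] * NUM_LINES   # which block each line currently holds
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // BLOCK_SIZE    # which memory block this byte is in
        line = block % NUM_LINES         # direct-mapped: each block has one line
        if self.tags[line] == block:
            self.hits += 1
        else:
            self.misses += 1             # would trigger a fetch from the next level
            self.tags[line] = block

cache = ToyCache()
for addr in range(64):                   # sequential scan over 64 bytes
    cache.access(addr)
print(cache.hits, cache.misses)          # 60 hits, 4 misses
```

A sequential scan misses once per 16-byte block and then hits for the rest of the block, which is exactly why linear access patterns are so cache-friendly.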
cache rules everything around me
+Jam Goodman kek
C.R.E.A.M
+AcornFox I'll cream u ;)
get the...
@@AcornFox get da money, dolla dolla bill yoo
That cpu at 3:09 is rocking a bernie sanders look.
+Neeboopsh I would have gone with Einstein personally.
Laughed so hard at that...
LOL. I love you.
Glad there is a computer channel like this.
Thank you so much for not saying uuum, uuum, uuum every sentence. Remember you are the result of tens of thousands of years of recent human evolution. Be proud of this.
Always so excited to open my browser on a new computerphile video! Thanks for the quality videos.
Best explanation about what cache is I've ever listened to.
Really great explanation. Simple and informative!
Though, it is kinda sad, that it is impossible to disable cache, just to feel, how important it is.
What a coincidence. I just had a lecture on computer memory today.
+SternMann93 Do you still have it in cache or commited to memory?
Can I just say now how much I appreciate you putting the DuckDuckGo logo into this video... Well done. Best search engine there is by miles.
+Barry Smith Why? The name sounds kinda... You know, virus-y, that's why i unninstalled it at first glance.
+Unknow0059 # They don't sell their users' information, I think. That's one of the big things.
+Unknow0059 # uninstalled? It's a search engine.
+Barry Smith I dig their privacy policy. But honestly, at least in my experience Google still provides the best search results by miles. And personally, I don't care about Google making money on me, hey, it's a business after all, nothing more.
Jim Groth
Yes.
This is the best channel in the world.
+Alex Azazel Agreed.
+Alex Azazel BETTER than numberphile ;)
I agree with your sentiment, but I have to say the best channel on YouTube is definitely Lasagna Cat.
Thanks Fellas it took me so so long to find this chan.
One of the best channels on YouTube.
Wow. What clear and succinct information.
Great explanation!
You then have different levels of cache. Each level is slower, but it has more space.
Modern desktop CPUs have three levels of cache before going to RAM. Some server CPUs have a level 4 cache.
Vote for turnip!
+Screpheep I read this as "Vote for trump", ironically they're both strangely similar.
+Owen Prescott That's not ironic, that's coincidental.
> *_Coincidentally_* they're both strangely similar.
+Rock It could be considered ironic, if you believe that Trump has the mental capacity and grasp of world politics of a turnip.
+Debated Nothing And it's really great turnip by the way, it's some tremendous turnip.
I wanted to comment that!
You could have elaborated on what's going on with the L1-L3 caches. Is one for data and another for instructions, or do they have no preferences? If they can store the same things, how does the CPU decide what to put where, given their different speeds?
I think this should be covered even in a beginner video, because you will be confronted with those values in most ads.
I am confused. Please describe this as an optimisation problem: what are your decision variables, what is the objective function and what algorithm are you going to use to solve it all?
I like Dr Bagley's explanations. He is great. Thanks for the interesting vid! Love learning the inner workings of computers!
Great explanation and analogy!
Thank you! cleared a lot of doubts.
2:24 That sounds quite ambiguous in 2018...
Is there any major difference between the different types of caches? I have heard the terms L1, L2, and L3, but I'm not sure whether the differences between them are significant.
+Karan Naik They're different-speed caches. L1 is the fastest but the most expensive to make, so you tend to have less of it, while L3 is cheaper but slower (though obviously still faster than RAM).
+Karan Naik in a class at the university we compared the speeds and costs of the different storage options (hdd, RAM, L3,L2,L1 etc.) and the differences were just huge. Going from one level to another often meant a speed (and cost) increase of a factor of a hundred or even a thousand. Another very important metric on the low level stuff is the access time; on the L1/L2/L3 Cache on the CPU it is dramatically shorter than on the RAM or your harddisk
szErnzEit
Yep, on a haswell CPU, in the time it takes the CPU to fetch from L1, light travels 39cm
+Karan Naik In general, L2 is slower than L1 but has more capacity, and L3 is slower and has more capacity than L2. This means that using the L1 cache is preferred, but when a line is evicted from L1 it falls back to L2, from L2 to L3, and from L3 to RAM.
Why do computers need caches? To buy their grocheries.
+ExaltedDuck They store their groceries there for late night stews
A very good way to explain what cache is, because I didn't know what it was before I watched this.
His shirts are mesmerizing. Where does he get them from?
Marks and Spencer mainly…
There is actually an additional problem apart from cost. Silicon chips can only be made up to a certain size; about 300-400 mm^2 is tops. A normal computer today typically needs say 4-16 GB of RAM, and that would consume over 1000 mm^2 of space. This makes it so you have to use a multi-chip solution.
Off-chip memory then has to use board wires, which are a lot slower than on-silicon wires, making it so you need more cache.
So actually, the premise isn't that DRAM itself is the reason why cache is needed; the wires are the issue that makes us need a lot of cache.
Can we have a video about single and double precision computing?
I can't wait until memristor technology matures and we don't have to worry about multi-level caching and all of this complication. Having just one big matrix of memristors will enable us to have a flat memory space, all nonvolatile, and eventually the memristors will be able to be dynamically changed from providing data storage to actually performing computation, so even the CPU will just be part of the memristor matrix.
No mention of the register :0
+Scias The registers change context too often for it to be useful for cacheing.
+Ali Can Metan
Eeeh. They have similarities in the way they could be implemented in hardware, but their intent is too different to be considered the same thing.
Caches can be addressed, registers can't. There is no instruction saying "Register A contains an address to another register. Copy the value in the register whose address is stored in A to register B." I mean... technically it would be possible if the manufacturer put in instructions for that, but you only have a few registers anyways so that would have hardly any use at all. And registers are not used in this kind of fashion. They are used to hold temporary values before you do something with them or write them back into the main memory.
Also I'm assuming that with "registers" you mean for example EAX, EBX, ESP, ... on an x86 processor. Because the hardware used to store a bit can be called a "register" too, even if it is not such a register in the processor.
As far as I know EAX, EBX and so on are called "general purpose registers" and they are typically what you mean when you refer to the registers of a processor.
+Ali Can Metan
"I have always thought I was utilizing CPU cache when I used them." That's not necessarily wrong, but most likely you don't. An example:

int Cube(int a) { return a * a * a; }

In this case the argument a is most likely stored in a register (instead of on the stack, as when there are a lot of arguments). Say that's EBX. The return value shall be stored in ECX. You could do something like this:

Mult EAX EBX EBX; Mult ECX EAX EBX;

You only access registers. The RAM and cache are not involved. But sometimes you want to access stuff allocated on the heap:

int Foo(int* a, int b) { return a[b]; }

Assume a is stored in EBX, b is stored in ECX and the return value shall be stored in EDX. This could be done like this:

Mult EAX ECX 4 /*sizeof int*/; Mov EDX [EBX + EAX];

Here you have to access the RAM at a + b * 4. This could be cached to save time the next time you access that region of RAM. (I'll assume that whatever is passed as argument a is allocated using some equivalent of malloc, because if it was allocated on the stack, caching would not make a lot of sense since the stack changes too much.)

The registers in both cases are just used to hold temporary values. The contents of registers are never cached unless you explicitly write them somewhere into the RAM. They don't need to be cached because they are faster than caches anyway.
Scias
That's why we have high-level languages :P
Register? I think he did mention it indirectly. That was the turnip in the back pack.
Cache... turnip in the fridge.
DRAM... turnip in the field.
The point about modern CPU speeds vs RAM speeds forcing the use of more space on the silicon raises the question... what would a 3 GHz RAM stick look like? How much space could a normal-sized (let's say DDR2) stick hold? How big would an 8 GB stick need to be?
You can always tell when someone truly understands what they are talking about by the explanation, and this guy has that in droves. Good job!
Awesome Channel..
Can't find tutorial about cache like this..
Thank you..
If it helps, think of it like this: if you are doing DIY in your house you don't go and get one tool at a time from the shed, you get the whole toolbox. You may not need every tool, but it's much faster than going back and forth.
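That toolbox intuition is exactly why caches fetch whole cache lines rather than single bytes. A toy model (invented function name, infinite cache, made-up line sizes) counting trips to memory:

```python
def trips_to_memory(addresses, line_size):
    """Count trips to main memory when every trip brings back a whole
    'toolbox' (cache line) of line_size consecutive addresses.
    Toy model: the cache never evicts anything."""
    lines_seen = set()
    trips = 0
    for addr in addresses:
        line = addr // line_size
        if line not in lines_seen:   # first touch of this line: go fetch it
            lines_seen.add(line)
            trips += 1
    return trips

scan = list(range(64))               # e.g. summing a 64-element array
print(trips_to_memory(scan, 1))      # 64: one tool per trip to the shed
print(trips_to_memory(scan, 16))     # 4: the whole toolbox each trip
```

Sequential access patterns win big here, which is why data layout matters so much for performance.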
+ComputerPhile आपकी एनिमेशन बहुत अच्छी होती हैं । आप किस तंत्रांश कि सहायता से इन्हें बनाते हैं ?
--------------------
+ComputerPhile Your animation is pretty good. What software do you use to make it ?
+आकाश कुमार शर्मा (पंचतत्वम्) Many thanks - Animations done with Adobe After Effects >Sean
*****
धन्यवाद ।
-------------------
***** Thanks.
+आकाश कुमार शर्मा (पंचतत्वम्) What is with the dual languages?
+Ayush Dhar Not all people use english keyboards?
I would love it if you did a video on whether computers are inevitable. Let's say all computing technology and everything that stemmed from it was erased; would humans develop a computer again? What happened in history for people to first have the idea of a computer, and further and further back, like what led to what led to what: all the collaborating technologies, infrastructure, and sciences that spawned the computer. I know it may take a really long episode or a few episodes, but it would be really cool. What was square one? What was just before that, that led to square one? Hope this makes sense.
Gotta love how cache did make a difference. Going back to the Pentium 4 and the Celeron: in many ways they were the same, except in cache size. Pretty much the not-so-successful P4s became Celerons.
And for office tasks, light office use, it didn't matter that much. Unless you were dealing with huge sheets to work on.
So many office PCs ended up with Celerons. And that wasn't bad!
It was slower than the P4, definitely once certain jobs really could use a bit more cache than the Celeron offered.
Of course, nowadays cache is also used to connect multiple cores together, as a bridge between them. So you get a nice big swath of memory to function as shared cache for all cores, and locally a much smaller one for each core's own calculations.
So does cache influence speed? Yes, but it also depends on the job it is doing. Word probably won't get much of a speed boost from double the amount of cache. But for encoding video, running databases or, yes, a huge spreadsheet, a bigger cache will mean more performance.
The memory used to make CPU cache must be very robust. I know that RAM can become worn out over time. What makes CPU cache memory last so much longer than other memory types?
+MrGridStrom I would guess it has a lot to do with the fact that it's such a small amount of memory compared to DRAM. The CPU generates a lot of heat, but modern heatsinks can disperse a lot of it, whereas a lot of RAM relies on very small heatsinks, so the heat over time may do more damage to the DRAM than to the CPU's cache.
It is because CPUs use SRAM (Static Random Access Memory) instead of DRAM (Dynamic Random Access Memory). SRAM is faster than DRAM and doesn't require refreshing, allowing for higher speeds. But it uses more transistors per bit than DRAM, which is why it isn't used for main RAM. It also uses more power.
I think it would be great if you made a video about FPGAs, I wonder what you think about programmable logic.
Just want to point out that the word cache is from the french verb cacher (to hide).
Simplify chips by getting rid of all general cache levels but level 1 and a fixed-purpose level 0 cache. Reduce instruction size to 16 bit, remove branch prediction and legacy instructions, combine GPU and CPU SIMD as an APU with many simple cores, and increase level 1 cache to fill up the rest of the chip, with some redundancy to improve yields. This architecture makes more sense as we get closer to 1 nm chip fabrication. Imagine an APU with 1 GB of level 1 cache the size of a Xeon.
nvidia / amd / intel would lose money that way that’s why we don’t see as many apus imo
@@CeezGeez .. Intel has everything to gain. I'm one of these people who don't care about Intel's Arc discrete PCIe graphics doing well enough to be competition for AMD and nvidia because I'd rather Intel just pull the rug from under their feet with Intel APUs and an Intel-only mobo standard that has a standardised GPU die socket, like a CPU socket, and possibly a very high bandwidth VRAM socket to match. Faster CPU-GPU i/o, faster data access, with the VRAM usable by the CPU too, though at a slower speed than the GPU. PCIe wastes a lot of space and power as well as potential bandwidth.
Can compilers explicitly reference the cache in the same way as RAM or registers?
+ElagabalusRex No, the cache is transparent. You can flush or disable the cache but not control its content.
+ElagabalusRex As Jan said No. However you can try and optimise compilers so they don't break the cache and stall the pipeline. That's a whole other discussion.
+ElagabalusRex Generally no, most cache is transparent, but for example the PS3 has cache that can be directly refered to.
Apart from the cost problem, why don't we just use ONLY cache, and more of it, either as part of the CPU or externally,
instead of using regular RAM? Cache would get cheaper anyhow if it were used instead of regular DDR or ECC memory.
It may even save motherboard space compared to regular RAM, but that's probably not a really severe problem.
I would think the cache also helps to keep the flow of information constant without pauses "lag" etc.
+Neil Roy Depends. When you start to watch a video stream, some of it is put in a cache in RAM (and at various points of your connection) before you see anything, to make it more smooth, which increases the delay but makes the experience smoother (since you don't always have to wait for the next video frame to arrive from across the globe). Also, for a CPU or program to look in RAM (depending on if it's software cache or hardware cache) takes some time and might take more time than bypassing cache, for example if you have a high-speed internet connection, it might be faster to request the webpage from the server, rather than have Chrome check a huge cache.
sundhaug92
I recently read an excellent article on optimizing the data in your programs (I'm a programmer) and it happened to cover CPU caches. They are quite remarkable. I learned that the L1, L2 and L3 caches are labeled as such depending on how far away they are from the processor: L1 being the closest and having the fastest times to fetch data, L2 being farther away, and L3 the farthest. The amount of time to fetch data from L3 is considerably slower; I was surprised. I guess what happens is that when the CPU reads from memory it will store that location in L1 cache. If L1 becomes full, then it will take the memory location stored in it that has been idle the longest (not accessed for the longest time) and move it to L2. The same happens if L2 is full: the oldest gets moved to L3, and eventually the oldest is removed from L3 entirely. If the memory is not found in any of those caches, it is fetched from RAM and the process of storing the location starts all over again. There is much more to it (memory fragmentation etc., which is what one can avoid, and so speed things up, by structuring your program properly).
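The eviction cascade described above can be sketched as a toy multi-level LRU cache. This is only a model of the idea, with invented names; real CPUs use set-associative hardware and various inclusion policies, not literal ordered dictionaries:

```python
from collections import OrderedDict

class MultiLevelCache:
    """Toy model of the eviction cascade: the least-recently-used entry
    of each level falls down to the next level when it is full."""

    def __init__(self, sizes):
        self.sizes = sizes                      # capacity of each level
        self.levels = [OrderedDict() for _ in sizes]

    def access(self, addr, value=None):
        for i, level in enumerate(self.levels):
            if addr in level:                   # hit: promote back to L1
                self._insert(0, addr, level.pop(addr))
                return ("hit", i + 1, self.levels[0][addr])
        self._insert(0, addr, value)            # miss: "fetch from RAM"
        return ("miss", None, value)

    def _insert(self, i, addr, value):
        if i >= len(self.levels):
            return                              # fell out of the last level
        level = self.levels[i]
        level[addr] = value                     # newest entry goes last
        if len(level) > self.sizes[i]:
            old_addr, old_value = level.popitem(last=False)
            self._insert(i + 1, old_addr, old_value)

c = MultiLevelCache([2, 2])                     # tiny 2-entry "L1" and "L2"
for a in (1, 2, 3):
    c.access(a, value=a * 10)                   # 3 misses; address 1 spills to L2
print(c.access(1))                              # ('hit', 2, 10): found in "L2"
```

Accessing address 1 again finds it one level down and promotes it back, which in turn pushes the L1 entry that has gone longest without use down a level, just as the comment describes.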
I recall years ago, in the days of Netscape that website caches basically check the website for the page requested and check the date of that page and compare it to the page (if any) you have stored in cache. If the site's page is newer, it gets the webpage from the website, otherwise it will load in the one from the cache.
That is my understanding of this. I imagine something similar is done for images on the webpage as well.
With video I think it just loads as much of the video into a buffer as possible then starts playing, usually on UA-cam you can see how much is being loaded ahead of time. Seeing as how videos are sequential in nature, this is probably the simplest form of caching/buffering. The CPU cache being the most involved.
So, just a bit of curiosity mixed with some at-the-moment "current events": with the advent of faster DDR4 memory capable of speeds of around 3000 MHz (3 GHz), and processor speeds largely staying the same over the last approximately 7-10 years, is it possible that cache may become a less important component? Or is it that, once you bring in multi-threaded applications, both multi-core and hyper-threaded, the explanation for cache becomes more complex than simply the clock speed of the processor versus the speed of the RAM, as it was in the past?
Joshua Cooper Well, it would need to do one fetch-data-and-send-it-back cycle per instruction. The instructions per clock cycle are nowadays often higher than 1, and all cores would need to do that in series because RAM is mostly not parallel. We would probably need 12 GHz RAM before we don't need cache, but by then CPUs will have a similar speed.
He explains stuff in these videos better than he does in my lectures with him! fml ahaha
Wow... CPUs are far, far less stupid than I thought. I was under the impression that I knew roughly how they worked, but clearly they're far more complex than the theoretical thing I was thinking of...
This is a great explanation of caches but this literally scratches the surface of what a CPU does. But I'm glad you found a new respect for them they are amazing pieces of technology.
Look up technical press releases of Intel chips, there's a lot going on in there nowadays in terms of cache, circuitry, division of labor between/among cores etc.
Umbert Lapagoss I might do, I find this stuff interesting. I'm a C++ developer anyway, so knowing this stuff will help me write faster and more memory-efficient code.
Why does a CPU need cache? Just turn the cache off in the BIOS and see what it does. (This may take some time...)
Hopefully in the future the cache gets to be our main memory, combined with ultra fast SSD's shit will get real.
I understand why we need cache but why do we need a centralized cache? Wouldn't if be faster to attach smaller caches to individual cores for doing tasks in parallel?
This would make multi-threaded applications more of a nightmare to design. If Core 1 has some data Core 2 needs for an unrelated routine, it would have to stop what Core 1 is doing to request that item in the cache, or it would have to go out to RAM to get a piece of information the CPU technically already has.
So really, the answer isn't that memory didn't keep up, or that technically we can't do it, but simply that the cost of RAM as fast as the CPU, combined with the amount of memory modern tasks use, left room for a cheaper solution: a small amount of very fast memory, a moderate amount of medium-speed memory (main RAM), and slower mass storage (HDD, SSD). We could have a machine with 16 gigs of super-fast RAM, but the performance increase would be moderate compared to the extreme cost.
More abstractly, we've always had cache in the form of a multi-level memory scheme, and this is just further stratification of it. Even the original computers had the online storage of bits it was working on, and humans with the rest of the data, with a slow link between them. The 1980s computers shown had a few levels: CPU with its registers, main memory, and cartridge/tape storage. There's a tradeoff between size of storage and cost of fast access, and many tasks naturally use a small portion at a time, so you use it as cache for the larger one.
Looked at from the other direction, it wasn't cache that we added, as the fastest memory kept up with CPUs, and its size has stayed fairly similar (even modern CPUs have an L1 cache around 32K, like home computers had of fast main memory); rather, we've *added* a huge, slower bank of RAM to modern computers, between the fast memory the CPU uses and the slower mass storage.
I had a 73 gig cache made by Google Chrome. The way I found out was Windows Media Player said I was out of memory; a 120 gig SSD can fill up quickly. I had to enable hidden folders to finally find the file. Why did Chrome think a 73 gig cache was necessary?
I think you started to explain at the end... what are the L1 and L2 caches? I remember this was a structure they put on in the late 90s. This is the CPU cache you speak of, and the RAM cache is the instruction memory? Chrome's cache just made me type this twice when the vid changed - is this a failure of the rutabaga?
+Matthew Gore L1 and L2 are cache levels, L1 being closest to the core; it is faster but has less storage than L2. For each level, there's both an instruction and a data cache. If you have a CPU with, say, 3 levels of cache (quite usual) then you can think of RAM as cache level 4 and your hard disk (through the page file) as cache level 5 (though if you have an SSHD, that being a hybrid SSD/HDD, that's level 5 for the SSD and 6 for the HDD). Double-posting isn't necessarily an issue with routing or caches, but just one of the hiccups that sometimes happens.
sundhaug92 Wow, thanks for the reply - quite descriptive, and very appreciated. IMO, hardware architecture is SYSK; I mean, we use it every day, and should know the tech diffs between solid state and optical.
Mmm, time for some turnip stew.
+The Hoax Hotel Hey, didn't expect you here.
How does cache work when considering a multi processor architecture? Is the cache shared between cores?
What happens when another core tries to access a memory location in RAM that was cached and changed by a different core?
Each core has its own cache and they have internal methods of synchronizing cache contents between them.
That is like if I asked how gravity works, you answer that there is a certain relation between mass, distance and the gravitational force... Not that informative. The gravity case I'd like to learn the function Fg=G*m1*m2/r^2. In the case of the cache: how and when does it synchronize the cache? How can I be sure that the data is synced when it needs to be?
I didn't know it was called cache coherence, so yes it was. Still, thanks for the link
+Rik Schaaf Some cache-levels are shared and programs can explicitly request that data is flushed to shared memory (L3 or RAM in this case) in addition to being able to signal to other cores "Hey, I just changed that piece of memory that we both might access"
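A minimal sketch of the write-invalidate idea behind such coherence protocols (MESI and friends). The names here are invented, the model is write-through for simplicity, and real hardware snoops a shared bus rather than looping over the other caches:

```python
class CoherentCaches:
    """Toy write-invalidate model: writing an address on one core throws
    away stale copies of it in the other cores' caches."""

    def __init__(self, n_cores):
        self.ram = {}
        self.caches = [dict() for _ in range(n_cores)]

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:                 # miss: fetch from shared RAM
            cache[addr] = self.ram.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        for i, cache in enumerate(self.caches):
            if i != core:
                cache.pop(addr, None)         # invalidate other cores' copies
        self.caches[core][addr] = value
        self.ram[addr] = value                # write-through, for simplicity

m = CoherentCaches(2)
m.write(0, 0x10, 7)
print(m.read(1, 0x10))                        # 7: core 1 fetches the fresh value
m.write(1, 0x10, 8)                           # core 0's cached copy is invalidated
print(m.read(0, 0x10))                        # 8: no stale read
```

The guarantee the commenter asked about comes from this invalidation step: a core can never keep serving an old copy after another core has written, because the write removed that copy before completing.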
I like that you've put duckduckgo in the video, personally I prefer that search engine.
Bagley how'd you get an Archimedes with two floppy drives?
Does memory cache function in a computer, similar to the way a capacitor functions in an electronic circuit?
DUCKDUCKGO! best search engine ever :)
I am wondering why copying large files takes time, but deleting them is so fast?
When you delete, you don't delete the files themselves, only the reference used to retrieve them.
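A toy filesystem model of that: writing has to touch every data block, but deleting only removes one directory entry. The names are invented and real filesystems are far more involved:

```python
class ToyFS:
    """Toy model of why delete is fast: the directory maps a name to a
    list of block numbers, and delete only drops that one entry. The
    data blocks stay behind until something overwrites them."""

    BLOCK_SIZE = 4

    def __init__(self):
        self.blocks = {}        # block number -> data
        self.directory = {}     # filename -> list of block numbers
        self.next_block = 0

    def write(self, name, data):
        nums = []
        for i in range(0, len(data), self.BLOCK_SIZE):  # touches every block
            self.blocks[self.next_block] = data[i:i + self.BLOCK_SIZE]
            nums.append(self.next_block)
            self.next_block += 1
        self.directory[name] = nums

    def delete(self, name):
        del self.directory[name]   # one entry gone, however big the file

fs = ToyFS()
fs.write("movie.mp4", "x" * 40)    # writes 10 blocks, one at a time
fs.delete("movie.mp4")             # constant time, regardless of file size
print(len(fs.blocks))              # 10: the data is still physically there
```

This is also why "deleted" files can often be recovered: the blocks survive until they happen to be reused.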
Now that he has done a few videos, he is quite comfortable with the camera and talking to it. Congratulations, it's not something so easy to do.
Now he just has to either look at you or look at the camera, try not to look around so much, and they will be perfect! ^^
Shouldn't have to say that the video was very interesting too. :)
Why can't I pay extra and get a computer with 8 GB of cache, and how much faster would my computer be if I could?
+THERESAPARTYINMYHEAD Possibly a fair bit faster for some non-optimized edge-case scenarios, but your computer would be HUGE and probably cost millions. There's only a couple of megabytes of cache memory in a CPU, and the CPU already uses it well. However, cache memory doesn't work the same way as RAM so it might not even be possible, and because of that it also takes up much more physical space.
+THERESAPARTYINMYHEAD You actually get diminishing returns as the size of your cache increases. Think of it as searching for a book in a cart of books you recently read. If you have a huge cart, sure you can keep a lot of recently read books nearby but it will take you longer to find the specific book you need.
Is the gain coming from having to work with small addresses, or something else too? I would imagine there's never more than 24 bits needed to address cache content (not to mention special CPU instructions made to work with even smaller address sizes in exchange for greater speed)
It did seem that my computer got significantly faster after... well, doubling the frequency, and turning 1 MB of regular L2 cache into 3 MB of "Intel Smart Cache". What's the difference there? Simply that the regular L2 would be split in twain between the two cores, while the newer one can use the whole 3 megabytes in both cores at the same time?
+Zandonus Yip. That's all it appears to be. It allows you to not only share data between the two cores, but also have one core have any of the cache that the other core isn't using at the moment. It's like hyperthreading for cache.
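The cart-of-books point a few comments up, that bigger caches give diminishing returns, can be demonstrated with a small LRU simulation. This toy model only shows the hit-rate side; it ignores the other half of the argument, that bigger real caches are also physically slower to search:

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Hit rate of a plain LRU cache of the given capacity."""
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)          # mark as most recently used
        else:
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict least recently used
    return hits / len(accesses)

# A loop that mostly reuses a 4-item working set, plus some one-off strays:
trace = [0, 1, 2, 3] * 50 + list(range(100, 150))
for cap in (2, 4, 8, 64):
    print(cap, round(hit_rate(trace, cap), 2))
```

With capacity 2 the cyclic pattern thrashes and nothing ever hits; once the capacity covers the 4-item working set the hit rate jumps, and growing the cache further changes nothing for this trace.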
Dr Steve "Heartbleed" Bagley looks like a character out of Lord of the Rings.
Gollum? ;-)
DrSteveBagley Elf , Hobbit mix.
Thanks! :)
I wonder, since RAM has actually caught up with the CPU again (a current generation Intel CPU can accept RAM with clock speeds of up to 3 GHz without overclocking, pretty much the same clock speed as the CPU itself), why is the cache still faster, even if the CPU is running at that same 3 GHz?
+megaspeed2v2 For one, latency
+megaspeed2v2 A huge part of the reason that the cache is faster is that it's physically closer to the CPU. It takes time to get your request all the way out to the normal memory and all the way back to the CPU.
I think that is what +sundhaug92 is referring to when he mentions latency.
ZipplyZane Yes it was. I understand the latency aspect; it helps when accessing lots of little pieces of data. But when larger amounts of data are being accessed, wouldn't, for example, 3 GHz RAM coupled with a 3 GHz CPU no longer have a bottleneck, since the RAM can throw out data quickly enough to use up the entire CPU time in short bursts, rather than the CPU having to wait a few million clock cycles between transfers?
I LIKE HIS SHIRT
Where can I find more videos about the Turnip Machine?
Is Steve the author of Heartbleed?
Second level turnip cache... For some reason I find that very funny
Why not combine RAM and CPU?
Would you talk about predictive branching in cache pipelines.
+Donald Kjenstad (DonK) Actually, instruction pipelines aren't a part of cache, though pipelines are used to help use cache more efficiently
I have i7 920 at 2.67GHz, if I upgrade to i7 5960X, and clock it to 2.67GHz, will there be a signnificant difference? And how much does it come from the cache?
For starters, the i7 920 is a quad-core, 8-thread CPU, whereas the i7 5960X is an 8-core, 16-thread BEAST. Also, the 5960X has higher instructions per clock, higher stock and boost frequencies, and more cache (20 MB vs 8 MB). It'll just wreck the older CPU easily in every way.
What is +Keyori doing here?
When your CPU has 15MB of L3 Cache
Congratulations. You spent money on a CPU. Achievement unlocked.
Keep your friends close; keep your turnips closer.
can you add caches to a PC? is there option for regular users?
No, cache physically is built into the chip.
+Antropovich A long time ago you did have motherboards that had DIP sockets for cache chips, but those days are long gone! I remember a 286 motherboard with that option. I don't know what level of cache it actually was.
Probably the last level before checking main memory.
+Antropovich No, but the CPU that you buy or that comes with your computer already has caches built-in.
Yeah, thought so. I wanted to know if there was a model that could somehow change caches.
Cool, ty :D
What about Level 1,2 or 3 caches? What's their difference?
+desolator XT Higher level caches are larger but slower. If the CPU misses in a lower level cache it tries the next level up, and if all caches miss it goes to main memory.
+desolator XT I think that comes down to complexity and purpose.
+33C0C3 The higher level caches are also further away, so by the laws of physics they take longer to access. This is also one of the reasons why having caches are super important on modern computers, the processors are fast enough to make the distance between processor and memory count.
There's also typically a separate L1 cache for each core of the CPU. I believe the L2 and L3 caches are always shared between all cores.
Do ARM processors have cache?
Cache works faster, so it's better for gaming; we need more cache in CPUs to play games faster. Why not make the motherboard with integrated cache beside the CPU, to add additional cache to the processor?
No mention of external caches? I know Apple used it for their early PowerPC computers.
We do not need cache; it all depends on how the computer works. I heard that one of the colleges, which I will not name as I may get in trouble for it, is working on a computer that will use one chip for memory: this chip is your hard drive, your RAM, and your system and CPU cache, a single-chip memory system. It's really cool how it works. Yes, you would think it would slow things down, and yes it can, but what's cool about it is that down the line, say a faster memory chip comes out while your CPU is still fast enough to run stuff well, you can replace that chip with a faster, bigger one, save money doing it, and get faster speeds.
What's the difference between cache and a register?
+Roflcopter4b Registers are where the CPU stores its working variables. It is the fastest memory to be accessed, but typically only contains a handful of bytes per register, and not many registers. A cache is much larger, up to some megabytes, and stores frequently accessed memory. It is much slower than registers, but much faster than RAM.
+Roflcopter4b registers hold the actual state of the CPU at a given time, to analogise this to a human being, a register could be the position of your arm or what smell your nose just picked up for example, whereas the cache is like a human beings sensory/short term memory holding info about remembering where your arm just was a second ago and what that smell was just a second ago, then ram is like medium term memory, what you expect to be doing in the not too distant future and then the hard drive being long term memory with all the info you have in total.
+Roflcopter4b Cache is usually not addressed to explicitly, it's not something the programmer can manage (though there are exceptions, such as the PS3), though one could think of registers as explicitly addressed "L0"-cache
16 gig DIMMs? I could've sworn I've seen a 128 GB DIMM for servers somewhere.
I'm gonna need about 64gb of level 1 cache please.
I had to drive to the address to get the cache, but it wouldn't open so i used the ram.
Who uses turnips in a computer analogy? Has Dr. Bagley been playing Super Mario Bros. 2?
+Jeff Irwin Maybe he's a fan of LTT
in Australia they pronounce it keish
it's Philip Seymour hoffman!
Why does Blackadder pop into my mind after watching this?
He just looks incredibly smart