Set Associative Caches 1: What is a Set Associative Cache?

  • Published 18 May 2020
  • Support What's a Creel? on Patreon: / whatsacreel
    Office merch store: whats-a-creel-3.creator-sprin...
    FaceBook: / whatsacreel
    This is the first video in a 2 part series discussing Set Associative Caches. These are the types of caches on AMD and Intel CPUs. In this video, we look at how a set associative cache works and explain the terms set associativity, cache line size, tag, offset and index, showing how the CPU addresses RAM and stores cache lines in a set associative cache.
    Software used to make this vid:
    Blender:
    www.blender.org/
    Audacity:
    www.audacityteam.org/
    OBS:
    obsproject.com/
    Davinci Resolve 16:
    www.blackmagicdesign.com/prod...
    OpenOffice:
    www.openoffice.org/
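The tag, index and offset mentioned in the description can be sketched as a simple bit split of the address. This is a minimal illustration only: the line size and set count below are assumed values, not figures taken from the video, and `split_address` is a hypothetical helper name.

```python
# Hypothetical geometry, chosen for illustration only (not taken from the video):
LINE_SIZE = 64   # bytes per cache line -> 6 offset bits
NUM_SETS = 64    # sets in the cache    -> 6 index bits

OFFSET_BITS = LINE_SIZE.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1

def split_address(addr):
    """Split an address into (tag, index, offset)."""
    offset = addr & (LINE_SIZE - 1)                  # byte within the line
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # which set to look in
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # identifies the line within the set
    return tag, index, offset

tag, index, offset = split_address(0x01230123)
print(f"tag={tag:#x} index={index} offset={offset}")  # tag=0x1230 index=4 offset=35
```

With a different line size or number of sets, the same address splits into different fields, which is why those two numbers define the cache's layout.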

COMMENTS • 82

  • @giladreich810 4 years ago +117

    You really went on another level! Those new simulations stimulated my brain to the point where I stored all this information in my L1 cache. Thanks for the great video once again!

    • @WhatsACreel 4 years ago +13

      Hahaha! Cheers for watching :)

    • @StefanReich 4 years ago +6

      What is it with the Reichs and their interest in computing

  • @madokalover 3 years ago +36

    You are way better at explaining this than my university teachers were! The graphics are a huge help. Thank you, this is helping tons of people!

  • @mhfarhadi4376 3 years ago +12

    this video was literally more useful than my entire semester...i'm speechless

  • @stendall 3 years ago +4

    Awesome vid btw, your CUDA tutorials got me thru a semester. Set associative seems like a cuckoo table without hashing.

  • @RupertBruce 11 months ago

    The Disruptor circular buffer makes use of cache characteristics for speed. Thank you for this great explanation of the process!

  • @uncoherentramblings2826 4 years ago +37

    Omg. So clearly explained. This is very good teaching material. Good job and thank you very much!

  • @wesleymesquita8380 3 years ago +4

    You did a really good job containing the animations, didactics and enthusiasm for this subject! Thank you!

  • @willofirony 4 years ago +5

    Wow! Gilad Reich wrote all that needs writing about the awesome presentation. Can't wait for part 2. I am hoping you might conclude with an alignment strategy to get the most efficient use of the caches. We are certainly not in Kansas anymore, Toto. Great video.

    • @WhatsACreel 4 years ago +3

      Really glad you liked it! I actually recorded both vids in a single take, but split it into 2 because the second half seemed like a different video. It's just a commentary on a handful of hardware specs. It would be fun to discuss alignment strategies, and particularly access patterns! Maybe in an upcoming video? Thanks for watching :)

  • @lakshyagoyal5560 4 years ago +6

    Great video! I was not expecting to see animations in this and that was a pleasant surprise! Helped with the explanation a lot too!

  • @TomStorey96 3 years ago +1

    🤯 such a brilliant explanation. Never knew how caches worked, and not sure when I will ever need to know, but it's fascinating stuff.

  • @mohamed_khoudjatelli9349 3 years ago +1

    I can't describe how grateful I am.
    Thank you, prof!

  • @MexicanRaptorJesus 4 years ago +2

    Awesome video man! You're going to blow up with such fantastic content!

  • @jiannickW 2 years ago +1

    Really useful and awesome video. Clear and concise, with good examples!

  • @Filaxsan 2 years ago +1

    Amazingly done! A very clear explanation, thanks Creel! :D

  • @deud1eskrub503 4 years ago +22

    Great stuff, keep it up!

    • @WhatsACreel 4 years ago +3

      Cheers mate! Thanks for watching :)

  • @tythedev9582 4 years ago +2

    Incredible material. Many thanks.

  • @Vi0lad0r 1 year ago

    This video is absolutely brilliant.

  • @ZedaZ80 2 years ago

    This was so well made and explained!

  • @abeygi5615 3 years ago

    Awesome graphics and to the point explanation! Thanks!

  • @tamaracousineau2329 2 years ago

    Very nice visualization. Super helpful!

  • @CodingJesus 3 years ago

    This was an amazing explanation!

  • @dominiccatherin6661 3 years ago

    Fantastic video. Thank you Creel!

  • @Psykorr 2 years ago

    That's a really great explanation!

  • @thehen101 4 years ago +5

    This video is great, although YouTube's low bitrate kind of ruins those nice 3D renders. Perhaps you could render at a higher res? Cheers

  • @thespourieye8590 2 years ago

    Amazing video !

  • @damian_smith 3 years ago

    Beautiful - thank you!

  • @skilz8098 4 years ago +1

    Another thing that is similar but different within CPU ISA's is when it comes to their function - virtual - routine tables from accessing data from the disk drive... There is sort of a cache structure there as well, except the information can be hashed into a virtual lookup table.

  • @Morimea 2 years ago

    Thank you! Great video!

  • @gideonmaxmerling204 4 years ago +3

    In the next vid, will you teach about dirty bits and how the CPU is notified of a change made to RAM by another component, e.g. the GPU or the disk?

    • @WhatsACreel 4 years ago +1

      Cheers mate! I actually recorded one long video, but decided to split it into two because the second half was different. It's just a chat about some specs from Intel and AMD CPUs. It would be fun to continue with some more info on caches, dirty bits, exclusive v inclusive, victim caches, etc. And the instruction cache, which is a different beast altogether! Anywho, thanks for watching :)

  • @sakari_n 4 years ago +7

    Also, real CPUs have to synchronize between cores, in situations like this: core0 has data from address 0x01230123 and core1 stores to an address that is in the same cache block as 0x01230123. Now core0 has invalid/old data in its cache. What happens next depends on the ISA (how relaxed the memory model is and so on) but, if I remember correctly, on x86 the invalid/old cache data needs to be reloaded into the cache from main memory by core0 when it tries to access it. Also, the C/C++ memory model (more relaxed than x86) has some opinions about this, and this affects how compilers are allowed to generate code for loads and stores.

    • @WhatsACreel 4 years ago +2

      They do indeed! Synchronization between cores is a great topic!

    • @ngissac3411 2 years ago

      @WhatsACreel Actually, there is a protocol for multi-core CPUs, which is the MESI protocol. Intel and AMD each have their own protocol based on MESI.
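The scenario in this thread (core0 holds a line, core1 stores to it, core0 reads again) can be traced with a heavily simplified MESI model. This is a sketch under stated assumptions: it tracks one cache line only, ignores bus traffic and timing, and the `read`/`write` helpers are hypothetical names rather than any vendor's actual protocol implementation.

```python
# Simplified MESI for ONE cache line shared by several cores.
# States: M(odified), E(xclusive), S(hared), I(nvalid).

def read(states, core):
    """Core loads from the line; returns the new per-core states."""
    states = list(states)
    if states[core] != "I":
        return states                      # hit: no state change
    others = [i for i, s in enumerate(states) if i != core and s != "I"]
    for i in others:
        states[i] = "S"                    # an M copy is written back; M/E drop to S
    states[core] = "S" if others else "E"
    return states

def write(states, core):
    """Core stores to the line; every other copy is invalidated."""
    states = ["I"] * len(states)
    states[core] = "M"
    return states

s = ["I", "I"]
s = read(s, 0)    # core0 loads the line          -> ['E', 'I']
s = write(s, 1)   # core1 stores to the same line -> ['I', 'M']
s = read(s, 0)    # core0 loads again             -> ['S', 'S']
print(s)
```

The middle step is the interesting one: core0's copy becomes Invalid, so its next access is a miss and has to fetch the up-to-date line.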

  • @johnyoungquist6540 4 years ago +3

    great explanation!

  • @WiseWeeabo 4 years ago

    love the skeletor thing

  • @poojasinha1943 3 years ago

    It helped a lot. Thank you.

  • @Alex-op2kc 3 years ago

    Holy visualizations, Batman! This is great!

  • @jake_3745 2 years ago +1

    brilliant

  • @user-ym4yt9bo2u 3 years ago

    THANK U now i actually understand this 4 my final

  • @booklibrary2884 2 years ago

    Really an amazing and clear explanation, great animations too
    In this example we assume there isn't any virtualization right? All those addresses would be physical addresses

  • @romanemul1 4 years ago +3

    Thanks for this.

    • @WhatsACreel 4 years ago +1

      Welcome, cheers for watching :)

  • @PrivateSi 3 years ago +1

    Better compilers could probably eliminate the hardware automatic caching system and precache code and data in an optimised way. Same for the OS / app runtime dynamic memory manager. Currently it isn't possible to access a cache directly (in X86 at least) but it is possible to precache data. If you could access a cache directly it would save the memory address translation step the hardware has to perform.

  • @ahmadk5844 2 years ago

    THANK U !!!

  • @TheYmBProduction 2 years ago

    king

  • @robertfaney4148 3 years ago

    wow , so good - have you done anything on virtual memory please?

  • @him21016 4 years ago +2

    My guy

  • @AmaroqStarwind 3 years ago

    I'd love to be able to use Ternary Content Addressable Memory (TCAM) for everything.
    I just wish TCAM wasn't so expensive and power hungry, and that the storage densities were actually half-decent.

  • @NeilRoy 4 years ago +3

    When you have L1, L2 and L3 cache, isn't data from L1 pushed into L2 when new data comes in? And if the data in L2 gets old, it is moved to L3? Something like that anyhow. My memory on this is fuzzy. Anyhow, I've seen some great videos on coding your programs to maximize cache hits. The code to do this can often look slower, with more code, but the end result will be a huge speed increase. I forget where I saw the video now, but it was REALLY fascinating to see normal code versus code which has been designed to maximize cache hits.

    • @WhatsACreel 4 years ago +7

      Yes, the caches generally evict to higher levels. It might be fun to make a video on exclusive vs inclusive and victim caches! All that stuff is great :)
      Techniques called cache tiling/blocking are great! Keep the data being processed in the L1!!
      Cheers for watching mate :)
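The cache tiling/blocking technique mentioned in this reply can be sketched with a tiled matrix multiply. This is only an illustration of the loop structure: the function name and the tile size are assumptions, and pure Python lists are not laid out contiguously in memory, so the cache benefit itself would show up in a language such as C rather than here.

```python
def matmul_tiled(a, b, n, tile=32):
    """Multiply two n x n matrices (lists of lists) tile by tile."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile-sized block of each matrix at a time,
                # so the data being processed stays small enough to fit in L1.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_tiled(a, b, 2, tile=1))  # [[19, 22], [43, 50]]
```

The result is identical to the plain triple loop; only the order in which elements are visited changes.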

  • @franzlyonheart4362 1 year ago

    0:56, there. And 3:55 also.

  • @0xggbrnr 4 years ago

    Fucking amazing!

  • @regulus8518 2 years ago

    What happens to the cache line that gets evicted from L1? Does it get written into L2, and what does that process look like?

  • @NicosLeben 4 years ago

    How exactly does the comparison with tags work? If a set is full, are all these tags going to be compared in parallel or does it work like a binary search?

    • @chainingsolid 1 year ago

      Given how hardware is naturally parallel I would assume parallel.
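The tag comparison asked about here can be modelled in a few lines. This is a minimal sketch, assuming a 4-way set and a hypothetical `lookup` helper. In hardware each way typically has its own comparator and all of them fire at once; the loop below is sequential only because software is.

```python
def lookup(cache_set, tag):
    """Return the way holding `tag`, or None on a miss.

    Models the per-way comparators of one set; a set has only a handful
    of ways, so there is nothing to binary-search.
    """
    for way, stored_tag in enumerate(cache_set):
        if stored_tag == tag:
            return way
    return None

one_set = [0x1230, 0x0042, None, None]  # None marks an empty way (illustrative values)
print(lookup(one_set, 0x0042))  # 1
print(lookup(one_set, 0x9999))  # None
```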

  • @mrkrisey4841 3 years ago

    I don't understand: in the start animation he has 4 sets and 4 ways. Is one yellow block a cache line, or do all the yellow blocks together make up a cache line?

  • @gearstil 4 years ago +3

    Is it better for you if I let the advertising run all the way to the end?

    • @WhatsACreel 4 years ago +2

      Ha! I'm not sure... Nice of you to think of that tho! Thanks for watching :)

  • @cezarcatalin1406 3 years ago +3

    The one dislike is from Intel 😆

    • @WhatsACreel 3 years ago +1

      Wow, your icon is animated in the notifications... Is it a gif? How did you do that? Hahaha :)

  • @captainbodyshot2839 4 years ago +2

    If my program makes a sequential access from beginning to end of some large array, can the CPU predict that it will need data from more than just one cache line and start loading the following ones in advance?

    • @skilz8098 4 years ago +1

      That depends on a few other things... It isn't just the hardware and its opcodes, but it also depends on the OS and on your Compiler - Interpreter and how they convert your source code to either assembly, byte codes, or opcodes... There are many optimizations that your Compiler - Interpreter will make depending on your compiler's - interpreter's command-line options and settings... Then it comes down to the architecture and its hardware design for which features are available. After that, it then depends on your Operating System and how it handles the calls to the underlying hardware such as reading and writing to disk, creating threads and semaphores, reading and writing to ports, etc.

    • @captainbodyshot2839 4 years ago +3

      @skilz8098 ...Are you sure you know what you're talking about? Never mind, I found out that modern x86 processors do, in fact, have automatic prefetch mechanisms which can detect linear access patterns.

    • @WhatsACreel 4 years ago +3

      I think they call it smart prefetch at AMD or hardware prefetch at Intel? They certainly do this with the instruction cache too! Compilers will use software prefetch if they're clever enough! Certainly an interesting topic! Cheers for watching :)

    • @skilz8098 4 years ago

      @captainbodyshot2839 I wasn't trying to be too explicit because you would have to read the datasheets and the ISA manuals to get all of the details. And the available features and techniques that can be used vary by architecture (CPU), platform (OS), and compiler.
      Take, for example, you and I could have the same exact hardware and operating system except I could be using Visual Studio and you could be using GCC or Clang for C++. They all work very similarly and they usually implement 98%+ of the C++ standard, but they may do so in different manners.
      Compiler A might use register X with instruction 1 where Compiler B might use register Y with instruction 2 to generate the same algorithm.

    • @elliott8175 3 years ago +1

      @skilz8098 pre-fetching happens at the hardware level, not the software level. An executable/assembly can't tell a processor where to put data in the caches. Different compilers might result in different assembly which may result in the processor handling memory differently among the caches. However, processors are either using this technique or they're not, regardless of your assembly code. These days most processors do it.
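The effect of hardware prefetching on a linear scan, as discussed in this thread, can be illustrated with a toy simulation. This is a sketch under loud assumptions: a simple "next line" prefetcher, a cache that never evicts, and prefetches that arrive instantly. Real prefetchers detect strides, have latency, and differ between vendors; `count_misses` is a hypothetical helper.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)

def count_misses(addresses, prefetch_next_line):
    """Count demand misses in an idealised cache that never evicts."""
    cached = set()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in cached:
            misses += 1
            cached.add(line)
        if prefetch_next_line:
            cached.add(line + 1)   # fetch the following line ahead of use
    return misses

scan = range(0, 64 * 1024, 8)      # sequential 8-byte reads over 64 KiB
print(count_misses(scan, False), count_misses(scan, True))  # 1024 1
```

Without prefetching, every one of the 1024 lines costs a miss; with the idealised next-line prefetcher only the very first access misses.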

  • @dannggg 2 years ago

    how did the offset read 9?

  • @Alex-op2kc 3 years ago

    Part 2: ua-cam.com/video/tde8lhFdczI/v-deo.html

  • @duydianvu5466 2 years ago

    Could you turn on subtitles for this video? Thanks

  • @MrSpikegee 3 years ago

    What is your accent? English pirate? Great content btw

  • @sabriath 4 years ago +4

    You missed the policy of a "dirty" cache, where data was written to a cache but wasn't synced with RAM when it's evicted....but other than that, pretty much got it.
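The "dirty" policy raised in this comment can be sketched as a write-back cache line. This is a minimal illustration, not how any particular CPU implements it: RAM is modelled as a dict keyed by the line's tag (treating the tag as the whole line address), and `Line`, `store` and `evict` are hypothetical names.

```python
class Line:
    def __init__(self, tag, data):
        self.tag = tag
        self.data = data
        self.dirty = False      # True once the cached copy differs from RAM

def store(line, data):
    """Write-back policy: update the cache only and mark the line dirty."""
    line.data = data
    line.dirty = True

def evict(line, ram):
    """Copy a dirty line back to RAM before dropping it; clean lines are just dropped."""
    wrote_back = line.dirty
    if line.dirty:
        ram[line.tag] = line.data
    return wrote_back

ram = {0x1230: "old"}
line = Line(0x1230, ram[0x1230])
store(line, "new")
print(ram[0x1230])       # old  (RAM is stale until the line is evicted)
print(evict(line, ram))  # True (the dirty line had to be written back)
print(ram[0x1230])       # new
```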

  • @davidprock904 4 years ago

    The architecture I'm working on gets rid of the cache principles; your entire storage space would be more like level 0, faster than L1.

  • @marceloguzman646 2 years ago

    I'm curious about the 'Valid Bit'. I was told that there must be one valid bit too, could someone tell me what happened to it? haha
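One way to see why a valid bit is usually described alongside the tag: without it, an empty way whose tag field happens to match would look like a hit. The sketch below is illustrative only; the dict layout and the `is_hit` helper are assumptions, not something shown in the video.

```python
def is_hit(way, tag, use_valid_bit):
    """Tag comparison for one way, with or without checking the valid bit."""
    if use_valid_bit and not way["valid"]:
        return False
    return way["tag"] == tag

# A way at power-up: nothing has been loaded yet, and the tag field holds
# whatever bits it happened to start with (0 here).
empty_way = {"valid": False, "tag": 0}

print(is_hit(empty_way, 0, use_valid_bit=False))  # True  (a false hit on garbage data)
print(is_hit(empty_way, 0, use_valid_bit=True))   # False (correctly reported as a miss)
```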

  • @georgewright1093 2 years ago

    I know this is going to sound silly, but could you work in something about throwing a shrimp onto a barbie

  • @sandraviknander7898 3 years ago

    These are not the cache lines you’re looking for.