Writing cache friendly C++ - Jonathan Müller - Meeting C++ 2018

COMMENTS • 39

  • @sanderbos4243
    @sanderbos4243 2 years ago +3

    7:43 I'm pretty sure that since each box represents 4 bytes/32 bits/an int, the example with a stride of 16 bytes (4 boxes or 4 ints) here should have its green boxes at 0, 4, 8 instead of 0, 3, 6, which results in a waste of 75%, instead of 71%.
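
The 75% figure in the comment above can be checked with a small sketch. It assumes one 4-byte int is read every 16 bytes and that everything in between is fetched but never used; `wasted_fraction` is a hypothetical helper name, not something from the talk.

```cpp
#include <cstddef>

// Fraction of fetched bytes that goes unused when only one element of
// `element_bytes` is read out of every `stride_bytes`.
constexpr double wasted_fraction(std::size_t element_bytes, std::size_t stride_bytes) {
    return 1.0 - static_cast<double>(element_bytes) / static_cast<double>(stride_bytes);
}

// One int (4 bytes) used per 16-byte stride: 4/16 used, 12/16 wasted.
static_assert(wasted_fraction(4, 16) == 0.75, "a 16-byte stride over ints wastes 75%");
```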

  • @sanderbos4243
    @sanderbos4243 2 years ago

    Amazing talk, thanks Jonathan! :-)

  • @kenilmehta4247
    @kenilmehta4247 5 years ago +1

    20:18 How is sizeof(Normal) equal to 8 bytes?

    • @RyanCahoon
      @RyanCahoon 5 years ago +5

      He said earlier (18:10) that enums are 4 bytes on his machines, then 1 byte each for the bool and uint8, then 2 bytes of padding
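
A minimal sketch of the arithmetic in this reply. The layout below is a guess at what `Normal` might look like: the enum name and the field names are hypothetical, and the sizes hold on common platforms where `int` is 4 bytes, not universally.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction; names are illustrative, not from the slides.
enum class Kind { a, b, c, d };  // default underlying type is int (4 bytes on common platforms)

struct Normal {
    Kind kind;           // offset 0, 4 bytes
    bool flag;           // offset 4, 1 byte
    std::uint8_t small;  // offset 5, 1 byte
    // 2 bytes of tail padding keep sizeof(Normal) a multiple of alignof(Kind)
};

// Bytes that actually hold fields, and bytes the compiler added as padding.
constexpr std::size_t payload = sizeof(Kind) + sizeof(bool) + sizeof(std::uint8_t);
constexpr std::size_t padding = sizeof(Normal) - payload;
```

On a typical x86-64 toolchain this gives a payload of 6 bytes and 2 bytes of padding, which adds up to the 8 bytes asked about.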

  • @mapron1
    @mapron1 5 years ago +1

    19:10 - You say a uint32_t* must always be 4-byte aligned on x86, but is that strictly true? I could have an unaligned pointer to an int32, couldn't I? Yes, it degrades performance, but is it possible?

    • @foonathan
      @foonathan 5 years ago +4

      Yes, technically you can have an unaligned pointer. Then that trick doesn't work anymore, correct.

    • @mapron1
      @mapron1 5 years ago

      Thanks for the answer. And for the awesome talk, too :)

    • @Carewolf
      @Carewolf 4 years ago

      I think you can in x86 assembler, but I don't think it is valid C++. Or rather it works, but is undefined behavior and a sanitizer will complain loudly.

    • @llothar68
      @llothar68 4 years ago

      @Carewolf No complaints if you use a pragma. Normally no complaints at all, because you use low-level casts to get unaligned access. I love low level bit fuckery.

    • @Carewolf
      @Carewolf 4 years ago

      @llothar68 You can also use a memcpy; a good compiler will optimize the copy away but leave valid code. On x86 it will simply use unaligned memory reads.
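
The memcpy approach mentioned in this thread might look like the sketch below. `load_u32` is a hypothetical helper name; the only library call is `std::memcpy`, which is well defined for any source address regardless of alignment.

```cpp
#include <cstdint>
#include <cstring>

// Reads a std::uint32_t from an address that carries no alignment guarantee.
// Dereferencing a misaligned std::uint32_t* would be undefined behavior;
// copying the bytes into a properly aligned local is not.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

Optimizing compilers commonly turn the fixed-size copy into a single load on targets that tolerate unaligned access, but that is a quality-of-implementation matter, not a guarantee.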

  • @Sebastian-lz5ue
    @Sebastian-lz5ue 5 years ago +8

    6:17: "..the fast caches are also the slowest."

  • @Rhumage
    @Rhumage 5 years ago +3

    19:00 I still don't understand where 62 comes from

    • @phonlolol5153
      @phonlolol5153 5 years ago +26

      He assumes that the std::uint32_t type has the proper 4-byte alignment. This means that a pointer to a std::uint32_t can only point to addresses like byte 0, byte 4, byte 8, byte 12, byte 16 and so on, so the last 2 bits are essentially always zero.
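
The claim about the last two bits can be observed directly. This is a sketch under two assumptions: `alignof(std::uint32_t)` is 4, and converting a pointer to `std::uintptr_t` yields the plain address, as it does on mainstream platforms. `low_two_bits` is a hypothetical helper name.

```cpp
#include <cstdint>

static_assert(alignof(std::uint32_t) == 4, "sketch assumes 4-byte alignment");

// Returns the two least significant bits of the address held by `p`.
inline std::uintptr_t low_two_bits(const std::uint32_t* p) {
    return reinterpret_cast<std::uintptr_t>(p) & 0x3u;
}
```

For every element of a `std::uint32_t` array the result is 0, which is why those two bits carry no information.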

  • @konrad3688
    @konrad3688 5 years ago

    Should I prefer using "sorted_set + vector" rather than std::map / unordered_map? Are there any benchmarks for this?

    • @foonathan
      @foonathan 5 years ago +6

      Here are benchmarks against boost::flat_map, which is similar: stackoverflow.com/a/25027750
      You should prefer it to std::map, but obviously an O(1) hash table is better than an O(log n) search. std::unordered_map is still not ideal, however. There are better hash table implementations out there, see foonathan.net/meetingcpp2018.html for some links.
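
For readers unfamiliar with the sorted-vector idea discussed here, a minimal lookup sketch follows. It is not boost::flat_map and not code from the talk; `find_sorted` is a hypothetical name, and the vector is assumed to be already sorted by key.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Looks up `key` in a vector of (key, value) pairs sorted by key.
// Returns a pointer to the value, or nullptr if the key is absent.
// The search is O(log n) like std::map, but over contiguous memory.
inline const int* find_sorted(const std::vector<std::pair<int, int>>& items, int key) {
    auto it = std::lower_bound(
        items.begin(), items.end(), key,
        [](const std::pair<int, int>& item, int k) { return item.first < k; });
    return (it != items.end() && it->first == key) ? &it->second : nullptr;
}
```

The trade-off is that insertion into the middle of the vector is O(n), so this layout suits data that is built once and queried often.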

  • @sanderbos4243
    @sanderbos4243 2 years ago

    17:03 Shouldn't sizeof(Bad) == 24 have been the right answer, instead of what's in the presentation? The largest type is uint64_t, or 8 bytes, so field a == 8 bytes and fields c, d and e packed together fit in another 8 bytes, so 8 + 8 + 8 == 24?

    • @LewiLewi52
      @LewiLewi52 1 year ago +1

      The processor can only read N bytes at an address evenly divisible by N. Consider a struct with an int64 followed by an int32: the starting address of the int32 is divisible by 4. But following the int64 with an int8 and then an int32 would place the int32 at an address not divisible by 4, so padding is needed.

    • @sanderbos4243
      @sanderbos4243 1 year ago

      Thank you!!:
      struct { i64; i32 }: 8 / 4 = 2
      struct { i64; i8; i32 }: 9 / 4 != integer, so pads the i8
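
The two layouts from this exchange can be inspected with `offsetof`. The struct and field names are hypothetical, and the numbers in the comments hold on typical x86-64 ABIs where `std::int64_t` is 8-byte aligned and `std::int32_t` is 4-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// struct { i64; i32 }: the i32 lands at offset 8, already a multiple of 4.
struct TwoFields {
    std::int64_t big;
    std::int32_t mid;
};

// struct { i64; i8; i32 }: the i8 sits at offset 8, so the next free byte is 9.
// 9 is not a multiple of 4, so 3 padding bytes push the i32 to offset 12.
struct ThreeFields {
    std::int64_t big;
    std::int8_t tiny;
    std::int32_t mid;
};

constexpr std::size_t two_mid_offset = offsetof(TwoFields, mid);
constexpr std::size_t three_mid_offset = offsetof(ThreeFields, mid);
```

Both structs end up 16 bytes on such a platform: the first has 4 bytes of tail padding, the second has 3 bytes of padding after the int8.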

  • @Thiago1337
    @Thiago1337 1 year ago +1

    18:09
    I don't understand this part. Why is it 2 bits of information?

    • @neohashi3396
      @neohashi3396 1 year ago +3

      The enum has 4 states: a, b, c and d. In order to count to 4 in binary you need 2 bits: 00 01 10 11
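
The counting argument in this reply generalizes to any number of states. A small sketch, assuming C++14 or later for the loop in a `constexpr` function; `bits_for_states` is a hypothetical helper name, intended for small state counts.

```cpp
// Smallest number of bits whose 2^bits combinations cover `states` values.
constexpr unsigned bits_for_states(unsigned states) {
    unsigned bits = 0;
    while ((1u << bits) < states) {
        ++bits;
    }
    return bits;
}

// Four enumerators (00, 01, 10, 11) fit in exactly two bits.
static_assert(bits_for_states(4) == 2, "4 states need 2 bits");
```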

  • @sanderbos4243
    @sanderbos4243 2 years ago

    19:29 Pretty sure it's 62 bits of information again, so 61 bits + 1 bit. The 61 is because of the bool being padded to take up just as much space as the uint32_t*, which means the first _3_ lower bits of the "a" field will stay 0. This is just like his previous explanation of 62, but you just keep the padded bool in mind. EDIT: I'm definitely wrong, see miguel's reply to me.

    • @miguelveganzones5103
      @miguelveganzones5103 9 months ago +1

      What matters here is that it is pointing to a 32-bit type with 4-byte alignment; that's how you lose 2 bits of information. That there is a bool within the struct just adds more padding but is otherwise irrelevant for the pointer.

    • @TheJGAdams
      @TheJGAdams 8 months ago

      Why is it 62 bits though? If you're compiling for x64 you should have a full 64 bits.
      I can guess old games only supported 2 GB of RAM because it's 31 bits? But they also patched it to use a full 32. What's going on here???

    • @sanderbos4243
      @sanderbos4243 8 months ago +1

      @TheJGAdams The context here is that if you have an 8-byte pointer to a 4-byte data type (the std::uint32_t), we're assuming the 4-byte data type is aligned to a 4-byte boundary. If the pointer only ever points to addresses that are on a 4-byte boundary, the address's two least significant bits are constant and predictable, and so those bits don't carry useful information. So the 64-bit pointer would carry 62 bits of information. "...on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though the performance suffers a little since two words have to be fetched from memory instead of just one."

    • @TheJGAdams
      @TheJGAdams 8 months ago

      @sanderbos4243
      I'm more confused now.
      I don't know what you're talking about, but I was asking about pointers themselves. They store the address, not the data it points to. So, an 8-byte pointer to a 4-byte data type? Not my question.
      The question is, why is the pointer 62 bits?
      Why are the 2 bits constant and predictable?
      Old games used to support only 2 GB, and they can be patched to 4 GB. E.g. 31 bits is 2 billion.
      Also, can you explain alignment? I don't understand why it would take 2 words. You don't access memory by words, you access it by cache line.
      Also, word size is CPU-specific. It can be 32 or 64 bits nowadays.

    • @sanderbos4243
      @sanderbos4243 8 months ago

      @TheJGAdams The 8-byte pointer indeed has 64 bits worth of possible states, that's completely correct. But what the video's "information" metric represents is related to the field of information theory (you'll find better explanations of that if you look it up). The point is that if you have an array of 4-byte ints, you *know* that any address that points to one of those ints will be 4-byte aligned. Simply put, if you print the addresses of the ints they'll go +0x0, +0x4, +0x8, etc. So the "information" metric from this video takes the number of bits in an 8-byte pointer (64) and subtracts 2 from it, simply because those last 2 (least significant) bits will always be 0. So "information" says "Yeah yeah, of course those last two bits are zero for this address! You didn't need to tell me that, I can see that from the size of the thing I'm pointing at being 4 bytes (2 bits)! I only care about its offset!"
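
One practical consequence of those two always-zero bits is that they can hold a small tag, which is presumably the kind of trick the earlier replies refer to. The class below is an illustrative sketch, not code from the talk: `TaggedPtr` and its members are hypothetical names, and it assumes the pointer really is 4-byte aligned and that pointer-to-integer conversion yields the plain address.

```cpp
#include <cassert>
#include <cstdint>

// Stores a 2-bit tag in the low bits of a 4-byte-aligned pointer.
class TaggedPtr {
public:
    TaggedPtr(std::uint32_t* p, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(p) | (tag & kMask)) {
        // The trick breaks for unaligned pointers, so check the assumption.
        assert((reinterpret_cast<std::uintptr_t>(p) & kMask) == 0);
    }

    // Clears the tag bits to recover the original address.
    std::uint32_t* pointer() const {
        return reinterpret_cast<std::uint32_t*>(bits_ & ~kMask);
    }

    // Extracts the 2-bit tag.
    unsigned tag() const { return static_cast<unsigned>(bits_ & kMask); }

private:
    static constexpr std::uintptr_t kMask = 0x3;
    std::uintptr_t bits_;
};
```

The object is still the size of one pointer, so the tag costs no extra memory; the price is a mask operation on every access.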

  • @antonios-m4291
    @antonios-m4291 2 years ago +5

    One of the more unclear C++ presentations, I must say.

    • @tal500
      @tal500 3 months ago +1

      This one assumes a lot of background in memory performance.