C++ Weekly - Ep 233 - std::map vs constexpr map (huge perf difference!)

Поділитися
Вставка
  • Опубліковано 28 лип 2024
  • ☟☟ Awesome T-Shirts! Sponsors! Books! ☟☟
    Upcoming Workshop: C++ Best Practices, NDC TechTown, Sept 9-10, 2024
    ► ndctechtown.com/workshops/c-b...
    Upcoming Workshop: Applied constexpr: The Power of Compile-Time Resources, C++ Under The Sea, October 10, 2024
    ► cppunderthesea.nl/workshops/
    T-SHIRTS AVAILABLE!
    ► The best C++ T-Shirts anywhere! my-store-d16a2f.creator-sprin...
    WANT MORE JASON?
    ► My Training Classes: emptycrate.com/training.html
    ► Follow me on twitter: / lefticus
    SUPPORT THE CHANNEL
    ► Patreon: / lefticus
    ► Github Sponsors: github.com/sponsors/lefticus
    ► Paypal Donation: www.paypal.com/donate/?hosted...
    GET INVOLVED
    ► Video Idea List: github.com/lefticus/cpp_weekl...
    JASON'S BOOKS
    ► C++23 Best Practices
    Leanpub Ebook: leanpub.com/cpp23_best_practi...
    ► C++ Best Practices
    Amazon Paperback: amzn.to/3wpAU3Z
    Leanpub Ebook: leanpub.com/cppbestpractices
    JASON'S PUZZLE BOOKS
    ► Object Lifetime Puzzlers Book 1
    Amazon Paperback: amzn.to/3g6Ervj
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Object Lifetime Puzzlers Book 2
    Amazon Paperback: amzn.to/3whdUDU
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Object Lifetime Puzzlers Book 3
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Copy and Reference Puzzlers Book 1
    Amazon Paperback: amzn.to/3g7ZVb9
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► Copy and Reference Puzzlers Book 2
    Amazon Paperback: amzn.to/3X1LOIx
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► Copy and Reference Puzzlers Book 3
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► OpCode Puzzlers Book 1
    Amazon Paperback: amzn.to/3KCNJg6
    Leanpub Ebook: leanpub.com/opcodepuzzlers_book1
    RECOMMENDED BOOKS
    ► Bjarne Stroustrup's A Tour of C++ (now with C++20/23!): amzn.to/3X4Wypr
    AWESOME PROJECTS
    ► The C++ Starter Project - Gets you started with Best Practices Quickly - github.com/cpp-best-practices...
    ► C++ Best Practices Forkable Coding Standards - github.com/cpp-best-practices...
    O'Reilly VIDEOS
    ► Inheritance and Polymorphism in C++ - www.oreilly.com/library/view/...
    ► Learning C++ Best Practices - www.oreilly.com/library/view/...
    quick-bench.com/q/yGo0VvdRGK5...
    build-bench.com/b/ktOFXQ9__JA...
    godbolt.org/z/cnrzKr
  • Наука та технологія

КОМЕНТАРІ • 103

  • @squelchedotter
    @squelchedotter 4 роки тому +74

    This is an interesting twist on the wisdom that: "Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants."
    Turns out fancy algorithms are also harder for the compiler to optimize.

  • @unevaguejaune8671
    @unevaguejaune8671 4 роки тому +14

    Excellent episode. Please do more like that one.

  • @TRiToN219
    @TRiToN219 3 роки тому +7

    Yeah, constexpr map is a saver, I've been using this pattern myself for maybe a year now.
    I don't know if this helps anybody (cause it's kinda obvious), but a good technique that complements the one Jason is representing here
    is using constexpr 'std::array' of 'string_view' for your 'enum class/struct' values, inside some 'to_string' or 'to_string_view' function.
    You can get a common member name like 'count' or 'max_value' in your codebase at the end of the enumeration and, given the first value starts with zero, just use it in a static_assert to check the array's size at compile time.
    Of course, when you don't need the entire list of enum values, two constexpr maps for such a conversion is a better way.

  • @gast128
    @gast128 3 роки тому

    Linear search in small contagious memory containers is often faster than binary search with the disclaimer that the lookup is also dependent on the cost of the compare function.

  • @keith77mn77
    @keith77mn77 4 роки тому +5

    Jason, these recent videos are correlating to your game hack stream sessions, I notice. I didn't like that switch conversion from the sf::joystick::axis enum to string_views, and did some compiler explorer experimentation using a static constexpr array in it's place. I found the compiled code was cleaner, with less jump instructions. I hope you'll do another stream on your game soon, as it's really inspiring to watch your whole method from concept to more polished code, and was fun to watch.

  • @jcer6399
    @jcer6399 3 роки тому +1

    Just a note for other deep-thinking programmers: Please note that the benchmark used to measure the speed brings an assumption that colors are searched randomly. If your use-case is e.g. that "black" is in 99% of cases, the results may be different and proportions between "algorithms" not so big (because of CPU jump prediction). Maybe not exactly, but I would say the smaller is the list of values in the map the bigger uncertainty in optimizations.

  • @ignacionr
    @ignacionr 3 роки тому

    Great stuff!

  • @27harishvk
    @27harishvk 4 роки тому

    Thank you Jason great help. Can you include some video on C++ parsers also.. for e.g in my code i want to check when-ever x() method is called within same block y() is also called.. Is there a way for us to validate c++ code?. thanks

  • @tatoute1
    @tatoute1 2 роки тому

    What you need to compare is your Map as constexpr and non constexpr. std::map will be probably slow as a constexpr, except we cannot have a constexpr dynamic allocated object in use at exec time, with the exception that the compiler may know the specs of std::map and deduce the same jump code. But I highly doubt of that.

  • @ukaszdrozdz6200
    @ukaszdrozdz6200 4 роки тому +3

    Jason you forgot to show how passing string_view by reference would impact your solution ;)

  • @fcolecumberri
    @fcolecumberri 3 роки тому

    You said that one of the reasons std::map does not optimize well is because of the heap allocations (the use of "new" for each node) but thats not on the structure, thats on the allocator, I wonder if gcc could optimize that better with a stack allocator and how that would compare with clang and gcc

  • @nextlifeonearth
    @nextlifeonearth 2 роки тому +3

    When I changed the find_if to a raw loop in Clang 10 the speed went way up for the Clang version too. Apparently Clang can't resolve find_if with a constexpr compile time.

    • @nextlifeonearth
      @nextlifeonearth 2 роки тому

      When comparing compile times with the raw loop, clang still beat GCC's std::map version.
      Just don't use find_if in a constexpr in clang and you're golden.

  • @AndrewHelgeCox
    @AndrewHelgeCox 3 роки тому

    I'm thinking separate key and value arrays so you don't pollute caches with values that are skipped over in the linear search. I'm at 2:45 And assuming this code gets executed, not boiled away at compile time.

  • @01296501923654
    @01296501923654 4 роки тому

    What's meant by putting these constexpr constructs behind a compilation firewall? Is it about preventing them being compiled in every unit that includes them?

  • @platin2148
    @platin2148 4 роки тому

    Could one not do string interning with that too?

  • @jakearkinstall5313
    @jakearkinstall5313 3 роки тому

    You can use a very similar technique to get constexpr dictionaries, I.e. Maps of the same key type to DIFFERENT value types. Use a tuple instead of an array.
    But what about the return value? You could return a variant if you hate performance, but the option I go for is to provide an invoke(key, callback) method that finds the key and invokes the callback with the value if it is invocable with the value's type. If it isn't, or if the key can't be found, it returns an error code.

    • @jakearkinstall5313
      @jakearkinstall5313 3 роки тому

      I use this approach at work. We have a constexpr dictionary from an asset's name to its repective pipeline. So if we have an event from the feeds that represents, e.g. a trade or quote on a stock, it goes through that stock's specific pipeline. That means that each stock could have e.g. different neural network layouts, different feature generators, different risk setups, etc, and it's all there baked on the (increased size) stack without runtime polymorphism getting in the way.
      Its not a technique that's useful for everybody, especially if you want runtime configurability, but if you're as happy baking it all in at compile time as I am, it's a performance DREAM.

  • @AlexeiStrots
    @AlexeiStrots 3 роки тому +1

    Will performance be different for unordered_map?

  • @larssamsung3033
    @larssamsung3033 3 роки тому +1

    This optimization works on very small maps (for bigger maps, the O(n) vs O(log n) argument kicks in). In that case, you likely made something that takes like 0.1% of your runtime 10 times faster, lowering your total runtime to 99.99% - a 0.01% improvement! Large gains might be found elsewhere.

    • @nextlifeonearth
      @nextlifeonearth 2 роки тому +2

      If you do this for everything, it will add up. It really is the little things sometimes.

    • @ChrisWijtmans
      @ChrisWijtmans Рік тому

      what if your objects are mostly small maps ;)

  • @darshanpai4194
    @darshanpai4194 2 роки тому

    That was a brilliant episode. I have never used Quick Bench so it will be good to try. One question in the benchmark section, the flat map and the std::map are both const auto. I was wondering does doing a 'static const auto' change the statistics? Because I would assume doing that would preclude generating the maps at every run of the function.

    • @cppweekly
      @cppweekly  2 роки тому

      That should not change things too much, because the map is outside of the measured for loop. There is a link at the bottom of the video description and you can try it out for yourself.

  • @fredhair
    @fredhair 4 роки тому +1

    Very, very cool. Would be interested to see a comparison of binary size though I guess it would be negligible in such a small sample.

  • @shiqiliu1233
    @shiqiliu1233 3 роки тому

    I would make this color string Int_8 Enum and use simple array, and the enum number is the index of the arrary. I think this would be faster

  • @squelchedotter
    @squelchedotter 4 роки тому +3

    So... algorithm has a std::binary_search that is constexpr too, afaict. So (especially if there was a constexpr std::hash) could you have the best of both worlds?

    • @huixie2485
      @huixie2485 4 роки тому +1

      WeNeedMoreFarads For the data size of this example, there is no chance that binary search can beat linear search, because of high the branch prediction rate of the linear search.

    • @squelchedotter
      @squelchedotter 4 роки тому +1

      @@huixie2485 right. So at the larger scale.

  • @francescomagliocco7086
    @francescomagliocco7086 14 днів тому

    Been watching your videos for a while now, and I'm just curious... And maybe this is just because I'm watching them on my phone but.. Why are the videos such low resolution? It's hard to read the code.
    Could also please make a video more on string_view, in detail? I know there is atleast one, but it didn't go into that much detail. I'm just having a hard time trying to grasp when to use them and how exactly to use them.

  • @NoNameAtAll2
    @NoNameAtAll2 3 роки тому

    shouldn't static constexpr be constinit?

  • @ecosta
    @ecosta 3 роки тому +1

    Clang outputs the same result as GCC if the Map class is in an anonymous namespace. I did some tests and Clang seems to avoid generating duplicate code. Without an anonymous ns, the compiler has to output the Map::at anyway, so Clang favours to use it. With an anonymous ns, Clang don't need to output Map::at, so it decides to optimise the usage. Interestingly, if you have Map in an anonymous ns and you duplicate color_value/lookup_value, then the optimisation is not done.

    • @siggimund
      @siggimund 3 роки тому

      Quite a valuable observation there, thanks.
      I guess it would produce the same results with Quick C++ benchmark as gcc?!!!
      I'll try it out later but now I have to go back to my backlog binge watching of C++ Weekly :-)

  • @ashtum
    @ashtum 4 роки тому

    Does anyone know why GCC didn't optimize non-constexpr version?

    • @nmmm2000
      @nmmm2000 4 роки тому

      I believe it will, if it was declared locally + const.

  • @Lord2225
    @Lord2225 3 роки тому

    O3 is sooo wild.

  • @sephirostoy
    @sephirostoy 4 роки тому +9

    What about MSVC?

    • @TonniTielens
      @TonniTielens 2 роки тому

      I was curious about this too. Quickly used this example in MSVC (Visual Studio 2022 Version 17.0.6) myself, and don't notice any difference between std::map and the constexpr Map.

  • @xealit
    @xealit 7 місяців тому

    decent metaprogramming in c++ gets closer and closer

  • @Reneg973
    @Reneg973 26 днів тому

    Is there a way to sort that array at compile time to be able to use lower_bound for lookup?

    • @cppweekly
      @cppweekly  15 днів тому

      Yes, with C++20's constexpr std::sort you definitely could. This is a conversation I have with students when we go over constexpr possibilities.

  • @wojtekburzynski654
    @wojtekburzynski654 4 роки тому +8

    I wonder how std::unordered_map would compare?

    • @bernardosulzbach
      @bernardosulzbach 4 роки тому +4

      Same. Given that order was not required for the benchmarks, I think that not including std::unordered_map is leaving out a very promising candidate.

    • @mapron1
      @mapron1 4 роки тому

      @@bernardosulzbach Promising? RU serious? for ~10 variants it should be always slower than regular map. I'd only consider it if we talking about hundreds and thousands items. btw, i think we have huge potential of constexpr or even continit hash tables with perfect hashes. (like in gpref)

    • @Marlonbr1
      @Marlonbr1 4 роки тому

      Yes, good one! I'd also like to know how unordered_map compares!

  • @carstenhansen3979
    @carstenhansen3979 4 роки тому +1

    9:47 why is it using 32-bit for BLAC and not 64-bit for BLACK?

    • @riccetn
      @riccetn 4 роки тому

      Because BLACK is 40-bit not 64-bit and the 24 bits next to it in memory could be anything or even outside accessible memory.

    • @skyeplus
      @skyeplus 4 роки тому

      Because it would blow past the end. You've got B-L-A-C--K-X-X-X, where X-X-X could be allocated for the next value or go beyond allocated space. So it's instead 4 bytes (32-bit) comparison, followed by 1 byte comparison.
      You could devise a data structure aligned to 64 bits, though, padding the remainder with zeroes. The search would be faster, because you're using more data at once. Even more speedup could be gained from using SSE. Standard library functions like memcmp actually use it under the hood if it's available. The fastest JSON parser use SSE combined with some bit magic to search for close quote mark in string literals as well as other things.

    • @VioletGiraffe
      @VioletGiraffe 4 роки тому

      ​@@skyeplus, it could load the 5 bytes of "black" into a 64-bit register, clear the unused bits, and then compare to the corresponding 64-bit constant. One fewer conditional branch.

    • @jcsahnwaldt
      @jcsahnwaldt 3 роки тому

      @@VioletGiraffe But as skyeplus said, the remaining three bytes could be outside allocated space, i.e. on the next memory page. If the CPU tried to load them, it might get a page fault.
      I guess to allow this 64-bit load optimization, the compiler could store the strings in static memory in 8-byte chunks (as skyeplus mentioned), or it could use other tricks to guarantee it's ok to read the remaining three bytes. Maybe some day someone will add such an optimization heuristic to GCC. But until then: Isn't it amazing what GCC can do? It transforms a linear search into code that is super optimized for these specific values! :-)

  • @RomanOrekhov
    @RomanOrekhov 4 роки тому

    Strangely enough, the following version produces slower results (although shorter assembly):
    constexpr Value operator[](const Key &key) const {
    const auto itr =
    std::find_if(begin(data), end(data),
    [&key](const auto &v) { return v.first == key; });
    return itr->second; // no check since we are sure we will find result
    }

  • @stevenvdschootgmail
    @stevenvdschootgmail 4 роки тому +2

    Does anyone know why gcc jumps to .L3 at line 90? The only way to reach line 90 would be when [RSI] matches "calb", so it seems obvious that the result of any comparison of [RSI] is already known at this point. Is gcc missing an optimization opportunity here, or am I missing the reason why jumping immediately to .L23 would not be valid here?

    • @Xilefian
      @Xilefian 4 роки тому

      Yeah seems like it's missed an optimisation that's obvious to humans

    • @duncanfrot4318
      @duncanfrot4318 4 роки тому

      Not sure what you mean. You need to check the full string is equal to "black" line 90 checks the final letter 'k'. You can't skip any of the steps as what happens if you passed it "blacl"? You would want it to throw.

    • @Xilefian
      @Xilefian 4 роки тому +2

      @@duncanfrot4318 double check. On the branch that passes 'blac' it checks 'k', but then if 'k' fails then it jumps back to check the other 5 letter keys, which is pointless because we've already matched 'blac' and this is a constexpr map, we know there's no other key starting with 'blac', so it's wasting cycles checking the other 5 letter keys

    • @duncanfrot4318
      @duncanfrot4318 4 роки тому

      @@Xilefian ah my bad. Guess that must be left over from the linear search.

  • @locomotivere4497
    @locomotivere4497 4 роки тому

    streamer life simulator
    on steam
    get it?

  • @EmbeddedSorcery
    @EmbeddedSorcery 3 роки тому

    How would you make this without the exception? If I say at() is "constexpr bool at( const Key &key, Value &val) const" then val is an out parameter, and it wouldn't be constexpr right? I tend to use bool to indicate something failed rather than deal with exceptions on an embedded platform.

  • @MatthiasMoulin
    @MatthiasMoulin 4 роки тому

    First time that I noticed the std::string_view literal suffix, sv. Is there any benefit in using this vs the implicit std::string_view constructor?

    • @LEpigeon888
      @LEpigeon888 4 роки тому

      @@cppweekly It's also the only noexcept constructor.

  • @Wulfcry
    @Wulfcry 3 роки тому

    3:40 what is Con-stex-pur.

    • @gutoguto0873
      @gutoguto0873 9 місяців тому

      It's constexpr

    • @Wulfcry
      @Wulfcry 9 місяців тому

      @@gutoguto0873 I remember this, It was a joke for the pronunciation in this video. But thanks anyway: -)

    • @gutoguto0873
      @gutoguto0873 9 місяців тому

      Aww thanks:')@@Wulfcry

    • @stefanbruns6209
      @stefanbruns6209 4 місяці тому

      We all need a little bit of cat content from time to time ...

  • @SamTheSciencerAtheist
    @SamTheSciencerAtheist 4 роки тому

    Can this be made to work on C++17?

    • @modeco80
      @modeco80 4 роки тому

      It should work perfectly fine with C++17*
      * at least on clang 10. although if it works in clang it will probably work on even msvc

  • @nmmm2000
    @nmmm2000 4 роки тому

    why you did not do the Map to be global variable?

  • @_very
    @_very 4 роки тому +8

    While this is remarkable, it seems like we must write code in a very unintuitive way to get the best performance.

    • @squelchedotter
      @squelchedotter 4 роки тому +2

      This is kind of an unfortunate side effect of C++'s bad defaults. Languages that have const/constexpr by default have a big advantage there.

    • @Psy45Kai
      @Psy45Kai 4 роки тому +2

      @Simon Farre that const char* problem is easily solved by string_view. A constexpr string_view object will compile away BUT you can query the strings size

    • @abdwyer
      @abdwyer 4 роки тому

      I really wish the entire constexpr mechanism were quite different, shaped more like Sean Baxter's Circle compiler extension of C++. Then, expressing compile time processing and data would be very explicit and IMO vastly easier to reason about. Actually, speaking of fmt library, Circle contains a Python f-string like formatter.

    • @squelchedotter
      @squelchedotter 4 роки тому

      @Simon Farre rust does it to some degree (e.g. static variables, array sizes) but functions still need to be explicitly annotated as constexpr. Zig goes further and makes all global initializations and functions implicitly constexpr if they can be.
      They both support constexpr str.len of course

    • @VioletGiraffe
      @VioletGiraffe 4 роки тому +1

      I don't agree that it is unintuitive. Ecerything here looks quite intuitive and very simple to me.

  • @DixaGames
    @DixaGames 4 роки тому

    Not bad optimization but it's one of the cases that compiler cannot beat human crafted asm. It could generate a jump table (or a lut with 2 indirects) by string length and then further break down with multiple bytes loaded in 32/64/128/256/etc bit registers.(you always need to pick register size that can fit the largest string to cmp).
    (I'm referring to non constexpr generated code)

  • @danmorris4904
    @danmorris4904 4 роки тому +1

    like

  • @valizeth4073
    @valizeth4073 4 роки тому

    Why would you add the nodiscard attribute tho?

    • @beachedwhale
      @beachedwhale 4 роки тому +5

      It's almost always a programmer error to ignore the result of a lookup

    • @VioletGiraffe
      @VioletGiraffe 4 роки тому +1

      Why wouldn't you?

  • @therealchonk
    @therealchonk 4 роки тому +1

    Jason the VAT/Mwst is currently 16% instead of 19% because of COVID-19.

  • @dagbruck
    @dagbruck 4 роки тому +1

    Pardon me, but isn’t this example a tad over-engineered? A plain constexpr POD of data and lookup with find_if is even simpler and will generate good code.

    • @LEpigeon888
      @LEpigeon888 4 роки тому +3

      It's already just a find_if over an std::array, how it's over engineered ? How would you make it simpler ?

  • @mocloun3709
    @mocloun3709 4 роки тому +1

    The solution may be fast in runtime but it is far away from a clean and readable code.