C++ Weekly - Ep 456 - RVO + Trivial Types = Faster Code

  • Published Dec 11, 2024

COMMENTS • 50

  • @literallynull 16 days ago +14

    tl;dr: Breaking code into small functions, each with a distinct purpose (many embedded C++ coding styles require that), can make a difference in performance!

  • @MrAlbinopapa 16 days ago +21

    It's strange that you didn't show an example that creates the array outside of the branches: a single array, filled based on option, with that single named instance returned at the end.
    I bring this up because you've touched on this at least once in the past concerning RVO or NRVO, though previously you used std::string.
    auto get_data( int value, bool option ){
        auto result = std::array<int, N>{};  // N stands for the array size
        if( option )
            for( auto& item : result ){ item = value++; }
        else
            for( auto& item : result ){ item = value--; }
        return result;
    }
    This seems to be the cleanest way to code it. It sticks with the idea of using a single return value, and I believe it should use NRVO. From what I understand, you can't simply call it RVO, even in your example, because you aren't returning an rvalue; it's a named local variable, so I think it's returning an lvalue.
    As I understand it, RVO applies when returning the result of an expression ( return 3+4; ) or ( return SomeClass{}; )
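One way to observe the difference between the unnamed and the named case is to count constructor calls. The sketch below is only an illustration: `Tracer`, `make_unnamed`, `make_named`, and `count_moves_for_named` are hypothetical names, and it assumes a C++17 compiler.

```cpp
#include <cassert>

// Hypothetical helper type that counts how often it is copied or moved.
struct Tracer {
    static inline int copies = 0;
    static inline int moves = 0;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};

// Returns a prvalue: since C++17 no copy or move constructor runs here.
Tracer make_unnamed() { return Tracer{}; }

// Returns a named local: NRVO is allowed but not required. If the compiler
// skips it, the local is moved (not copied) into the return object.
Tracer make_named() {
    Tracer result;
    return result;
}

// Resets the counters, runs the named case, and reports the moves observed.
int count_moves_for_named() {
    Tracer::copies = 0;
    Tracer::moves = 0;
    Tracer t = make_named();
    (void)t;
    return Tracer::moves;
}
```

With the unnamed return the counters stay at zero on any conforming C++17 compiler. With the named return the move count is 0 when NRVO is applied and 1 when it is not, for example when building with GCC's or Clang's `-fno-elide-constructors`.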

    • @anon_y_mousse 15 days ago +1

      Nice. I was thinking the same thing, but every time I try to put code in a comment it gets deleted or shadowed. Now I don't have to.

  • @sealsharp 16 days ago +14

    Another case of "don't guess, benchmark".

  • @MirralisDias 16 days ago +28

    What if you had only one std::array at the top of the function, instead of having one defined in each branch? Would that work, given they are the same size?

    • @davidlowndes737 16 days ago +7

      ... and also had the condition just on the inc/dec aspect.
      Why was GCC unable to do RVO on the second branch when it did it for the first? First come, first served? :)

    • @MirralisDias 16 days ago +6

      @davidlowndes737 yeah, really odd behavior.
      When I write code, I usually have something like I described, only one definition for the variable being returned. I just assumed RVO would work in that case, but never verified it.
      It also looks nicer with less code repetition.

    • @bryce.ferenczi 16 days ago +4

      This is just a contrived example to make a point. The compiler should do the optimization, but it does not.

    • @X_Baron 16 days ago +6

      The link to the Compiler Explorer example is on the "episode details" page, in the description. I tried it and, indeed, the problem goes away when you define the returned array outside of the if statement. My understanding is that to use NRVO in both branches, GCC would need to check that both variables aren't in scope at the same time (because the return value's location needs to be known beforehand). Without knowledge of GCC internals, it's hard to say whether implementing this would be easy.
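The two shapes being compared here can be sketched side by side. This is not the code from the episode: the array size of 8, the `Data` alias, and all function names are assumptions for illustration. Each function records the address of its named local so a caller can check whether the returned object ended up in the same storage, which is what NRVO would produce.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Data = std::array<int, 8>;  // size 8 is an assumption for illustration

// Address of the named local seen by the most recent call, kept as an integer.
std::uintptr_t last_local_address = 0;

// Variant A: one named array per branch.
Data get_data_two_locals(int value, bool option) {
    if (option) {
        Data result{};
        for (auto& elem : result) { elem = value++; }
        last_local_address = reinterpret_cast<std::uintptr_t>(&result);
        return result;
    } else {
        Data result{};
        for (auto& elem : result) { elem = value--; }
        last_local_address = reinterpret_cast<std::uintptr_t>(&result);
        return result;
    }
}

// Variant B: a single named array declared before the branch.
Data get_data_one_local(int value, bool option) {
    Data result{};
    if (option) {
        for (auto& elem : result) { elem = value++; }
    } else {
        for (auto& elem : result) { elem = value--; }
    }
    last_local_address = reinterpret_cast<std::uintptr_t>(&result);
    return result;
}

// True if the returned object occupies the same storage as the callee's local.
bool returned_in_place(Data (*make)(int, bool), bool option) {
    Data out = make(0, option);
    return last_local_address == reinterpret_cast<std::uintptr_t>(&out);
}
```

Calling `returned_in_place(get_data_two_locals, false)` and `returned_in_place(get_data_one_local, false)` on a given compiler shows which variant gets the elision. The answer depends on the compiler and its version, so the sketch makes no claim about the outcome; both variants always return the same contents.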

    • @MirralisDias 16 days ago +1

      @bryce.ferenczi makes sense, but I wonder if he was aware of this. Nonetheless, it would be great to show another alternative to having two separate functions to achieve RVO in both branches.

  • @sky_is_the_limit_13 16 days ago

    Wow! Just three days ago I was like "I wish Jason had a video on RVO". Thank you so much Jason! You're awesome!

  • @treyquattro 16 days ago +1

    very useful episode, thanks Jason! (All episodes are useful, some episodes are more useful than others.)

  • @LewiLewi52 15 days ago +3

    How is the compiler able to apply RVO in the case with two functions but not in the non-RVO example, when in both cases it can be argued that the value to be returned lives in a different place?
    I understand that RVO is simple to apply if the result lives outside of the branches and there is only one return path, but when two separate functions are called based on a condition, doesn't that make it harder for the compiler to apply RVO?

    • @cppweekly 8 days ago

      Could probably call it a missed optimization? Particularly in the cases where there is tightly limited scope of the variables being returned.

    • @Spongman 6 days ago

      unnamed copy-elision is non-optional in get_data_with_rvo. that function is poorly-named, it's NOT an "optimization" it's how the compiler MUST work (post c++17).
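The split-function shape discussed in this thread can be sketched as follows. The helper names, the `Data` alias, and the size of 8 are assumptions and not the code from the episode. Each helper has one named local and one return statement, and the dispatching function only returns the result of a call.

```cpp
#include <array>
#include <cassert>

using Data = std::array<int, 8>;  // size 8 is an assumption for illustration

// One named local and one return statement: a single candidate for NRVO.
Data count_up(int value) {
    Data result{};
    for (auto& elem : result) { elem = value++; }
    return result;
}

// Same shape, counting down.
Data count_down(int value) {
    Data result{};
    for (auto& elem : result) { elem = value--; }
    return result;
}

// Each return operand is a function call, that is, a prvalue. Since C++17 it
// initializes the caller's object directly, with no copy in this function.
Data get_data_split(int value, bool option) {
    if (option) {
        return count_up(value);
    }
    return count_down(value);
}
```

The dispatching function never owns an array of its own, so the only place where an optional elision has to succeed is inside each helper.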

  • @yntfwyk 16 days ago +4

    Hi Jason! could you please do a video on `decltype(auto) vs auto&&` as function return types?

    • @anon_y_mousse 15 days ago

      In case he doesn't answer: decltype(auto) will preserve every part of the type (reference, const, and so on), while plain auto strips all of that.
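A small sketch of how the three return-type spellings deduce, checked at compile time. The function names and the global `counter` are hypothetical and exist only so that the returned references stay valid.

```cpp
#include <cassert>
#include <type_traits>

int counter = 0;  // hypothetical global, so returned references never dangle

auto plain_auto() { return (counter); }            // int: auto drops the reference
decltype(auto) exact_name() { return counter; }    // int: decltype of a plain name
decltype(auto) exact_expr() { return (counter); }  // int&: decltype of a parenthesized lvalue
auto&& forwarding() { return counter; }            // int&: an lvalue deduces as an lvalue reference

static_assert(std::is_same_v<decltype(plain_auto()), int>);
static_assert(std::is_same_v<decltype(exact_name()), int>);
static_assert(std::is_same_v<decltype(exact_expr()), int&>);
static_assert(std::is_same_v<decltype(forwarding()), int&>);
```

Note that `auto&&` always yields a reference type, while `decltype(auto)` yields a reference only when the returned expression is one, which is where the two differ in practice.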

    • @cppweekly 11 days ago

      You are welcome to add your request to the list I maintain here so I don't lose people's suggestions: github.com/lefticus/cpp_weekly/issues/

  • @dagbruck 16 days ago +2

    A code pattern I would use: declare result array once, then the if with loops, finally one return at the end after the if. What happens there?

  • @yagamilight2166 16 days ago +3

    Hello from code::dive😊

  • @TsvetanDimitrov1976 16 days ago +6

    I'm a little disappointed that the warning is not turned on by default. Also, just for completeness, there's the C way of doing it: pass the array by non-const pointer (or reference). Before move semantics was a thing, this was the only optimal way to do it, so there's probably a lot of legacy code that does it.
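A minimal sketch of the out-parameter style mentioned above, using a non-const reference. The name `fill_data`, the `Data` alias, and the size of 8 are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

using Data = std::array<int, 8>;  // size 8 is an assumption for illustration

// Writes into storage owned by the caller, so nothing is returned by value
// and no copy elision is needed.
void fill_data(Data& out, int value, bool option) {
    if (option) {
        for (auto& elem : out) { elem = value++; }
    } else {
        for (auto& elem : out) { elem = value--; }
    }
}
```

A caller declares `Data data{};` and then calls `fill_data(data, 3, false);`. The trade-off is that the object must exist before it holds meaningful values and cannot be declared `const` at the call site.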

    • @brennennen1761 16 days ago +2

      I would like to add that it's not just "legacy code". You still see it in new code where folks really care about performance. It's all over the Unreal Engine and Godot code bases (the last two big C++ code bases I've read), and new commits that still do it this way are coming in all the time. I generally follow suit in whatever code base I'm in, but I still prefer the C way: it feels more explicit, and I don't have to guess whether the compiler will do something dumb.

    • @TsvetanDimitrov1976 16 days ago +3

      @brennennen1761 Yes, recently I got a comment on a pull request: "passing array by value?". Even though I said RVO would take care of it and that I had actually checked it, I still had to change it to a non-const reference parameter to be able to merge the change.

    • @Ariccio123 16 days ago +4

      Realistically, it would be really, really nice if all compilers had easy ways to turn on all the optimization-can't-happen warnings.
      Some of us are crazy enough to already have compilers like MSVC emitting *every* warning.

  • @unclechaelsneckvein 16 days ago +6

    But why? Why doesn't RVO happen in both branches? It makes zero sense. If the compiler is this unreliable, then someone must have effed up somewhere.

    • @MrAlbinopapa 16 days ago +3

      Since Clang does it, it's doable. It's just that GCC and possibly MSVC have overlooked it, or it may have been an oversight in the standard, or maybe a misinterpretation of the standard.
      It could also be a way for the standards committee to avoid overburdening compiler makers with having to take all possible return paths into account. Clang might do it for this toy example, but would it handle more than two? Could it handle it if the function were 50 lines long? 100?

  • @binzinzin9x 16 days ago +2

    For creating small composable functions, would creating lambdas inside the scope of the branching function achieve the same goal?
    I tend to dislike littering a file with a bunch of functions that are only for internal use and only used once.

    • @revcorey12 14 days ago

      I ran a test with a local benchmark, and the lambda version is even better: 1.6x faster with lambdas, versus 1.5x with separate functions.
      code:
      std::array<int, N> get_data_with_rvo(int input_value, bool option)  // N stands for the array size
      {
          auto lambda_1 = [](int input_value) {
              std::array<int, N> result{};
              int value = input_value;
              for (auto &elem : result) { elem = value; ++value; }
              return result;
          };
          auto lambda_2 = [](int input_value) {
              std::array<int, N> result{};
              int value = input_value;
              for (auto &elem : result) { elem = value; --value; }
              return result;
          };
          if (option) {
              return lambda_1(input_value);
          } else {
              return lambda_2(input_value);
          }
      }

    • @binzinzin9x 14 days ago

      @revcorey12 that's quite interesting. I wonder if the compiler is able to guarantee inlining of the lambda, whereas it couldn't with the function call.

    • @revcorey12 14 days ago

      @binzinzin9x you are probably right. Inlining should take place, but in the end the compiler decides.

    • @cppweekly 8 days ago

      @revcorey12 might be related, yeah; compilers tend to inline lambdas.

  • @sky_is_the_limit_13 14 days ago

    I wish there were a link to the code with all the compiler flags that I can't see in the video. Thanks Jason.

  • @zoliv6906 16 days ago +4

    To ensure that “option” isn’t optimized away when using compiler explorer, I like to make it volatile. It allows me to run the executable on the website.
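A sketch of that trick, under the assumption that the function under test looks roughly like the single-array variant suggested elsewhere in these comments. `run_with_opaque_option`, the `Data` alias, and the size of 8 are hypothetical.

```cpp
#include <array>
#include <cassert>

using Data = std::array<int, 8>;  // size 8 is an assumption for illustration

Data get_data(int value, bool option) {
    Data result{};
    if (option) {
        for (auto& elem : result) { elem = value++; }
    } else {
        for (auto& elem : result) { elem = value--; }
    }
    return result;
}

// Reading a volatile object is an observable side effect, so the compiler
// cannot treat `option` as a known constant and fold one branch away.
int run_with_opaque_option() {
    volatile bool option = true;
    const Data data = get_data(10, option);
    return data.back();
}
```

Both branches of `get_data` then remain visible in the generated assembly, while the program can still be executed with a fixed input.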

  • @Ariccio123 16 days ago +4

    Meanwhile, there was a semi viral post this week about how a react prop with like 10 pretty plain js fields was slowing down rendering of github from 120fps to 30fps
    I miss a reasonably fast language like c++

  • @rockfordone 16 days ago +1

    and why not just:
    std::array<int, N> get_data(int input_value, bool option)  // N stands for the array size
    {
        std::array<int, N> result{};
        int value = input_value;
        if (option) {
            for (auto &elem : result) { elem = value; ++value; }
        } else {
            for (auto &elem : result) { elem = value; --value; }
        }
        return result;
    }

  • @oskardeeream1846 16 days ago +3

    I know out parameters aren't preferred, but if you used them you wouldn't have to worry about whether or not the compiler did what you wanted.

    • @Ariccio123 16 days ago +1

      This is the reason why I went down the Microsoft SAL rabbit hole so deeply ten years ago - I wanted the guaranteed performance without the buggy semantics, and it worked well for that. Disappointed that it couldn't be open sourced.

  • @stephenhowe4107 16 days ago

    Now just need to cover NRVO

  • @Spongman 6 days ago +1

    a little lazy with the term "optimization" there. when the standard requires behavior (such as unnamed copy-elision post-c++17), it's not an "optimization", it's minimal compliant behavior.
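The point that unnamed elision is required rather than optional can be illustrated with a type that has no copy or move constructor at all. `Pinned` and `make_pinned` are hypothetical names, and the sketch assumes C++17 or later.

```cpp
#include <cassert>

// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Compiles in C++17 and later: the prvalue initializes the caller's object
// directly, so no copy or move constructor is ever needed.
Pinned make_pinned(int v) { return Pinned{v}; }

// By contrast, this would not compile. Returning a named local still requires
// an accessible copy or move constructor, even when NRVO would elide the call.
// Pinned make_pinned_named(int v) { Pinned p{v}; return p; }
```

A caller can write `Pinned p = make_pinned(7);`. Under C++14 rules the same line is ill-formed, because elision there was only a permitted optimization and the constructor still had to be accessible.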

  • @mercer5888 16 days ago

    I do like CLion, if only it supported Wayland.

    • @valbogda5512 13 days ago

      It does now, I think. I used the last EAP version on Wayland and it was alright. Now that the EAP is no more, I've removed it because CLion is a bit bloated for my taste.

  • @utevroot 7 days ago

    I once read that declaring the return value of a function const could prevent RVO from happening.

  • @garytaverner5930 12 days ago

    I tried adding Wnrvo to target_compile_options in my CMakeLists.txt and broke the build with warning: Wnrvo: linker input file unused because linking not done and error: Wnrvo: linker input file not found: No such file or directory. Any ideas?

    • @cppweekly 8 days ago +1

      looks like you forgot the `-`?

    • @garytaverner5930 8 days ago

      @cppweekly I tried the '-' as well (error: unrecognized command-line option ‘-Wnrvo’). My g++ is too old (11).

  • @pawello87 15 days ago +3

    Finding a good example is always difficult :)
    std::array<int, N> get_data(int input_value, bool option)  // N stands for the array size
    {
        auto result = std::array<int, N>{};
        auto const mod = option ? 1 : -1;
        for (auto& elem : result) { elem = input_value; input_value += mod; }
        return result;
    }

  • @TNothingFree 16 days ago +1

    Why does the second branch have a memcpy? Because the compiler cannot assume the optimization is safe?
    I assume there's a tiny detail hidden somewhere.
    Also, I've done benchmarking in both C++ and JavaScript. I always compared JS functions in thousands of ms.
    In C++, if I get above 1 second I'll be like "Damn, where can I optimize?"
    And that's for non-real-time code, of course.

  • @explqicot3293 16 days ago

    Sorry, I'm such a beginner, and I know the languages are different from each other; I just need an explanation. If you were to do this in C99/C17, would it differ from just writing out more code to achieve the same thing?