tl;dr: Breaking code into small functions, each with a distinct purpose (many embedded C++ coding styles require that), can make a difference in performance!
It's strange that you didn't use an example of creating the array outside of the branches, thus creating a single array, filling it based on option, but still returning the single instance of the named value.
I bring this up because this is something you've touched on at least once in the past concerning RVO or NRVO, though previously you used std::string.
auto get_data( int value, bool option ){
    auto result = std::array{};
    if( option )
        for( auto& item : result ){ item = value++; }
    else
        for( auto& item : result ){ item = value--; }
    return result;
}
This seems to be the cleanest way to code the issue. It sticks with the idea of using a single return value, and I believe it should use NRVO. From what I understand, you can't simply call it RVO, even in your example, because you aren't returning an rvalue: it's a named variable, so I think it's returning an lvalue.
The way I understand RVO, it applies when returning the result of an expression ( return 3+4; ) or ( return SomeClass{}; ).
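For anyone who wants to check the RVO vs. NRVO distinction themselves, here is a minimal sketch. Tracker and its counters are hypothetical helpers, not anything from the video. Returning an unnamed temporary involves no copy or move at all since C++17, while returning a named local may or may not be elided; if it isn't, it falls back to a move.

```cpp
#include <cassert>

// Hypothetical counters (not from the video): how many copy/move
// constructions happened.
int copies = 0;
int moves = 0;

struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};

// Returns an unnamed temporary (a prvalue). Since C++17 the caller's
// object is initialized directly; no copy or move constructor runs.
Tracker make_unnamed() { return Tracker{}; }

// Returns a named local. NRVO is allowed here but optional. If the
// compiler skips it, the local is moved (not copied) into the result.
Tracker make_named() {
    Tracker result;
    return result;
}

void check_return_paths() {
    copies = 0;
    moves = 0;

    Tracker a = make_unnamed();
    assert(copies == 0 && moves == 0);  // required by C++17

    Tracker b = make_named();
    assert(copies == 0);  // a returned local is never copied here
    assert(moves <= 1);   // 0 with NRVO, 1 without

    (void)a;
    (void)b;
}
```

Call check_return_paths() from main and print moves afterwards to see what your compiler and flags actually did for the named case.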
Nice. I was thinking the same thing, but every time I try to put code in a comment it gets deleted or shadowed. Now I don't have to.
Another case of "don't guess, benchmark".
Time is money, friend!
What if you had only one std::array at the top of the function, instead of having one defined in each branch? Would that work, given they are the same size?
... and also had the condition just on the inc/dec aspect.
Why was GCC unable to do RVO on the second branch when it did it for the first? First come, first served? :)
@davidlowndes737 Yeah, really odd behavior.
When I write code, I usually have something like I described, only one definition for the variable being returned. I just assumed RVO would work in that case, but never verified it.
It also looks nicer with less code repetition.
This is just a contrived example to make a point. The compiler should do this, but it doesn't.
The link to the Compiler Explorer example is on the "episode details" page, in the description. I tried it and, indeed, the problem goes away when you define the returned array outside of the if statement. My understanding is that to use NRVO in both branches, GCC would need to check that both variables aren't in scope at the same time (because the return value's location needs to be known beforehand). Without knowledge of GCC internals, it's hard to say whether implementing this would be easy.
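To compare the two shapes being discussed, here is a rough sketch with a hypothetical Tracker type that counts moves (the type, the payload field, and the function names are all made up for illustration). One function declares a separate local in each branch, the other declares a single local before the branch. Whether the move count differs depends on the compiler and version, so no particular output is claimed.

```cpp
#include <cassert>

// Hypothetical counters for illustration only.
int copies = 0;
int moves = 0;

struct Tracker {
    int payload = 0;
    Tracker() = default;
    Tracker(const Tracker& other) : payload(other.payload) { ++copies; }
    Tracker(Tracker&& other) noexcept : payload(other.payload) { ++moves; }
};

// One named local per branch: the compiler has to notice that the two
// locals are never alive at the same time to elide both returns.
Tracker two_locals(bool option) {
    if (option) {
        Tracker up;
        up.payload = 1;
        return up;
    } else {
        Tracker down;
        down.payload = -1;
        return down;
    }
}

// A single named local and a single return: the simple NRVO case.
Tracker one_local(bool option) {
    Tracker result;
    result.payload = option ? 1 : -1;
    return result;
}

// Runs one of the functions and reports how many moves it cost.
int moves_for(Tracker (*make)(bool), bool option) {
    copies = 0;
    moves = 0;
    Tracker t = make(option);
    assert(t.payload == (option ? 1 : -1));
    assert(copies == 0);  // fallback is a move, never a copy
    return moves;         // 0 if elided, 1 if not
}
```

Printing moves_for(two_locals, false) and moves_for(one_local, false) on GCC and on Clang is a quick way to see which returns were elided, alongside the warning from the video.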
@bryce.ferenczi Makes sense, but I wonder if he was aware of this. Nonetheless, it would be great to show another alternative to having two separate functions to achieve RVO in both branches.
Wow! Just three days ago I was like "I wish Jason had a video on RVO". Thank you so much Jason! You're awesome!
very useful episode, thanks Jason! (All episodes are useful, some episodes are more useful than others.)
How is the compiler able to apply RVO to the case with two functions but not to the non-RVO example, when in both cases it can be argued that the location of the value to be returned is in a different place?
I understand that if the result lives outside of the branches and is the only return path, RVO is simple to apply. But when two separate functions are called based on a condition, doesn't that make it harder for the compiler to apply RVO?
Could probably call it a missed optimization? Particularly in the cases where there is tightly limited scope of the variables being returned.
Unnamed copy elision is non-optional in get_data_with_rvo. That function is poorly named: it's NOT an "optimization", it's how the compiler MUST work (post C++17).
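One way to see that this is a language rule rather than an optimization: a type that can be neither copied nor moved can still be returned by value, as long as the return expression is an unnamed temporary. A small sketch, where Pinned and make_pinned are made-up names for illustration and C++17 or later is assumed:

```cpp
#include <cassert>

// Hypothetical type for illustration: neither copyable nor movable.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Compiles in C++17 and later: the temporary initializes the caller's
// object directly, so no copy or move constructor is needed.
Pinned make_pinned(int v) { return Pinned{v}; }

// The named version would NOT compile. Returning a named local still
// requires a usable move or copy constructor, even if NRVO would have
// elided the call:
//
//   Pinned make_pinned_named(int v) { Pinned p{v}; return p; }

int pinned_value() {
    Pinned p = make_pinned(7);  // also fine: initialized in place
    return p.value;
}
```

This compiles at -O0 too, which is the point: nothing is being "optimized" in the unnamed case.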
Hi Jason! could you please do a video on `decltype(auto) vs auto&&` as function return types?
In case he doesn't answer: decltype(auto) preserves every part of the type (reference-ness, const, and so on), while plain auto strips all of that.
You are welcome to add your request to the list I maintain here so I don't lose people's suggestions: github.com/lefticus/cpp_weekly/issues/
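On the decltype(auto) question above, the deduced return types can be checked at compile time. A minimal sketch; global_value and the function names are placeholders invented for this example, and C++17 is assumed for std::is_same_v:

```cpp
#include <cassert>
#include <type_traits>

int global_value = 42;  // hypothetical object with static storage

// Plain auto drops the reference: returns int by value.
auto by_auto() { return (global_value); }

// decltype(auto) keeps exactly what decltype of the return expression
// gives. A parenthesized lvalue yields int&.
decltype(auto) by_decltype_auto() { return (global_value); }

// Without the parentheses it is decltype of the variable itself: int.
decltype(auto) by_decltype_auto_no_parens() { return global_value; }

// auto&& is a forwarding reference: an lvalue makes it int&.
auto&& by_auto_ref_ref() { return global_value; }

static_assert(std::is_same_v<decltype(by_auto()), int>);
static_assert(std::is_same_v<decltype(by_decltype_auto()), int&>);
static_assert(std::is_same_v<decltype(by_decltype_auto_no_parens()), int>);
static_assert(std::is_same_v<decltype(by_auto_ref_ref()), int&>);

// Runtime sanity check that the reference versions alias the global.
void check_aliases() {
    assert(&by_decltype_auto() == &global_value);
    assert(&by_auto_ref_ref() == &global_value);
    assert(by_auto() == 42);
}
```

One thing to watch with auto&& as a return type: it always produces a reference, so returning a temporary or a local through it dangles, whereas decltype(auto) would return a value in the temporary case.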
A code pattern I would use: declare result array once, then the if with loops, finally one return at the end after the if. What happens there?
Hello from code::dive😊
I'm a little disappointed that the warning is not turned on by default. Also, just for completeness, there's the C way of doing it - pass the array by non const pointer(or reference). Before move semantics was a thing, this was the only optimal way to do it, so there's probably a lot of legacy code that does it.
I would like to add that it's not just "legacy code". You still see it in new code where folks really care about performance. It's all over the Unreal Engine and Godot code bases (the last 2 big C++ code bases I've read), and new commits are coming in all the time that still do it this way. I generally follow suit in whatever code base I'm in, but I still prefer the C way: it feels more explicit, and I don't have to guess if the compiler will do something dumb or not.
@brennennen1761 Yes, recently I had a comment on a pull request: "passing array by value?". When I said RVO would take care of it and that I had actually checked it, I still had to change it to a non-const reference parameter to be able to merge the change.
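For completeness, a sketch of the out-parameter style described in this thread. The element type and the size of 8 are assumptions, and fill_data is a made-up name. The caller owns the storage, so nothing depends on copy elision.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: element type and size are placeholders for illustration.
constexpr std::size_t kSize = 8;
using Data = std::array<int, kSize>;

// Out-parameter version: writes into storage owned by the caller,
// so there is no return value to copy, move, or elide.
void fill_data(Data& out, int input_value, bool option) {
    const int step = option ? 1 : -1;
    for (auto& elem : out) {
        elem = input_value;
        input_value += step;
    }
}

// Small usage check: count up from 5, then count down from 5.
void check_fill_data() {
    Data data{};
    fill_data(data, 5, true);
    assert(data.front() == 5 && data.back() == 12);
    fill_data(data, 5, false);
    assert(data.front() == 5 && data.back() == -2);
}
```

The trade-off is at the call site: the caller has to declare the array first and cannot make it const.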
Realistically, it would be really, really nice if all compilers had easy ways to turn on all the optimization-can't-happen warnings.
Some of us are crazy enough to already have compilers like MSVC emitting *every* warning.
But why? Why doesn't RVO happen in both branches? It makes zero sense. If the compiler is this unreliable, then someone must have effed up somewhere.
Since Clang does it, it's doable. Either GCC and possibly MSVC have overlooked it, or it's an oversight in the standard, or maybe a misinterpretation of the standard.
It also could be a way for the standards committee to not overburden compiler makers to have to take into account all possible return paths. Clang might do it for this toy example, but would it handle more than two? Could it handle it if the function was 50 lines long? 100?
For creating small composable functions, would creating lambdas inside the scope of the branching function achieve the same goal?
I tend to dislike littering a file with a bunch of functions that are only for internal use and only used once.
I ran a test with a local benchmark, and the lambda version is even better: 1.6 times faster with lambdas versus 1.5 with separate functions.
code:
std::array get_data_with_rvo(int input_value, bool option)
{
    auto lambda_1 = [](int input_value) {
        std::array result{};
        int value = input_value;
        for (auto &elem : result) { elem = value; ++value; }
        return result;
    };
    auto lambda_2 = [](int input_value) {
        std::array result{};
        int value = input_value;
        for (auto &elem : result) { elem = value; --value; }
        return result;
    };
    if (option) {
        return lambda_1(input_value);
    } else {
        return lambda_2(input_value);
    }
}
@revcorey12 That's quite interesting. I wonder if the compiler is able to guarantee inlining of the lambda, whereas it couldn't with the function call.
@binzinzin9x You are probably right. Inlining should take place, but in the end the compiler decides.
@revcorey12 Might be related, yeah. Compilers tend to inline lambdas.
I wish there was a link to the code with all the compiler flags that I can't see in the video. Thanks Jason.
To ensure that “option” isn’t optimized away when using compiler explorer, I like to make it volatile. It allows me to run the executable on the website.
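A small sketch of the volatile trick, assuming an 8-element int array and the single-variable version of get_data from the other comments; runtime_option and second_element are made-up names. Reading a volatile forces a real load, so the compiler cannot fold the branch away at compile time.

```cpp
#include <array>
#include <cassert>

using Data = std::array<int, 8>;  // assumed element type and size

Data get_data(int input_value, bool option) {
    Data result{};
    const int step = option ? 1 : -1;
    for (auto& elem : result) {
        elem = input_value;
        input_value += step;
    }
    return result;
}

// The optimizer must treat this as unknown at compile time, because
// every read of a volatile object has to actually happen.
volatile bool runtime_option = true;

int second_element() {
    const bool option = runtime_option;  // one volatile read
    return get_data(10, option)[1];
}
```

With runtime_option set to true the array counts up from 10; flip it to false and it counts down.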
Meanwhile, there was a semi-viral post this week about how a React prop with like 10 pretty plain JS fields was slowing down rendering of GitHub from 120fps to 30fps.
I miss a reasonably fast language like C++.
and why not just:
std::array get_data(int input_value, bool option)
{
    std::array result{};
    int value = input_value;
    if (option) {
        for (auto &elem : result) { elem = value; ++value; }
    } else {
        for (auto &elem : result) { elem = value; --value; }
    }
    return result;
}
I know out parameters aren't preferred, but if you used them you wouldn't have to worry about whether or not the compiler did what you wanted.
This is the reason why I went down the Microsoft SAL rabbit hole so deeply ten years ago - I wanted the guaranteed performance without the buggy semantics, and it worked well for that. Disappointed that it couldn't be open sourced.
Now just need to cover NRVO
a little lazy with the term "optimization" there. when the standard requires behavior (such as unnamed copy-elision post-c++17), it's not an "optimization", it's minimal compliant behavior.
I do like CLion, if only it supported Wayland.
It does now, I think. I used the last EAP version on Wayland and it was alright. Now that the EAP is no more, I've removed it, because CLion is a bit bloated for my taste.
I once read that declaring a function's return value const could prevent RVO from happening.
I tried adding Wnrvo to target_compile_options in my CMakeLists.txt and broke the build with warning: Wnrvo: linker input file unused because linking not done and error: Wnrvo: linker input file not found: No such file or directory. Any ideas?
Looks like you forgot the `-`?
@cppweekly I tried the '-' as well (error: unrecognized command-line option ‘-Wnrvo’). My g++ is too old (11).
Finding a good example is always difficult :)
std::array get_data(int input_value, bool option)
{
    auto result = std::array{};
    auto const mod = option ? 1 : -1;
    for (auto& elem : result) { elem = input_value; input_value += mod; }
    return result;
}
Why does the second branch have a memcpy? Because the compiler cannot assume optimizations?
I assume there's a tiny detail hidden somewhere.
Also, I've done benchmarking in both C++ and JavaScript. I always measured JS functions in thousands of ms.
In C++, if I get above 1 second, I'll be like "Damn, where can I optimize?".
And that's for non-real-time code, of course.
Sorry, I'm such a beginner, and I know they're different from each other, but I just need an explanation: if you were to do this in C99/C17, would it differ from just writing out more code to achieve the same thing?