A simpler (albeit less educational) way to implement pow(): template double pow(double x) { // the following does the right thing for even and odd N return pow(x) * pow(x); } template double pow(double x) { return x; } template double pow(double x) { return 1.0; }
I think the trick he's doing is the following: If you want to calculate something like x^(2^k) it makes the compiler to multiply the answer per itself. Starting with x*x and then multiplying the result by itself gives us x^4 and so on. About double's rounding errors (if I'm right) the less operations you make, the more reliable it is so this method will at least be as correct as the other (if not better)
8 років тому
The pow example is comparing apples to oranges: pow sets errno for floating point exceptions. And if you compile with -ffast-math or the equivalent you get just as fast (or faster) code with pow
On my machine, pow(x) is between 5 times (for N~=1000) and 25 times (for N~=10) faster than pow(x, N). -ffast-math doesn't make a performance difference for pow(), but makes pow() roughly another two times faster.
On time 7:51, in the definition of IsAnyChar, there is no such thing in C++11 that requires the compiler to "guess" that we want to read the "type" type in "NoCV"! IMHO he meant: template <typename T> struct NoCV { using type = T; }; template <typename T> struct NoCV<const T> { using type = T; }; template <typename T> using NoCV_t = typename NoCV<T>::type; template <typename T> struct IsAnyChar { constexpr static bool value = IsChar<NoCV_t<T>>::value; };
In C++14 there is helper variable template for std::remove_cv::type named as std::remove_cv_t (most traits classes/structs defined in have their appropriate std::traits_class_name_v and/or std::traits_class_t variable template definitions. This is quite handy because it saves you from having to type std::remove_cv::type whenever you need to remove the const and/or volatile specifier on a given type. For example you can use std::remove_cv_t together with the following short class template in order to determine whether T is a valid character type: #include #include template struct is_anyone_of : std::false_type {}; template struct is_anyone_of : std::is_same {}; template struct is_anyone_of : std::integral_constant {}; template inline constexpr bool is_anyone_of_v = is_anyone_of::value; template struct is_character_type { static constexpr bool value = is_anyone_of_v; }; template inline constexpr bool is_character_type_v = is_character_type::value; using namespace std; int main() { cout
SFINAE @ 50:00: I think it should be "&U::sort" and not "&T::sort". Also, after NULL in the enum clause, there should be two closing parentheses, not just one
For the computing x^N part, I have a shorter solution: template struct Pow { double operator()(double x) const { return Pow()(x) * Pow()(x); } }; template struct Pow { double operator()(double x) const { return x; } }; template struct Pow { double operator()(double x) const { return 1; } }; On my machine, it runs at least 5 times faster than the cmath version with optimization flags.
I think there is a typo in the slide at minute 24:00, x*Pow()(x) has an N/2 instead of (N-1)/2 as previously displayed in the "For odd N we need:" example above
On time 21:50 he says that `x^16` (where `x` is run-time `double` and `16` is compile-time known constant) can be computed faster as `x^8 * x^8` instead of `x*x*x*x*...*x`. It may or may not be true because of rounding issues and numerical stability. Rounding modes (towards zero, even, ...) can be changed at run-time.
Probably compiler converts first expression to the temp=x8; res=temp*temp which has total of 8 multiplication. Since both operands are same but not sure.
@@jcsahnwaldt In theory, it's not portable since implementation of signed ints is platform dependent, but afaik all cpp compilers use two's complements.
Yeah, I stared at that slide for a few minutes before realizing the slide was the problem & not my failure to grok it. His "fast" version of pow2 is also incorrect (despite his assertion). 12 minutes in, and 2 glaring mistakes already... talk aborted.
A simpler (albeit less educational) way to implement pow():
template double pow(double x) {
// the following does the right thing for even and odd N
return pow(x) * pow(x);
}
template double pow(double x) {
return x;
}
template double pow(double x) {
return 1.0;
}
I think the trick he's doing is the following: If you want to calculate something like x^(2^k) it makes the compiler to multiply the answer per itself. Starting with x*x and then multiplying the result by itself gives us x^4 and so on.
About double's rounding errors (if I'm right) the less operations you make, the more reliable it is so this method will at least be as correct as the other (if not better)
The pow example is comparing apples to oranges: pow sets errno for floating point exceptions. And if you compile with -ffast-math or the equivalent you get just as fast (or faster) code with pow
On my machine, pow(x) is between 5 times (for N~=1000) and 25 times (for N~=10) faster than pow(x, N). -ffast-math doesn't make a performance difference for pow(), but makes pow() roughly another two times faster.
On time 7:51, in the definition of IsAnyChar, there is no such thing in C++11 that requires the compiler to "guess" that we want to read the "type" type in "NoCV"! IMHO he meant:
template <typename T>
struct NoCV { using type = T; };
template <typename T>
struct NoCV<const T> { using type = T; };
template <typename T>
using NoCV_t = typename NoCV<T>::type;
template <typename T>
struct IsAnyChar {
constexpr static bool value = IsChar<NoCV_t<T>>::value;
};
In C++14 there is helper variable template for std::remove_cv::type named as std::remove_cv_t (most traits classes/structs defined in have their appropriate std::traits_class_name_v and/or std::traits_class_t variable template definitions. This is quite handy because it saves you from having to type std::remove_cv::type whenever you need to remove the const and/or volatile specifier on a given type. For example you can use std::remove_cv_t together with the following short class template in order to determine whether T is a valid character type:
#include
#include
template
struct is_anyone_of : std::false_type {};
template
struct is_anyone_of : std::is_same {};
template
struct is_anyone_of
: std::integral_constant {};
template
inline constexpr bool is_anyone_of_v = is_anyone_of::value;
template
struct is_character_type {
static constexpr bool value =
is_anyone_of_v;
};
template
inline constexpr bool is_character_type_v = is_character_type::value;
using namespace std;
int main() {
cout
SFINAE @ 50:00: I think it should be "&U::sort" and not "&T::sort". Also, after NULL in the enum clause, there should be two closing parentheses, not just one
For the computing x^N part, I have a shorter solution:
template
struct Pow
{
double operator()(double x) const
{
return Pow()(x) * Pow()(x);
}
};
template
struct Pow
{
double operator()(double x) const
{
return x;
}
};
template
struct Pow
{
double operator()(double x) const
{
return 1;
}
};
On my machine, it runs at least 5 times faster than the cmath version with optimization flags.
I can imagine this being super slow to compile for really big N, because it has to generate logN template instantiations.
I think there is a typo in the slide at minute 24:00, x*Pow()(x) has an N/2 instead of (N-1)/2 as previously displayed in the "For odd N we need:" example above
It doesn't matter. If N > 0 is an odd int, N/2 == (N-1)/2. For example, 5/2 == 4/2 == 2. Remainder is discarded.
On time 21:50 he says that `x^16` (where `x` is run-time `double` and `16` is compile-time known constant) can be computed faster as `x^8 * x^8` instead of `x*x*x*x*...*x`.
It may or may not be true because of rounding issues and numerical stability. Rounding modes (towards zero, even, ...) can be changed at run-time.
VW joke was awesome..
how is x8*x8 better than x16? The generated codes are both 16 xs multiply.
Probably compiler converts first expression to the
temp=x8; res=temp*temp
which has total of 8 multiplication. Since both operands are same but not sure.
On time 34:25, he shifts T(1) (possible int) and does bit mask operations with it. Is it legal to do shifts and bit-wise operations with signed types?
Yes, it's legal.
@@jcsahnwaldt In theory, it's not portable since implementation of signed ints is platform dependent, but afaik all cpp compilers use two's complements.
Mammoth-old shit techniques. You don't have to do almost all of what he does. (Ignoring his wrong code in pow section.)
log2(5) = 2? Compute log2(5) using the algorithm shown at 11:51
The simpler version of Log2 he showed computes log2(5) = 3
Yeah, I stared at that slide for a few minutes before realizing the slide was the problem & not my failure to grok it. His "fast" version of pow2 is also incorrect (despite his assertion). 12 minutes in, and 2 glaring mistakes already... talk aborted.