Great video. Thank you very much for your enlightening example and insightful explanation!
I don't understand why, in the masking intro slide (at 22:42), the author says that the following has no branches:
for (int i = 0; i < N; i++)
s += (a[i] < 50 ? a[i] : 0);
That's a ternary operation, which branches between left and right expressions. What am I missing?
The vectorized code executes both options for every element and disregards (masks out) the ones that don't apply. Counterintuitively, this yields a massive speedup, because there are no branches.
@az09letters92 Is that because the compiler can prove that executing both sides has no side effects? If the left or right operand were an expression that could have side effects, it would have to be a short-circuiting branch, correct?
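For reference, a minimal sketch in C of what "masking" means for this loop. The function names `sum_ternary` and `sum_masked` are made up for illustration and are not from the video. In the C language itself the ternary evaluates only one of its two operands, but here both operands (`a[i]` and `0`) are free of side effects, so a compiler is allowed to compute both and select one without a jump. The second function does that selection by hand with a bit mask, which is roughly the per-element operation a SIMD compare-and-blend performs; whether a given compiler actually emits branch-free code depends on the compiler, flags, and target.

```c
#include <stddef.h>
#include <stdint.h>

/* The loop as written on the slide: a ternary in the source. */
int32_t sum_ternary(const int32_t *a, size_t n) {
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += (a[i] < 50 ? a[i] : 0);
    return s;
}

/* Hand-written masked version (illustrative only).
 * (a[i] < 50) is 1 or 0; negating it gives -1 (all bits set) or 0.
 * ANDing with that mask keeps a[i] or replaces it with 0, no jump needed. */
int32_t sum_masked(const int32_t *a, size_t n) {
    int32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t mask = -(int32_t)(a[i] < 50);
        s += a[i] & mask;
    }
    return s;
}
```

Both functions return the same sum for any input; the difference is only in how the selection is expressed.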
I don't understand why these architecture-specific instructions are not used directly by gcc at -O3.
They are, when you pass the -march= argument. Otherwise the compiler doesn't know which instruction sets it is allowed to use and falls back to a default (usually baseline x86-64 without AVX).
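One way to see this for yourself is a small sketch that relies on the predefined macros GCC and Clang set for each enabled instruction set (`__SSE2__`, `__AVX__`, `__AVX2__`, `__AVX512F__`). The helper name `enabled_simd` is hypothetical, and the check only reflects what the compiler was told it may use at compile time, not what the CPU running the program supports.

```c
/* Reports the widest x86 SIMD extension the compiler was allowed to target,
 * based on macros that GCC and Clang predefine according to -march/-m flags. */
const char *enabled_simd(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing the result from a program built with plain `gcc -O3` and again with `gcc -O3 -march=native` should show the difference on a machine with AVX support.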
Thanks, very much appreciated, especially the examples in C. Is this directly compatible with Cython?
The intrinsics, I mean.
Hard to understand English and unpleasantly small text...