@23:00 This version is cleaner, but it requires a memory allocation. Imagine a case where a, b and c are huge and can be pre-allocated and reused for each call: there, the former version of arr_add would be more appropriate.
The compiler didn't vectorize the simple addition loops because of an aliasing issue: it cannot prove that arrays a, b and c don't alias. To work around it, you can use the restrict keyword. In Clang you can use the builtin __builtin_assume_separate_storage.
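A minimal sketch of both workarounds, in C. The signature of arr_add here is my assumption (caller-provided output buffer, float elements), and arr_add_hint is a hypothetical name, so the code in the video may look different. The builtin is guarded because only recent Clang versions provide it.

```c
#include <stddef.h>

/* Assumed signature: the caller pre-allocates c and reuses it across calls.
 * restrict promises the compiler that c, a and b never overlap, so it may
 * vectorize the loop without emitting runtime overlap checks. */
void arr_add(float *restrict c, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Hypothetical variant without restrict: the same promise is given through
 * the Clang builtin when it is available, otherwise this is a plain loop. */
void arr_add_hint(float *c, const float *a, const float *b, size_t n)
{
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assume_separate_storage)
    __builtin_assume_separate_storage(c, a);
    __builtin_assume_separate_storage(c, b);
#  endif
#endif
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both forms are promises, not checks: calling either function with overlapping arrays is undefined behavior. In C++ restrict is not a standard keyword, but the major compilers accept __restrict as an extension.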
@40:00 You meant "for (int i=0; i