Another version to fill an array: template struct LookUpTable { constexpr LookUpTable() : arr() { int i = 0; unsigned int v = 0; for (; i < N; ++i) { v += (i == 13 || i == 29 || i == 41 || i == 53 || i == 71 || i == 83); arr[i] = v; } } std::array arr; }; constexpr auto look_up_table = LookUpTable();
I am surprised these videos have only hundreds of views after years. I watched the whole series and I think they're great for practitioners at any level of experience. Thanks for making these.
You get dramatically better performance by using uint8[] for buckets because the whole lookup table would likely fit on the L1 cache. With about 100 entries, the uint8 would only be about 100 bytes. As int[], it would use up about 400 bytes, or four cache lines in many systems. In two-way set associative caches, this would use up an entire set, leading to both capacity misses and associativity misses, as well as adding a lot of burden to the memory system.
This is a bit of a contrived example. In most cases, a branch cannot be replaced by a table lookup, because the impact of the branch is not limited to just generating a scalar value. It may do a different set of actions altogether.
table lookups can return addresses of executable memory, or in higher level systems programming could return function pointers. In languages with coroutines it could return a coroutine to switch to
@@charactername263 branching on an executable address stored into an array is still branching: it's actually even worse than an `if` for the control flow of the cpu: it is even less likely to guess what's the correct next set of instructions, which is the whole point.
@@ennio5763 there is no branch, that is the point of the lookup table. If a branch is not predictable, then the overhead of something like switching coroutine or doing a jump can be cheaper than the unpredictable branch. ymmv There's not really any point doing this in code that is not latency critical, and such codebases generally "warm" the hot path anyway.
@@charactername263 the problem is not the branch per say, but rather the unpredictability of the jump address. "table lookups can return addresses of executable memory" doesn't help, because it doesn't make the code flow more predictable.
Here is the complete code of the solution:
```
#include "solution.hpp"
int buckets[100] = {
// 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
};
static size_t mapToBucket(size_t v) {
if (v < (sizeof (buckets) / sizeof (int)))
return buckets[v];
return -1; // let it crash
}
std::array histogram(const std::vector &values) {
std::array retBuckets{0};
for (auto v : values) {
retBuckets[mapToBucket(v)]++;
}
return retBuckets;
}
```
Another version to fill an array:
template
struct LookUpTable {
constexpr LookUpTable() : arr() {
int i = 0;
unsigned int v = 0;
for (; i < N; ++i) {
v += (i == 13 || i == 29 || i == 41 || i == 53 || i == 71 || i == 83);
arr[i] = v;
}
}
std::array arr;
};
constexpr auto look_up_table = LookUpTable();
I am surprised these videos have only hundreds of views after years. I watched the whole series and I think they're great for practitioners at any level of experience. Thanks for making these.
You get dramatically better performance by using uint8[] for buckets because the whole lookup table would likely fit on the L1 cache. With about 100 entries, the uint8 would only be about 100 bytes. As int[], it would use up about 400 bytes, or four cache lines in many systems. In two-way set associative caches, this would use up an entire set, leading to both capacity misses and associativity misses, as well as adding a lot of burden to the memory system.
Agree, it could be a further improvement.
This is a bit of a contrived example.
In most cases, a branch cannot be replaced by a table lookup, because the impact of the branch is not limited to just generating a scalar value. It may do a different set of actions altogether.
table lookups can return addresses of executable memory, or in higher level systems programming could return function pointers. In languages with coroutines it could return a coroutine to switch to
@@charactername263 branching on an executable address stored into an array is still branching: it's actually even worse than an `if` for the control flow of the cpu: it is even less likely to guess what's the correct next set of instructions, which is the whole point.
@@ennio5763 there is no branch, that is the point of the lookup table. If a branch is not predictable, then the overhead of something like switching coroutine or doing a jump can be cheaper than the unpredictable branch. ymmv There's not really any point doing this in code that is not latency critical, and such codebases generally "warm" the hot path anyway.
@@charactername263 the problem is not the branch per say, but rather the unpredictability of the jump address. "table lookups can return addresses of executable memory" doesn't help, because it doesn't make the code flow more predictable.
@@ennio5763 Yes you are right, the executable memory to jump to can be outside of the cache leading to a fault, each technique has downsides :)