Performance Ninja -- False Sharing

Performance Ninja -- Link Time Optimizations

Performance Tuning Challenge #6 (wordcount)

🕊️Valera🕊️

«Легкий способ бросить курить»

Какая Маня милая 😍

Performance Ninja -- Replacing Branches with Lookup Tables

easyperf

Переглядів 772

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 27 жов 2024

КОМЕНТАРІ • 11

@easyperf3992 2 роки тому
Here is the complete code of the solution:
```
#include "solution.hpp"
int buckets[100] = {
// 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
};
static size_t mapToBucket(size_t v) {
if (v < (sizeof (buckets) / sizeof (int)))
return buckets[v];
return -1; // let it crash
}
std::array histogram(const std::vector &values) {
std::array retBuckets{0};
for (auto v : values) {
retBuckets[mapToBucket(v)]++;
}
return retBuckets;
}
```
@XCemaXX 9 місяців тому
Another version to fill an array:
template
struct LookUpTable {
constexpr LookUpTable() : arr() {
int i = 0;
unsigned int v = 0;
for (; i < N; ++i) {
v += (i == 13 || i == 29 || i == 41 || i == 53 || i == 71 || i == 83);
arr[i] = v;
}
}
std::array arr;
};
constexpr auto look_up_table = LookUpTable();
@jwbaker Рік тому ⁺¹
I am surprised these videos have only hundreds of views after years. I watched the whole series and I think they're great for practitioners at any level of experience. Thanks for making these.
@MaxAbramson3 2 роки тому ⁺³
You get dramatically better performance by using uint8[] for buckets because the whole lookup table would likely fit on the L1 cache. With about 100 entries, the uint8 would only be about 100 bytes. As int[], it would use up about 400 bytes, or four cache lines in many systems. In two-way set associative caches, this would use up an entire set, leading to both capacity misses and associativity misses, as well as adding a lot of burden to the memory system.
@easyperf3992 2 роки тому ⁺¹
Agree, it could be a further improvement.
@ennio5763 6 місяців тому
This is a bit of a contrived example.
In most cases, a branch cannot be replaced by a table lookup, because the impact of the branch is not limited to just generating a scalar value. It may do a different set of actions altogether.
@charactername263 13 днів тому
table lookups can return addresses of executable memory, or in higher level systems programming could return function pointers. In languages with coroutines it could return a coroutine to switch to
@ennio5763 13 днів тому
@@charactername263 branching on an executable address stored into an array is still branching: it's actually even worse than an `if` for the control flow of the cpu: it is even less likely to guess what's the correct next set of instructions, which is the whole point.
@charactername263 13 днів тому
@@ennio5763 there is no branch, that is the point of the lookup table. If a branch is not predictable, then the overhead of something like switching coroutine or doing a jump can be cheaper than the unpredictable branch. ymmv There's not really any point doing this in code that is not latency critical, and such codebases generally "warm" the hot path anyway.
@ennio5763 12 днів тому
@@charactername263 the problem is not the branch per say, but rather the unpredictability of the jump address. "table lookups can return addresses of executable memory" doesn't help, because it doesn't make the code flow more predictable.
@charactername263 12 днів тому
@@ennio5763 Yes you are right, the executable memory to jump to can be outside of the cache leading to a fault, each technique has downsides :)

Наступне

Автоматичне відтворення

Performance Ninja -- False Sharing

Performance Ninja -- False Sharing

Performance Ninja -- Link Time Optimizations

Performance Ninja -- Link Time Optimizations

Performance Tuning Challenge #6 (wordcount)

Performance Tuning Challenge #6 (wordcount)

«Легкий способ бросить курить»

«Легкий способ бросить курить»

Какая Маня милая 😍

Какая Маня милая 😍

Скабєєва ПЕРЕКОСИЛО! Пропагандист ВИХВАЛЯЄ ЗСУ #shоrts

Скабєєва ПЕРЕКОСИЛО! Пропагандист ВИХВАЛЯЄ ЗСУ #shоrts

Understanding and implementing a Hash Table (in C)

Understanding and implementing a Hash Table (in C)

Performance Ninja -- Dependency Chains 1

Performance Ninja -- Dependency Chains 1

Next.js 15 Breakdown (Everything You Need To Know)

Next.js 15 Breakdown (Everything You Need To Know)

Performance Ninja -- Compiler Intrinsics 2

Performance Ninja -- Compiler Intrinsics 2

Premature Optimization

Premature Optimization

Writing a CAD Language in Rust (with Adam Chalmers)

Writing a CAD Language in Rust (with Adam Chalmers)

Performance Ninja -- Loop Tiling 1

Performance Ninja -- Loop Tiling 1

How the stack got stacked

How the stack got stacked

Performance Ninja -- Vectorization 2

Performance Ninja -- Vectorization 2

Странная суперспособность вомбатов и новый тренд у шимпанзе

Странная суперспособность вомбатов и новый тренд у шимпанзе

Как не носить с собой вещи

Как не носить с собой вещи

Моя історія завмерлої вагітності...

Моя історія завмерлої вагітності...

«Легкий способ бросить курить»

«Легкий способ бросить курить»

Олександр Мацієвський з роду городових козаків. Ростислав Мартинюк у Інструкція.Смисл

Олександр Мацієвський з роду городових козаків. Ростислав Мартинюк у Інструкція.Смисл

"Ми дуже дякуємо цим хлопцям". Українські військові врятували двох жінок з лівого берега Дніпра

"Ми дуже дякуємо цим хлопцям". Українські військові врятували двох жінок з лівого берега Дніпра

НИКИТА ПОДСТАВИЛ ДЖОНИ 😡

НИКИТА ПОДСТАВИЛ ДЖОНИ 😡