Parallel C++: False Sharing

  • Published 11 Sep 2024
  • In this video we look at the basics of false sharing!
    For code samples: github.com/coff...
    For live content: / coffeebeforearch

COMMENTS • 19

  • @simonepernice8059
    @simonepernice8059 1 year ago +7

    That is the most enlightening lesson on cache-related performance effects that I have ever had! Thank you very much Nick! Cheers, Simone

  • @yihan4835
    @yihan4835 1 year ago +3

    I'm currently trying to break into the HFT space, and your video is super helpful!

  • @Gauthamphongalkar
    @Gauthamphongalkar 10 months ago +1

    In every tutorial people talk theory, but you show it practically, which makes you special! Thank you! BIG FAN :)

  • @giusepperana6354
    @giusepperana6354 1 year ago +1

    It's incredible how much of a difference it makes, and it's explained very well and clearly. Thanks for sharing - fantastic video.

  • @tenko3211
    @tenko3211 9 months ago

    In the false_sharing example, since each thread has its own counter, there is no need to use std::array for the counters. I found that not using std::array makes the program much faster.

  • @ychevall
    @ychevall 4 months ago

    Do we still need -lpthread in modern g++? I thought gcc was now smart enough to include the library by default.
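
    For reference, a typical GCC build line for the threaded examples (the file name here is illustrative):

        # -pthread remains the documented flag for compiling and linking threaded code
        g++ -O3 -pthread false_sharing.cpp -o false_sharing
        # Plain -lpthread still links too; since glibc 2.34 the pthread functions live
        # in libc itself, so on recent Linux systems the extra flag is mostly a no-op.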

  • @thisisolorin394
    @thisisolorin394 1 year ago +1

    What would happen if an array of int were used instead of atomic? Would the other elements of the array be invalidated as well if they are not atomic?

    • @CoffeeBeforeArch
      @CoffeeBeforeArch  1 year ago

      Yep - invalidations occur whether or not you use atomic. If a core wants to write to an integer on a cache line that is currently in another core's cache, it has to invalidate that cache line and get exclusive access to it.
      Atomic makes sure that the entire read+modify+write operation occurs as one atomic unit (i.e., it can't be broken up). Atomic is more of a specifier for how actions are performed on a piece of memory (not really the details of what that memory looks like). An atomic int and a regular int are both just 4B of memory. With an atomic int, though, things like increment result in atomic instructions (e.g., an increment with the lock prefix on x86), while an increment of a normal integer will just result in something like a plain increment instruction (without a lock prefix).
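
      A minimal sketch of that difference (illustrative, not the exact code from the video):

          #include <atomic>

          std::atomic<int> a{0};
          int b = 0;

          void bump() {
              a++;  // atomic read-modify-write, e.g. `lock addl $1, a(%rip)` on x86
              b++;  // plain increment, e.g. `addl $1, b(%rip)` - no lock prefix
          }

          // Both `a` and `b` occupy 4 bytes; only the generated instructions differ.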

    • @thisisolorin394
      @thisisolorin394 1 year ago

      @@CoffeeBeforeArch Hm, I tried it on my machine and got basically no cache misses when running perf c2c, and the program ran in 0.x sec or so, like in the case with alignas. Any thoughts?

    • @CoffeeBeforeArch
      @CoffeeBeforeArch  1 year ago +1

      @@thisisolorin394 If the program completed almost instantly, there's a good chance that the compiler just optimized the operation away. Compiler optimizers are pretty clever about getting rid of code that generates values that are not used later in the program (things like atomic prevent that from happening).
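
      A minimal illustration of that effect (hypothetical code, assuming optimizations like -O2 are enabled):

          #include <atomic>

          // `sum` is never used afterwards, so the optimizer is free to delete the
          // whole loop - and this "benchmark" finishes almost instantly.
          void dead_work() {
              int sum = 0;
              for (int i = 0; i < 100000000; i++) sum++;
          }

          // Increments of an atomic are visible to other threads, so the
          // optimizer keeps the work.
          void live_work(std::atomic<int>& sum) {
              for (int i = 0; i < 100000000; i++) sum++;
          }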

  • @nareknazari6443
    @nareknazari6443 1 year ago

    Hello, what is your compiler?

    • @mtusharx
      @mtusharx 1 year ago

      GNU C++ (g++) - I couldn't see the version.

  • @empireempire3545
    @empireempire3545 1 year ago

    Isn't this alignas a bit of a dirty hack? ;) I mean, you're using up way more memory than you need to - in this case it doesn't matter, but what to do in cases when it does? Another thing - what if your code is supposed to work on multiple different types of processors and there simply ISN'T a single size of cache line?
    I wonder if this problem can be solved in a different way somehow.

    • @CoffeeBeforeArch
      @CoffeeBeforeArch  1 year ago +3

      It's a pretty standard technique for solving these kinds of problems. A dummy array of padding bytes is also pretty standard, and is shown as a solution in a recent performance blog post by Netflix - netflixtechblog.com/seeing-through-hardware-counters-a-journey-to-threefold-performance-increase-2721924a2822
      There usually is a space-speed trade-off in performance (e.g., double buffering doubles the amount of memory allocated to buffers). Padding to avoid false sharing is generally small, and not a major overhead in terms of overall memory capacity. You only need the padding at the edges of where a problem is being partitioned, and it is at most a cache line of padding at each boundary, which is, in general, only 64B each. If you are in some super memory-constrained embedded system, your only real option would be to move around the data members that are being written to so that they are naturally spaced apart (or find a way to limit the number of writes you're doing in the first place).
      Software optimizations are not always portable - you're inherently tuning to the underlying architecture your code is running on. If you have a different cache line size, you might have to tune things differently. You could have the spacing controlled by something like a preprocessor macro that is set at compile time based on the cache line size of the target architecture.
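
      A sketch of both ideas in code (the constant name and the 64B fallback are assumptions, not something shown in the video):

          #include <atomic>
          #include <cstddef>
          #include <new>  // std::hardware_destructive_interference_size (C++17)

          #ifdef __cpp_lib_hardware_interference_size
          constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
          #else
          constexpr std::size_t kCacheLine = 64;  // common on x86; tune per target architecture
          #endif

          // alignas pads each counter out to its own cache line, so writes from
          // different threads no longer invalidate each other's lines.
          struct alignas(kCacheLine) PaddedCounter {
              std::atomic<int> value{0};
          };

          PaddedCounter counters[4];  // one counter per thread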

    • @matheusmarchetti628
      @matheusmarchetti628 1 year ago

      If the counter variable were local to the lambda, it should also avoid false sharing. Scott Meyers showed a similar example in his presentation about caches.

    • @killacrad
      @killacrad 7 months ago

      @@matheusmarchetti628 That's a totally different use case; with the counter variable being thread-local, the performance penalty from cache coherence does not apply.
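
      A sketch of that local-counter pattern (the names here are made up for illustration):

          #include <atomic>

          std::atomic<int> total{0};

          void worker() {
              int local = 0;               // private to this thread (stack/register)
              for (int i = 0; i < 100000; i++)
                  local++;                 // hot loop touches no shared cache line
              total += local;              // one shared (atomic) write at the end
          }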

  • @deverloperfantom1372
    @deverloperfantom1372 1 year ago

    hi wi the wery smole lesson dot`t trasted infotmasion bu the lesson i`m stat goot see serealizm