Introduction to Hardware Efficiency in Cpp - Ivica Bogosavljevic - CppCon 2022

  • Published 12 Sep 2024

COMMENTS • 9

  • @basheyev 1 year ago +4

    Great talk! Conclusion: be branch prediction, data prefetcher, vectorization & cache line friendly!

  • @topin8997 1 year ago +2

    I think that's a good introduction to get a general idea of fast code, which boils down to "keep your data compact, access it sequentially". As it is an introduction, there was only brief mention of profiler tools, without going into any details. Still, there _were_ performance tests that clearly show why it's better. Two more things are worth mentioning: "reduce memory allocation/deallocation and conditional jumps wherever possible".
    I can't find the video here, but one guy said he reduced the computation time of some train logistics simulation from days to hours by reusing some vectors. That's because for large vectors the OS actually allocates the pages on first access, not all at once immediately. Just measure how much time it takes to create a vector(100_000_000) and then to std::fill it.
    Next, which was only tangentially mentioned, is conditions and branch mispredictions. The CPU actively predicts which branch of a condition is most likely to be taken next and executes it in advance. That's why for loops are fast: they are likely to continue rather than exit. But if branching is random, the predictor fails constantly. Sometimes code like r = a*(c>0) + b*(c<=0); is faster than r = c > 0 ? a : b;. Nowadays most compilers can vectorize this simple line, but they may fail in some more complex cases, so keeping branching to a minimum is a good thing anyway.
    EDIT: Check out Ivica's blog johnysswlab.com/author/ibogi/ for a lot more details on optimization. Looks great

    • @47Mortuus 1 year ago +1

      Apart from the fact that "r = c > 0 ? a : b;" is often translated to machine code using branch-free, 1-clock-cycle conditional moves: for actual cases where a * (c > 0) is faster, PLEASE dear god PLEAAAAASE use a & -(c > 0) instead, as -0 is all 0 bits and -1 is all 1 bits in two's complement. I just hate to see that 4-cycle-latency, 1-issue-per-cycle-throughput integer multiplication when telling people about such micro-optimizations, which you can even encapsulate in a meaningfully named forceinline function.
      But again: measure first, and second, look at and understand the compiled code in the assembly language of the platform you're targeting, as CMP + CMOVcc is faster than even 2x ILP { CMP, SETcc, NEG, AND }, ADD, and most definitely faster than 2x ILP + 1 cycle overhead { CMP, SETcc, IMUL }, ADD...
      Micro-optimization requires much, MUCH more knowledge than one might assume at first - sometimes ADDING A BRANCH TO A SINGLE HARDWARE INSTRUCTION CAN BE FASTER, as with "uint32_t c = a / b" being slower than "uint32_t c = b > a ? 0 : a / b", depending on your data of course, and even if it is poorly predicted by the hardware. This kind of micro-optimization is way further down the road than optimal memory layout, which is the most impactful optimization by a mile and pretty much the only topic in this talk, and it only requires knowledge of higher-level languages such as C++, which is pretty much the only language used at this event. And since micro-optimization is a way more advanced topic, it can often de-optimize code when done poorly and/or in a naive/misinformed manner, as illustrated in your comment (compilers cannot always optimize your "r = a*(c>0) + b*(c<=0)").

  • @DotcomL 1 year ago +2

    A fantastic collection of tips here, thank you for the talk

  • @Azer_GG 1 year ago +1

    Thanks for the great talk!

  • @Luca-yy4zh 1 year ago

    Thanks for these useful tips

  • @player-eric 1 year ago

    Hi! Could you please provide accurate subtitles for this video?

  • @stavb9400 1 year ago

    Does your matrix multiplication yield the same result when you switch the k and j loops?

      • @mellowdv 1 year ago

      I tried it, and on my CPU both ran for around 70 seconds - no real difference after swapping the loops, but I'm no performance expert.