[FOWM'24] What we learned from C++ atomics and memory model standardization

  • Published 28 Jul 2024
    Hans-J. Boehm
    The C++ memory model was first included, along with thread support, in C++11, and then incrementally updated in later revisions. I plan to summarize what I learned, both as a C++ standards committee member and, more recently, as a frequent user of this model, covering as many of the following points as time allows:
    The C++ committee began with the view that higher-level synchronization facilities like mutexes and barriers should constitute perhaps 90% of thread synchronization, sequentially consistent atomics maybe another 9%, and weakly ordered atomics the remaining 1%. What I’ve observed in C++ code is often very far from that. I see roughly as much use of atomics as of mutexes, in spite of some official encouragement to the contrary. Much of that uses weakly ordered atomics. I see essentially no clever lock-free data structures, along the lines of lock-free linked lists, in the code I work with. I do see a lot of atomic flags, counters, fixed-size caches implemented with atomics, and the like. Code bases vary, but I think this is not atypical.
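    (A minimal sketch of two such patterns; this is illustrative code, not code from the talk, and the names are invented:)

        #include <atomic>

        // A statistics counter: the increment must be atomic, but no
        // ordering is needed, so memory_order_relaxed suffices.
        std::atomic<long> hits{0};
        void record_hit() { hits.fetch_add(1, std::memory_order_relaxed); }

        // A one-shot "ready" flag: the release store publishes the data,
        // and an acquire load guarantees a reader sees it fully written.
        int config_value;                // written once, before the flag is set
        std::atomic<bool> ready{false};

        void publish(int v) {
            config_value = v;
            ready.store(true, std::memory_order_release);
        }

        bool try_read(int& out) {
            if (!ready.load(std::memory_order_acquire)) return false;
            out = config_value;
            return true;
        }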
    In spite of their frequent use, the pay-off from weakly ordered atomics is decreasing, and is much smaller than it was in the Pentium 4 era. On most modern mainstream CPUs the perceived benefit seems to significantly exceed the actual one, though probably not so on GPUs. To my mind this casts some doubt on the need to expose dependency-based ordering, as in the unsuccessful memory_order_consume, to the programmer, in spite of an abundance of use cases. Even memory_order_seq_cst is often not significantly slower. I’ll illustrate with a microbenchmark.
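    (The talk’s actual microbenchmark is not reproduced in this abstract; the sketch below shows one plausible shape for such a measurement, comparing relaxed and seq_cst stores. A serious benchmark would also need warm-up, contention, and optimizer controls:)

        #include <atomic>
        #include <chrono>
        #include <cstdio>

        std::atomic<int> x{0};

        // Time n stores issued with the given memory order.
        template <std::memory_order Order>
        long long time_stores(int n) {
            auto t0 = std::chrono::steady_clock::now();
            for (int i = 0; i < n; ++i)
                x.store(i, Order);
            auto t1 = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        }

        int main() {
            const int n = 100000000;
            std::printf("relaxed: %lld ns\n", time_stores<std::memory_order_relaxed>(n));
            std::printf("seq_cst: %lld ns\n", time_stores<std::memory_order_seq_cst>(n));
            // On current x86, the seq_cst loop pays for an implicit fence
            // (XCHG or MFENCE) per store; the relaxed loop uses plain MOVs.
        }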
    We initially knew far too little about implementability on various architectures. This came back to bite us recently [Lahav et al.], and it remains scary in places. Hardware constraints forced us into a change that makes the interaction between acquire/release and seq_cst hard to explain, and far less intuitive than I would like. It seems to be generally believed that this is hard or impossible to avoid at very high levels of concurrency, as on GPUs.
    We knew at the start that the out-of-thin-air problem would be an issue. We initially tried to side-step it, which was a worse disaster than the current hand-waving. This has not stopped memory_order_relaxed from being widely used. Practical code seems to work, but it is not provably correct given the C++ spec, and I will argue that the line between this and non-working code will inherently remain too fuzzy for working programmers. [P1217]
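    (For readers who have not seen it, this is the canonical out-of-thin-air litmus test from the literature, not code from the talk:)

        #include <atomic>

        std::atomic<int> x{0}, y{0};
        int r1, r2;

        // t1 and t2 run concurrently.
        void t1() {
            r1 = y.load(std::memory_order_relaxed);
            x.store(r1, std::memory_order_relaxed);
        }
        void t2() {
            r2 = x.load(std::memory_order_relaxed);
            y.store(r2, std::memory_order_relaxed);
        }
        // The formal model fails to forbid the outcome r1 == r2 == 42:
        // each store would justify the other load, so the value appears
        // "out of thin air". No real compiler or hardware produces this,
        // but excluding it precisely in the specification has proved
        // remarkably hard.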
    Unsurprisingly, programmers very rarely read the memory model in the standard. We learned that compiler writers commonly do not, either. The real audience for language memory models mostly consists of researchers who generate instruction mapping tables for particular architectures. The translation from a mathematical model to standardese is both error-prone and largely pointless. We need to find a way to avoid the standardese.
    Atomics mappings are part of the platform’s application binary interface (ABI), and need to be standardized. They often include arbitrary conventions that must be followed consistently by all compilers on a system, for all programming languages. Later evolution of these conventions is not always practical. I’ll give a recent RISC-V example of such a problem.
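    (The RISC-V example itself is not spelled out in this abstract, but the well-known x86 choice illustrates the kind of convention involved; the instruction sequences below are the standard published mappings, while the framing is illustrative:)

        #include <atomic>

        std::atomic<int> g{0};

        // Two conventions for seq_cst accesses on x86, each correct in
        // isolation. All compilers on a platform must pick the same one:
        // a plain-MOV store from one convention paired with a plain-MOV
        // load from the other can violate sequential consistency.

        void sc_store(int v) {
            // Convention A (the usual one): the store pays the cost,
            //   XCHG [g], v     (or: MOV [g], v ; MFENCE).
            // Convention B: a plain MOV, with the fence moved to loads.
            g.store(v, std::memory_order_seq_cst);
        }

        int sc_load() {
            // Convention A: a plain MOV suffices.
            // Convention B: MFENCE ; MOV.
            return g.load(std::memory_order_seq_cst);
        }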
  • Science & Technology

COMMENTS • 1

  • @harold2718 • 4 months ago • +1

    Showing benchmarks for SC *loads* is somewhat misleading, as all of the cost of sequential consistency is in SC *stores* (especially on x86). (With the normal mapping, anyway; the alternative mapping, with expensive SC loads and cheap stores, is theoretically possible.)