A Lock-free Atomic shared_ptr - Timur Doumler - CppNow 2022

Поділитися
Вставка
  • Опубліковано 12 вер 2024
  • #Boost​ #Cpp​ #CppNow​
    Slides: github.com/boo...
    CppNow Website: www.cppnow.org​
    CppNow Twitter: @CppNow​
    ---
    A Lock-free Atomic shared_ptr - Timur Doumler - CppNow 2022
    ---
    std::shared_ptr is a standard smart pointer utility widely used in modern C++. A commonly overlooked property of std::shared_ptr is that while its control block is thread-safe, the shared_ptr object itself is not. To overcome this limitation, C++20 introduced std::atomic std::shared_ptr.
    A lock-free implementation of std::atomic std::shared_ptr would be immensely useful for sharing objects between threads in low-latency and real-time applications and as a building block for lock-free data structures. Unfortunately, all existing standard library implementations are not lock-free, rendering them useless for these use cases. Why is it so hard to provide a lock-free atomic shared_ptr, and what would it take to implement it?
    In this talk, we discuss the motivation and use case, describe how std::shared_ptr works, and review the history of standardising std::atomic std::shared_ptr. We then look at several existing non-standard implementations. We discuss the different implementation strategies, their tradeoffs and limitations, and the underlying platform-specific mechanisms (32 vs. 64 bit, Intel vs. ARM). Finally, we present a new, free and open-source implementation of a lock-free atomic shared_ptr portable across these different platforms.
    ---
    Timur Doumler
    Timur Doumler is C++ Developer Advocate at JetBrains and an active member of the ISO C++ standard committee. As a developer, he worked many years in the audio and music technology industry and co-founded the music tech startup Cradle. Timur is passionate about building inclusive communities, clean code, good tools, low latency, and the evolution of the C++ language.
    ---
    Videos Filmed & Edited By Bash Films bashfilms.com/
    UA-cam Channel Managed By Digital Medium Ltd: events.digital...

КОМЕНТАРІ • 14

  • @Egory444
    @Egory444 2 роки тому +10

    What a great talk! Thanks for digging so deeply Timur

    • @BoostCon
      @BoostCon  2 роки тому

      Glad you enjoyed it!

    • @OlliS71
      @OlliS71 22 дні тому

      @@BoostCon But it has a big flaw: You couldn't store the pointer along with the seqence counter and the weak reference counter as thee distinct atomics. You'd have to store them all as a large DWCAS-atomic. Otherwise you'd increment the weak reference count belonging to a certain sequence count and pointer and the pointer and the sequence count might be change meanwhile so that the new weak reference count indictates for the wrong pointer. There's no way to do it really safe than with a DWCAS. And a DWCAS is only slightly slower than a single atomic increment since the most time is consumed by fetching the cacheline from a remote cache.

  • @tigranishkhanov9521
    @tigranishkhanov9521 Рік тому

    My understanding of N-way split refcount is you need it if there are N - 1 independent ways to access shared variable you want to protect. For example, for stack there is only one way (through atomic head) so you split count into 2 parts. For queue, there are two: from previous node and from atomic tail so you need 3-way split refcount. In general if you protect data in the graph with K inbound edges (edge=access) you need K counts at each edge and 1 count at the node itself, hence K + 1 total of which only last one needs to be atomic.

  • @grigorij81
    @grigorij81 2 роки тому +3

    There yet another feature of aliasing constructor of shared_ptr worth taking it into account - ability to work with empty control_block: shared_ptr(shared_ptr(), &g_int) will hold a pointer to g_int but control block will have null value

    • @timurdoumler4813
      @timurdoumler4813 2 роки тому +2

      Interesting! What would be a use case for that?

    • @khanyusufathar7133
      @khanyusufathar7133 Рік тому +1

      @@timurdoumler4813 lets take a case where a function takes only shared-ptr but caller has the object on stack or non-share-pointer object, we can create alias object with this and pass to the function.

    • @samphunter
      @samphunter 9 місяців тому

      ​​@@khanyusufathar7133this is really clever. I always put a noop lambda as custom destructor for this use case but this is a lot better.

  • @kjpsdbg
    @kjpsdbg Рік тому +1

    Could the ref_count be implemented with a bit pattern instead? So each shared pointer reserve a different bit in the ref_count. Drawback is of course the number of bits. So you can't have more shared pointer to the same object, than the number of bits.

  • @khanyusufathar7133
    @khanyusufathar7133 Рік тому

    regarding race condition at 5:41, ptr is taken by reference in another thread? i guess yes otherwise there wont be a race condition?

  • @OlliS71
    @OlliS71 22 дні тому

    Incrementing the reference counter alone doesn't work. You have to CAS all three components: the pointer, the sequence-counter and the weak reference count. If you increment the weak reference counter alone the pointer or sequence counter might change meanwhile, meaning that the werak reference counter would now relate to a new pointer or the same pointer with na new assignment under a new sequence number.
    There's no way to do it better as with a 16 byte CAS: first increment the weak reference counter and memorize the pointer and the sequence counter, then incremnt the strong reference counter inside the object and then decrement the weak reference counter as long as the pointer and the sequence counter haven't changed. Each new pointer assigmnent gets a new sequence counter. That all seems somewhat heavyweight. I implemented it and its always somewhat faster than the futex'd solution of MSVC and magnitudes faster than the code of libstdc++ and libc++.
    And there's a simple optimization that would make even futex'd solutions magnitudes faster than the DWCAS-code for RCU-like patterns: Introduce a copy-assignment--assignment operator for shared_ptr with an atomic as a parameter. The operator should check if the shared_ptr and the atomic point to the same value before locking the latter and if they're the same do nothing. With RCU-like patterns the central atomic isn't updated often so that the comparison acts on purely shared cacheline in a few cycles.