Crust of Rust: Atomics and Memory Ordering

  • Published Jul 9, 2024
  • In this episode of Crust of Rust, we go over Rust's atomic types, including the mysterious Ordering enum. In particular, we explore the std::sync::atomic module, and look at how its components can be used to implement concurrency primitives like mutexes. We also investigate some of the gotchas and sometimes counter-intuitive behaviors of the atomic primitives with different memory orderings, as well as strategies for testing for and debugging errors in concurrent code.
    0:00:00 Introduction
    0:02:03 What are atomics?
    0:05:26 The Memory Model
    0:07:33 AtomicUsize
    0:12:23 Questions so far
    0:20:20 Implementing a (bad) Mutex
    0:27:39 Our Mutex works!
    0:33:04 Pesky thread interleavings
    0:39:42 compare_exchange
    0:44:54 Mitigating compare_exchange contention
    0:50:43 compare_exchange_weak
    0:57:02 Ordering::Relaxed
    1:12:13 Ordering::Acquire/Release
    1:26:00 The fetch_ methods
    1:34:07 Ordering::SeqCst
    2:00:08 Breather
    2:00:40 ThreadSanitizer
    2:05:49 loom
    2:22:09 Atomic fences
    2:27:27 volatile
    2:32:18 AtomicPtr
    2:35:13 Atomics through FFI
    2:36:44 Consume ordering?
    2:38:08 Closing thoughts
    You can read more about Rust's memory model in the Rust reference at doc.rust-lang.org/nightly/ref... and in the C++ memory model documentation at en.cppreference.com/w/cpp/ato....
    Live version with chat: • Crust of Rust: Atomics...
  • Science & Technology

COMMENTS • 127

  • @haystackdmilith
    @haystackdmilith 3 years ago +144

    "Don't use spinlocks! Don't implement spinlocks! Ok, let's implement a spinlock". Love it!

    • @sycration
      @sycration 3 years ago +1

      @Cooper Rene LOOL just use a private tracker, don't be a punk

  • @molozful
    @molozful 2 years ago +36

    It's incredible how coherent you can be in a live session, and on such a tricky topic. Kudos!

  • @KohuGaly
    @KohuGaly 3 years ago +107

    This video exposes one concept that you rarely notice in regular code. Computer code looks sequential when written down as text, but in actuality it defines a directed graph of operations. If two operations don't lie on the same path through the edges of the graph, their relative order is undefined.
    Multi-threaded memory access exposes this weird quirk because it itself may lack inherent ordering (i.e., it allows for race conditions).
    Memory ordering introduces extra edges to the graph to restrict the possible orderings.

    • @andynicholson7944
      @andynicholson7944 3 years ago +11

      🤯 that’s a really clear explanation. I’d never thought of code in those terms. Thanks for the perspective shift!

    • @michalkupczyk7
      @michalkupczyk7 3 years ago +4

      You may want to mentally map Ordering to Transaction Isolation levels in SQL databases.

    • @carterthaxton
      @carterthaxton 2 years ago +5

      Another important ingredient is out-of-order execution by modern processors. Normally such CPUs and their caches present a programmer-visible model that appears to execute sequentially. When programs run on multiple threads, especially on symmetric multi-processors (e.g. a quad-core CPU), these out-of-order execution models can expose far more interdependence than you might think could occur just due to "parallel execution". Ever since the Pentium Pro, for example, your CPU isn't even executing the raw instructions. It's converting them into micro-code, which is analyzed in real time, much like a modern compiler would, and reordered according to the directed graph of operations you describe. Then as many of those micro-code instructions as possible are executed simultaneously. Add to that the presence of multi-layer caches shared between multiple processors, and the whole idea of a CPU that executes one instruction at a time just completely breaks down.

    • @JordanBourdeau
      @JordanBourdeau 4 months ago

      This is a great explanation, thanks for sharing!

    • @Axman6
      @Axman6 9 days ago

      This is related to how Haskell's IO type is defined (in GHC): it is a state monad which passes around a special value (RealWorld#) which the compiler treats as being different every time it sees it: data IO a = IO (RealWorld# -> (a, RealWorld#)). This creates a strict ordering between IO actions; the compiler can't reorder them, because the RealWorld# token would no longer be passed between the actions in the same order. Other Haskell code is exactly what you've said, a graph of values, but one in which the order doesn't matter, because the values are always pure.
      (I'll note that "ordering of actions" didn't require the use of the m-word; that's what we call the ubiquitous pattern that IO fits nicely into.)

  • @kennichdendenn
    @kennichdendenn 3 years ago +75

    I want to give feedback:
    This is great. It fits my difficulty level perfectly and is exactly what I need - as the 2018 guideline suggests.
    I have some experience with Rust and am a computer science student currently just before my bachelor's degree, so I have some experience with programming and with how a computer works on several levels. Heck, I wrote a small subset-of-C compiler. But I had, for example, never heard of Ordering before.
    Those topics normally have a super high barrier to entry because of all the previous knowledge required, only that knowledge itself has the same barrier. Your videos help break that and teach a lot about how computers work in general.
    Keep on with the great work. I wish all my lectures were even half as engaging, informative, accessible and interesting.

  • @longtomjr22
    @longtomjr22 3 years ago +15

    My wife made me watch the intro 7 times. Makes us laugh every time! (Now if only I could also get her as excited about coding again)

    • @aqua3418
      @aqua3418 2 years ago

      Just have to come at it from an angle that's applicable to her in some way, that way she can see the possibilities and how it benefits/helps her :D

  • @ericseppanen6579
    @ericseppanen6579 3 years ago +7

    One of the exciting aspects of fetch_update (or any CAS loop that generates the new value from the old) is that your closure may be executed more than once. If your closure has side-effects, it sometimes surprises people that they get the side-effect multiple times.
    A more common mistake is for the closure to read and write to a value that exists outside the closure (i.e. "if !initialized { initialized = true; new_value = 42 }"). If the first CAS attempt fails, then the second run of the closure generates an unexpected result because it inherits state from the first run of the closure. I've seen this bug multiple times in production code.

  • @BrazenNL
    @BrazenNL 2 years ago +1

    7:00 I couldn't live without Dark Reader!

  • @kanuos
    @kanuos 1 year ago +2

    Could not have asked for anything better than this! Thank you so much for the work you do for us!

  • @shashanksharma21
    @shashanksharma21 1 year ago +1

    Thank you for explaining this in such a brilliant manner! This is the first time I have been able to understand this ordering issue. Thank you Jon!

  • @RobertWHurst
    @RobertWHurst 10 months ago +1

    Thank you for creating so many of these awesome videos Jon. I find them very beneficial personally, and always enjoy the adventures you take us on. Cheers 🍻

  • @MikeBondar
    @MikeBondar 3 years ago +1

    great video, thanks! this whole memory ordering thing is much clearer now

  • @zamalek4079
    @zamalek4079 3 years ago +14

    Jeff Preshing has an incredible blog (and I think some videos) on this. He's a C developer, but as you said that doesn't matter.

  • @Ergzay
    @Ergzay 3 years ago +5

    For anyone testing this themselves while following along: while testing the first section with the "bad" mutex, it reliably generated completely random numbers every run when run with just "cargo run" instead of "cargo run --release", even without the yield. However, I'm running on a MacBook Air with the M1-based ARM processor, so that probably increases the randomness, as ARM's memory model is a little more relaxed than Intel's, I believe.
    Also, with everything on Relaxed, I still got reliable failures even after switching to the "correct" implementation using compare_exchange. The value was very close to being correct and passed most of the time, however.

  • @nilshaberstroh2705
    @nilshaberstroh2705 2 years ago +3

    Another topic I might have avoided for much longer. But you manage to explain very complicated things (at least to me) in a very digestible way. Hat off 🎩. And the MESI protocol is indeed an interesting topic. Thanks for that hint as well.

  • @dmitrysim
    @dmitrysim 3 months ago

    First of all, it's a great video, thank you.
    I thought it might be a good idea to think about the problem in terms of CPU cores rather than threads.
    Below, x.load -> c1 denotes "x.load was scheduled to run on core 1".
    2nd problem (Ordering::SeqCst section):
    - (x.store -> c1, y.store -> c2) finish.
    *** At this point c1: {x: true, y: false} c2: {x: false, y: true}
    - (!x.load -> c1, !y.load -> c2), so both loops end;
    - (x.load -> c2, y.load -> c1)
    This way z == 0.
    1st problem (Ordering::Relaxed section):
    To achieve the case, we need at least "x.load finishes after y.store finishes", since y.store is the only way 42 can get into any core's memory.
    So this is only achievable with compiler shenanigans, in this case on one core:
    - y.store -> c1
    *** At this point c1: {y: 42}
    - y.load -> c1
    - x.store -> c1
    *** At this point c1: {x: 42}
    - x.load -> c1
    Also, the loom docs at docs.rs/loom/latest/loom/#relaxed-memory-ordering state that "B can run before A", not "D can run before C", meaning "the compiler can shuffle commands even if the 2nd depends on the result of the 1st"? I hope it's just a typo.

  • @elio8119
    @elio8119 3 years ago +1

    I think this is the best explanation of how to use atomics.

  • @robert36902
    @robert36902 2 months ago

    Thanks a lot for this video! I feel it's a bit of a low-key hardcore topic that is great to know, even though simple mutexes and atomic counters have been enough for me so far :)

  • @dinushkam2444
    @dinushkam2444 3 years ago

    Great video, learned a lot. Thanks! I might need to rewatch.

  • @xinhao6245
    @xinhao6245 3 years ago

    great video to explain the details, thanks

  • @DHorse
    @DHorse 1 year ago

    Great topic!

  • @amcsi
    @amcsi 3 years ago +3

    Thank you so much for diving so deep into all the memory orderings, and all of everything regarding atomics! I watched the entire video.
    I was going to ask why they say Rust gives you safe multithreading if there are so many issues with it apparently, but then at the very end you basically said that it's just atomics that are the problem; mutexes are what really give you the safety :)

    • @berylliosis5250
      @berylliosis5250 2 years ago

      The other big difference is that if you don't use Atomics or unsafe, you can't mess up. C/C++ will happily pass a T* to another thread and let you get all the problems atomics have, and more. In Rust, code that doesn't use unsafe or a preexisting synchronization primitive (which necessarily uses unsafe) can't even experience a race condition (afaik).

    • @brdrnda3805
      @brdrnda3805 10 months ago

      @@berylliosis5250 Correction: ...can't even experience a -race condition- data race.
      From the Rustonomicon:
      "Safe Rust guarantees an absence of data races,..." "...However Rust does not prevent general race conditions."

  • @oerglwoergl
    @oerglwoergl 3 years ago +7

    Very interesting, thanks. The general rule seems to be (as with crypto): don't roll your own (if you can avoid it).

  • @digama0
    @digama0 3 years ago +5

    Regarding your example of relaxed ordering, there is a variant that is even more mind bending:
    thread 1: { let r1 = y; x = r1; r1 }
    thread 2: { let r2 = x; if r2 == 42 { y = 42; } r2 }
    (All loads of x and y are relaxed.) The crazy thing is that it is _still_ allowed that both threads read 42. This seems impossible since there is a control dependency from the read of x to the write of y, but this can actually happen in practice if thread 2 speculatively executes y = 42, thread 1 picks that up and stores x = 42, and then thread 2 receives 42 from x and decides that the speculation was correct. In fact, relaxed loads are even worse than that, since in the following program
    thread 1: { let r1 = y; x = r1; r1 }
    thread 2: { let r2 = x; y = r2; r2 }
    it is _still_ allowed for r1 and r2 to be 42, which is now completely nuts since there aren't any 42's in the program. (Actually C11 has a clause saying there has to be a 42 somewhere in the program but you can just have a random 42 elsewhere to satisfy this constraint.) However this is an unrealistic execution, in that no hardware actually can produce 42 in this situation without literal time travel. (This is also known as the "out-of-thin-air" problem since 42 just appears by time travel shenanigans.)

  • @iilugs
    @iilugs 1 year ago

    I was about to lose hope on understanding atomics... then I saw there's a Jon Gjengset video about it 😍

  • @casperes0912
    @casperes0912 1 year ago

    For a master’s course I implemented exhaustive and optimal enumeration of value release-acquire traces using an algorithm called DPOR. Fun stuff.

  • @saseoham
    @saseoham 3 years ago

    Thank you!!!!!

  • @amirh6712
    @amirh6712 2 years ago +1

    Thank you so much for the great video Jon
    Just want to point something out that I think you overlooked a little bit:
    there is also another guarantee with Relaxed ordering, which is that the operation is guaranteed to happen at some point in the future.
    For example, without any atomic ordering, these two snippets are considered equivalent from the compiler's perspective (this optimization is known as hoisting):
    while (!stoprequested) {
    }
    if (!stoprequested) {
        while (true) {
        }
    }
    However, if you set and read the stoprequested variable with relaxed ordering, it is guaranteed that the first loop will break eventually, meaning that the compiler will not translate it to the second form.

    • @jonhoo
      @jonhoo  2 years ago +1

      I actually looked through the spec for text along these lines, assuming that it _must_ be in there, and couldn't find it. Can you point me to where you see it written? The best I found was something along the lines of "the value is eventually visible" or something to that effect, but with no promises about what "eventually" means.

  • @timvw01
      @timvw01 2 years ago

    This is great

  • @timvw01
    @timvw01 1 year ago

    Learning a lot here. Still can't accurately predict the results when using ordering in certain cases, but this video explains the concepts very well. Perhaps some sketching of the concepts could help, as was done in the left-right/evmap video.
    Thanks!

  • @thomasfrans1185
    @thomasfrans1185 1 year ago

    32:48 is probably the most sarcastic "Oh nice, it works so well" I've ever heard XD.

  • @alliu6757
    @alliu6757 3 years ago +3

    Hi, Jon. Thanks for the amazing tutorial. Could you do some videos about tokio, crossbeam, and rayon?

  • @lettere9560
    @lettere9560 2 years ago

    The way I understand it is that a load/store instruction only affects the local cache of a particular core. Ordering::Acquire is an analogue of a recv operation and Ordering::Release is an analogue of a send operation. However, there is an additional rule: if there is no incoming message on the bus, then Ordering::Acquire just reads directly from its local cache.

  • @jocketf3083
    @jocketf3083 2 years ago +3

    Thank You! I learned a lot from this video! However, I almost feel like parts of this API should require unsafe as an "I know what I'm doing" kind of thing. I was thinking that SeqCst could be the "safe" default and that methods where you can specify orderings would be unsafe. Perhaps that is taking it a bit far since this isn't really about memory safety, but it might go a little way towards fearless concurrency. Do you have any thoughts on that?

  • @marlls1989
    @marlls1989 2 years ago

    On the counter example, wouldn’t relaxed allow the counter to be updated to the same value twice? Like, both load a previous value and both add 1 to the same old value.

  • @droidgod3
    @droidgod3 3 years ago

    Is it possible to get the code for this episode as a gist or git repo?
    Sometimes it is hard to follow and I have to rewind all the way because I forgot the context when switching between examples.

  • @TheClonerx
    @TheClonerx 3 years ago

    Even for C++ this is really useful

  • @yailkalaf8542
    @yailkalaf8542 1 month ago

    Thanks

  • @daniellambert6207
    @daniellambert6207 3 years ago

    2:39:17 always an educational experience :)

  • @Baptistetriple0
    @Baptistetriple0 1 year ago

    Ok, I have a question on the possible reordering of the program by the CPU when the operations are independent: if I print something in my function, but the printed value doesn't depend on any operation of the function, like if I just put "println!("test");", is the compiler allowed to move that print within the function? Or, because IO is considered to have side effects, can it not assume there is no link with the other operations?

  • @BryanChance
    @BryanChance 2 years ago

    This is new to me. My mind blew up after 1:58:00. LOL

  • @fbytgeek
    @fbytgeek 2 years ago

    @jon Gjengset, is it possible to do a stream on Rayon or parallelism in Rust?

  • @mindstorm3854
    @mindstorm3854 3 years ago +3

    omg I will need to see this video 5 times to understand half

  • @ahuggingsam
    @ahuggingsam 1 year ago

    So I know I'm late to the party, but I do have a question about the Ordering::SeqCst example. Does this amount of synchronization mean we end up regressing to almost single-threaded performance again? Or can the compiler/CPU ensure this ordering while still maintaining concurrency (with obviously at least some overhead for the synchronization)?

  • @nikosefthiasjs
    @nikosefthiasjs 3 years ago +2

    Hi, thanks for the great video.
    What LSP plugin are you using? It is really responsive.

    • @jonhoo
      @jonhoo  3 years ago +4

      I'm using rust-analyzer through coc.nvim :)

  • @adamzimmerman1
    @adamzimmerman1 3 years ago +1

    Gonna leave a plug for "Designing Data-Intensive Applications" by Martin Kleppmann. It talks about a lot of the same things in the context of distributed systems and databases. Specifically in the context of eventual consistency vs. other read/write ordering guarantees, a lot of the same thinking applies. It's a great book, especially if you're working with those types of systems.

    • @jankarsten4593
      @jankarsten4593 2 years ago

      Honestly one of the best books I've read

  • @Lexikon00
    @Lexikon00 3 years ago +2

    So, if I increment a counter in different threads and I just care about fetching the result after all threads have finished, I can just use Relaxed, because there are no reads, just writes, and therefore the ordering is not relevant? Assuming we take care that everything happens within one execution.

    • @OMGclueless
      @OMGclueless 3 years ago

      Yes, this is mostly correct. Your statement that "there is no read just writes" isn't correct, updating a counter is both a read *and* a write. But when updating a counter you probably don't care what value your thread reads, so it's OK to use relaxed.
      More precisely, if you use `fetch_add` to guarantee that the updates to the counter are atomic, and you call `.join()` on all the threads to provide the Happens-Before relationship that guarantees that the threads' updates to the counter are visible to you before you read the final value, then you can do all the updates with `Ordering::Relaxed` and things will work.

  • @jeffg4686
    @jeffg4686 3 years ago

    Hi Jon, I have a small question regarding std::cell::Cell and std::cell::RefCell if you happen to have a minute. I get the concept of being able to change an inner value through an immutable reference, and have seen examples of a clone() implementation that takes a shared reference to self but also needs to update some other member of self; hence the issue, since it's only a shared reference, and the workaround is to use these cells. I was wondering: do you tend to see this usage often outside of things like clone() implementations? I guess it's good to know these cells exist and what they do, but when thinking about when I'd ever use them, I see it as quite rare. Is that your take, or are these actually used all over the place in typical Rust code? Thanks for any thoughts.

    • @jonhoo
      @jonhoo  3 years ago +1

      I most commonly see it in use-cases where you need mutable access to multiple subsets of a collection that are determined at runtime. This comes up frequently in graphs, but that's not the only place. In those cases, you have multiple & to the collection, and then take out &mut through something like RefCell as you, say, traverse nodes in topological order.

  • @error200http
    @error200http 3 years ago

    1:14:47 C++ docs are enlightening

  • @pictureus
    @pictureus 1 year ago

    When you say in the beginning that one thread may never get the update:
    if you have memory x = 1,
    thread A writes 2 to x in a loop, and
    thread B reads x in a loop,
    wouldn't thread B eventually get the value 2, just not necessarily in the same step, as you later mention?

  • @johndowson1852
    @johndowson1852 3 years ago +18

    You should give Dark Reader extension a try. It gives a decent enough dark mode experience on pretty much any website.

    • @driedurchin
      @driedurchin 3 years ago

      Absolutely! Love that extension.

    • @mwcz5190
      @mwcz5190 3 years ago

      Great extension.

    • @jonhoo
      @jonhoo  3 years ago +22

      Ah, but you see, I don't actually _like_ dark mode for anything that has a lot of text on it like documentation. I *only* use dark mode for streaming because I was told by a bunch of viewers early on that their eyes hurt since they're watching the streams in a dark room ;)

    • @driedurchin
      @driedurchin 3 years ago +8

      @@jonhoo Blasphemer! Dark mode for life 🕶️

    • @AbhinavKulshreshtha
      @AbhinavKulshreshtha 3 years ago +3

      @@jonhoo
      Well, I should thank you for being so considerate of us. I do get flash-blinded when someone in a YouTube video opens a very bright website. Maybe it's because I have become too dependent on dark mode.

  • @arpandhatt6011
    @arpandhatt6011 2 years ago

    If we were to acquire and release atomic variables inside of a loop, can memory ordering allow the CPU to wrap operations around the loop? For example, if we did a Release operation at the end of the loop, the contents of the loop will now be visible to other threads. If the CPU correctly predicts that the loop will run again, it will have already begun processing instructions for the next iteration of the loop. Is it guaranteed that it won’t move these operations ahead of the release synchronization from the previous iteration of the loop?

    • @jonhoo
      @jonhoo  2 years ago +1

      Ah, no, the memory model generally guarantees in-order semantics within a given thread. Reordering is only with respect to the actions of other threads.

  • @pm71241
    @pm71241 3 years ago

    Hmm... this loom thing reminds me of Kyle Kingsbury's "jepsen" and "Knossos" software to check distributed databases for consistency issues.

  • @hrmny_
    @hrmny_ 3 years ago +1

    I'm not sure whether I understand the example with `SeqCst`. Do atomics not still have the same memory address when they change value?
    Can the CPU just change that around?
    And wouldn't it be horrible if it does, but the load can still refer to the old location?
    How can it at that point even guarantee that it ever gets the new value and your references to the atomic don't become stale immediately?
    How can `load` get an old value after it has been changed in memory 🤔

    • @aGuyFromSomewhere
      @aGuyFromSomewhere 3 years ago +4

      I think this may, for example, come down to caching. When threads run on different cores, both cores may have a given value copied in their caches. Both cores may modify the value in their cache without the other core seeing the modification. If the ordering of the modifications does not specify a clear "happened before" relation between them, then the cores may choose not to invalidate their caches before the modifications, which would be costly. This is probably also why stricter orderings (than Relaxed) are considered to incur a performance cost: the cores cannot optimize as freely (with caches, out-of-order execution, and the like) when an ordering is specified. So if performance is a concern, you always want to find the "most relaxed" ordering that still makes the program work correctly.
      Anyway, caching is just my guess; there might also be other reasons for threads to see different values at the same memory location.

  • @mohsanabbas6835
    @mohsanabbas6835 2 years ago

    Yum num yum

  • @correabuscar
    @correabuscar 1 month ago

    1:21:40, that Relaxed (on failure): doesn't it mean it uses that memory ordering to load the value, even before the compare happens? Or is it internally split into two operations: a load for the compare-and-exchange, and then, if the compare failed, a load (again!) for the actual value, with the two grouped into one atomic operation? That doesn't seem to make sense. So doesn't it instead load the value with the failure ordering (aka Relaxed in this case), compare that value, and, if it's the same as expected, do the store with the success ordering (here Acquire), with the load and (compare-and-)store together internally atomic? Otherwise it seems kind of weird that it would load the same value twice, once for success and once for failure. If what I expect is correct, then the ordering args would be Release (for success) and Acquire (for failure), because it would load with Acquire and store with Release; or, well, maybe storing with Release isn't needed? I'm unclear on this part. I'm only talking about compare_exchange_weak here, if it wasn't clear, not the final store of unlocked.

    • @correabuscar
      @correabuscar 1 month ago

      the doc says this:
      "compare_exchange_weak takes two Ordering arguments to describe the memory
      ordering of this operation. success describes the required ordering for the
      read-modify-write operation that takes place if the comparison with current succeeds.
      failure describes the required ordering for the load operation that takes place when
      the comparison fails. Using Acquire as success ordering makes the store part
      of this operation Relaxed, and using Release makes the successful load
      Relaxed. The failure ordering can only be SeqCst, Acquire or Relaxed."

  • @jeffg4686
    @jeffg4686 3 years ago

    A little confused on Box::leak. So leak gives you back a reference to the actual value and consumes the owned binding, but I'm confused as to why one would want that over just allocating with Box::new alone. What benefit does it yield? Thanks

    • @jonhoo
      @jonhoo  3 years ago

      Box::leak gives you a &'static, which you can then give out many copies of, including to other threads. If you have a Box, any reference to the Box is tied to the lifetime of the Box (i.e., it isn't 'static), so you can't send it to other threads and such!

    • @jeffg4686
      @jeffg4686 3 years ago

      @@jonhoo - gotcha, thanks much. Actually, kinda wondering what makes this thread safe? A 'static lifetime doesn't itself imply thread safety, does it?

    • @SimonBuchanNz
      @SimonBuchanNz 3 years ago +1

      @@jeffg4686 the short version is that it's a non-mut reference, e.g. a "shared" reference, and the type is an atomic, which implements Sync and thus declares itself safe to use from multiple threads at the same time.

  • @Blure
    @Blure 3 years ago +14

    NOM NOM NOM NOM

  • @jacklong2182
    @jacklong2182 3 years ago

    Do you use the vim NERDTree plugin? How do you navigate between different files?

    • @jonhoo
      @jonhoo  3 years ago +1

      Nope. I just use fzf.vim :) github.com/jonhoo/configs/blob/26409d160dff9a667d3fcb854bf83b37b544b154/editor/.config/nvim/init.vim#L28

  • @austecon6818
    @austecon6818 2 years ago +1

    I think my take-home message from this has been: if I do multi-threading in Rust, just use a damned Mutex.

    • @theroboman727
      @theroboman727 1 year ago

      At least learn channels as well. They fit some problems much better. The book has a subchapter on them.

  • @keithmiller8510
    @keithmiller8510 11 months ago

    I knew the difference but 16:00 really gave me an ah-ha moment on usage.

  • @xwtek3505
    @xwtek3505 21 days ago

    2:36:45 uh, Rust has .as_ptr(), which is surprisingly safe, as dereferencing it is unsafe anyway.

  • @yuan.pingchen3056
    @yuan.pingchen3056 8 months ago

    Is the Rust language capable of writing a single-process, single-thread console program with non-atomic data types?

    • @jonhoo
      @jonhoo  8 months ago +1

      yes

    • @yuan.pingchen3056
      @yuan.pingchen3056 8 months ago

      @@jonhoo This is good news, but it seems like it doesn't have the capability to offload workloads to a GPU, does it? (Heterogeneous computing)

  • @Dygear
    @Dygear 2 years ago

    I wonder if the data race for the locking of the mutex is more likely on big.LITTLE architectures because of the different speeds of the P and E cores.

    • @Dygear
      @Dygear 2 years ago

      In the Ordering::Relaxed section, y can store into x's 42 value because they happen at the same time? The let r1 is step a1; the let r2 is step b1 and happens at the same time, as they are in different threads. The x.store is step a2, and the y.store is step b2. Because it's Relaxed, the compiler and the CPU are free to execute these operations in any order, so a1 can happen at the same time b2 is happening, meaning we see y's stored value cross over into x.

    • @Dygear
      @Dygear 2 years ago

      Getting a Raspberry Pi 4 to show the Relaxed ordering failing could be interesting.
      With how much power Ordering has over how the compiler will execute the statements it sees, it would almost be helpful to have syntax lowlighting in the background of a different color to show where the relationships start and end.

    • @Dygear
      @Dygear 2 years ago

      So atomics are single assembly instructions; an instruction is only considered atomic when no other application, thread, process, or core has a chance to interject itself into your program flow, because it's a single instruction. Electrons are in flight changing the value; nothing else can happen until that's done.

    • @Dygear
      @Dygear 2 years ago

      1:30:55 - Line 864: What on earth is `x @ Ok(_)`?! I've never seen that before.

  • @manuelaraujo2571
    @manuelaraujo2571 1 year ago

    I got confused by the argument for why z=0 is impossible in the SeqCst example. I think the argument should be that if t1 does not add, it must be that it sees y = false after it sees x = true, which establishes that the x load happens before the y load, so t2 must see x = true after it sees y = true and it adds. Similarly, if t2 does not add then this means the y load happens first and so t1 adds.

  • @xderfszvfthgycu978
    @xderfszvfthgycu978 3 years ago

    underrated

  • @pkunk
    @pkunk 3 years ago +2

    Explaining atomics without an Indiana Jones reference.

  • @garlicxu4768
    @garlicxu4768 3 years ago

    Nice job, btw why not just get a dark mode browser plug-in 😂

  • @Ergzay
    @Ergzay 3 years ago +4

    I have a problem with the part at 1:53:05, where you say "there's no requirement that x is true even though tx has run". That's simply false. An Ordering::Acquire will see all Ordering::Release stores of the same variable, if it's run. Did you mean to say "even though ty has run"?

    • @jonhoo
      @jonhoo  3 years ago +5

      The C specification is actually extremely vague on this front. From the C specification section 5.1.2.4: "The value observed by a load of an atomic object depends on the “happens before” relation, which in turn depends on the values observed by loads of atomic objects. The intended reading is that there must exist an association of atomic loads with modifications they observe that, together with suitably chosen modification orders and the “happens before” relation derived as described above, satisfy the resulting constraints as imposed here." Section 7.17.3.1 does say that "[i]mplementations should make atomic stores visible to atomic loads within a reasonable amount of time", but that's really all. Ultimately, by my reading at least, if two threads A and B have no relationship except that A writes x (with Release) and B reads x (with Acquire), it doesn't _seem_ like there is a requirement that B _will_ see A's write to x. There _is_ a requirement that _if_ B sees A's write to x, then B's read of x "happens after" A's write to x. Now, _in practice_ you won't generally observe this oddity, since chances are that there are also other "happens before" relationships that tie the threads together.
      The reason I conclude that this is the right reading of the standard, as opposed to "any acquire load of x that occurs after (by wall-clock time) a release store of x _will_ see the x", is that otherwise the example given in the reference (en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering) would not require SeqCst - we'd be back to "either tx or ty happened first" even for Acquire/Release ordering, in which case an outcome of 0 is impossible. But the reference says that 0 _is_ possible under Acquire/Release, which must mean that it's possible for tx to run, but its store be visible to one thread and not another.

    • @Ergzay
      @Ergzay 3 years ago +1

      @@jonhoo Thanks for the response. I'm not quite convinced, but I'll think on this some more. I'm wondering if this is maybe a difference between C11 and C++11 given that you're mixing the two languages here.

    • @jonhoo
      @jonhoo  3 years ago +1

      @@Ergzay They both have the same memory ordering model as far as I am aware, I just had the C11 spec more readily available :)

    • @Ergzay
      @Ergzay 3 years ago

      @@jonhoo As a side note, if you have sway in the loom project, could you talk them into increasing MAX_THREADS or making it configurable? It's currently a const hard-coded to 4, which makes it impossible to use loom with this example, as there are 5 threads including the main thread. Using loom with this example, it just instantly panics. That makes it hard for me to understand how it can be used for anything other than trivial examples, given it's so limited.

    • @jonhoo
      @jonhoo  3 years ago +1

      @@Ergzay Yeah, it's a known challenge. Part of the problem is that loom needs to track a lot of per-thread state, and it does so with fixed size arrays to avoid allocating (which would kill performance). Bumping the number of threads would thus also bump the size of the entire working state fairly drastically. Though I wonder if maybe it'd be possible to use the new const generics to make it configurable!
      You can actually model this case in loom just by doing the store to x in the main thread. In general I've found it to be rare that you need more than four threads to replicate any given race condition at the level of atomics.

  • @uvarovrs40
    @uvarovrs40 3 years ago

    Can machine instruction reordering make the function below panic by making line B execute before line A?
    fn foo() -> Result<(), MyErr> {
        ...
        if my_vec.len() < 2 { return Err(MyErr::TooShortErrMsg); } // line A
        let last_value = my_vec.last().unwrap(); // line B; last() returns None if my_vec is empty
        ...
        Ok(())
    }

    • @jonhoo
      @jonhoo  3 years ago +1

      No - the compiler must respect "program order" as far as any visible side-effects. The reason it can seemingly reorder operations when multiple threads are involved is that it's allowed to choose semi-arbitrarily which other threads' writes are visible at what time.
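      In other words, single-threaded program order is always preserved for visible effects. Here is a runnable version of the questioner's function (the slice type and error representation are assumptions made to keep it self-contained):

```rust
// Line A's early return is a visible side effect, so the compiler may
// never execute line B's unwrap() before the length check has passed.
fn foo(my_vec: &[u32]) -> Result<u32, &'static str> {
    if my_vec.len() < 2 {
        return Err("too short"); // line A
    }
    // line B: only reachable when len() >= 2, so unwrap() cannot panic.
    let last_value = *my_vec.last().unwrap();
    Ok(last_value)
}

fn main() {
    assert_eq!(foo(&[1]), Err("too short"));
    assert_eq!(foo(&[1, 2, 3]), Ok(3));
}
```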

  • @deadskrillaskrit2078
    @deadskrillaskrit2078 3 years ago

    Bruh ya selling me on coding.

  • @user-ov5nd1fb7s
    @user-ov5nd1fb7s 1 year ago

    ua-cam.com/video/rMGWeSjctlY/v-deo.html
    Instruction-level parallelism is what you mean, I think.

  • @user-ov5nd1fb7s
    @user-ov5nd1fb7s 1 year ago

    ua-cam.com/video/rMGWeSjctlY/v-deo.html
    I think you wanna say data dependencies.

  • @miketurner3461
    @miketurner3461 1 year ago

    Stop trying to make fetch happen

  • @mmmvvkk
    @mmmvvkk 2 years ago

    The low-frequency feedback of your keyboard resonating through your microphone when you're typing, that's really annoying :>