What an excellent talk... by now I've seen quite a few on the subject, and read many articles but this is crystal clear, from the ground up. Watch this before anything else.
This talk cured my cancer
This explains it clearly . Glad you uploaded this video.
Phenomenal talk!
absolutely fantastic
Herb thank you so much for the slides
"okay, let's not get pathological, which is I know what we like to do" - LOL
Super helpful thanks for posting.
25:20 Yeah, this reminds me of the course of CMU 15-213. I really recommend it.
Hi Mr. Sutter,
On the last slide with fences.
You mentioned that g=1 and x=g cannot be moved across the fence.
But the same is true if, for example, you use a mutex around global=temp.
I don't see much difference there in terms of your argument about pessimization.
Though I admit that in the mutex case, g=1 and x=g can be moved inside the critical section and then collapsed to x=1. Will the compiler really do this?
Regards
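For anyone else reading along, here is a minimal sketch of the two situations being compared (the names g, x, global and temp come from the slide and the comment; the rest is hypothetical, not from the talk). A standalone full fence blocks movement in both directions, while a mutex is a pair of one-way barriers, so the compiler is in principle allowed to pull the surrounding statements into the critical section:

```cpp
#include <atomic>
#include <mutex>

int g = 0, x = 0;
int global = 0, temp = 42;
std::mutex mtx;

// Fence version: neither g = 1 nor x = g may move across the full
// fence, in either direction - that is the pessimization argument.
void fence_version() {
    g = 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    x = g;
}

// Mutex version: lock() is a one-way barrier downwards and unlock()
// a one-way barrier upwards, so the compiler is allowed to sink
// g = 1 below lock() and hoist x = g above unlock(), and then
// (seeing g == 1) emit x = 1. Whether a real compiler actually
// performs that transformation is exactly the question asked above.
void mutex_version() {
    g = 1;
    mtx.lock();
    global = temp;
    mtx.unlock();
    x = g;
}
```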
At 1:18:10: if 'y' is a local variable, it can certainly be optimized away.
It is crazy as hell that this only has 6151k views; Cherno's videos get many, many times that and yet are not nearly as valuable. Not saying they are bad videos; it is just that thread programming is becoming a necessary evil that not many are good at, and if those people want to do graphics programming with good performance they should really learn what is in this video. Thanks for the upload. I love this and have watched it many times.
You don't need atomics at all to write multithreaded code.
@@movax20h Who told you such nonsense? Sure, you don't need them; you are perfectly allowed to write shitty, buggy code.
@@seditt5146 I think you have no idea what you are talking about. I have been writing high-performance multithreaded code (mostly servers and databases) for 18 years, and I have never used an atomic in it. (I had some prototypes that used atomics in some parts, but after testing I always removed them, because they were not needed and they were too easy to break and make incorrect.) Never use atomics in your code manually; it is just a sign of poor design and asking for trouble and subtle bugs. Use a good implementation of a mutex (like the one in the abseil library), and that is all you need to write correct multithreaded code. The only uses of atomics for a normal person are 1) low-overhead multi-threaded counters, and 2) as a library implementer trying to implement mutexes, spinlocks, RCU, or some other very low-level primitive (like a lockless queue). Even then, it is extremely tricky to get right (and trust me, I have implemented all of these as a library too).
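A minimal sketch of the approach this comment advocates: protect shared state with a mutex and nothing else. It uses std::mutex so it stays self-contained; the comment recommends abseil's absl::Mutex, which is used the same way. The Counter class itself is hypothetical, not from the comment:

```cpp
#include <mutex>

// Hypothetical shared counter guarded by a mutex instead of a
// std::atomic. lock_guard releases the mutex automatically, so
// there is no memory ordering to reason about by hand.
class Counter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mu_);
        ++value_;
    }
    long get() const {
        std::lock_guard<std::mutex> lock(mu_);
        return value_;
    }
private:
    mutable std::mutex mu_;
    long value_ = 0;
};
```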
@@movax20h For real. I'm no expert in concurrency or graphics programming, but I've been working on a fractal renderer recently and have *massively* increased performance by making it multithreaded - using literally nothing beyond std::future. There are doubtless ways to make it even faster using more finely tuned threading techniques (without even considering shifting the workload to the GPU), but the idea that atomics are *absolutely necessary* for this kind of work is clearly absurd.
The rendering I'm doing only requires read-only access to shared data (a handful of render settings - and that data is so small I just give a copy to each thread for convenience anyway), and single-writer access to the output buffers (no two threads ever need to render the same pixel after all). I can't see any situation where atomics would help without introducing unnecessary complexity (and most likely potential bugs).
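Not the commenter's actual code, but a rough sketch of the pattern described: each std::async task gets its own copy of the (tiny, read-only) settings and exclusive ownership of a disjoint range of rows in the output buffer, so no atomics are needed at all. The RenderSettings fields and the per-pixel computation are placeholders:

```cpp
#include <algorithm>
#include <cstdint>
#include <future>
#include <thread>
#include <vector>

struct RenderSettings { int max_iter = 256; double zoom = 1.0; };  // hypothetical fields

// Each task writes only rows [begin, end), so writers never overlap.
void render_rows(RenderSettings s, std::vector<uint32_t>& pixels,
                 int width, int begin, int end) {
    for (int y = begin; y < end; ++y)
        for (int x = 0; x < width; ++x)
            pixels[y * width + x] =
                static_cast<uint32_t>((x + y) % s.max_iter);  // placeholder for the fractal iteration
}

void render(const RenderSettings& settings, std::vector<uint32_t>& pixels,
            int width, int height) {
    const int n = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
    const int rows_per_task = (height + n - 1) / n;

    std::vector<std::future<void>> tasks;
    for (int begin = 0; begin < height; begin += rows_per_task) {
        const int end = std::min(begin + rows_per_task, height);
        // A copy of the settings per task and disjoint rows per task:
        // the only synchronization is joining the futures.
        tasks.push_back(std::async(std::launch::async, render_rows,
                                   settings, std::ref(pixels), width, begin, end));
    }
    for (auto& t : tasks) t.get();
}
```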
@@superscatboy Please read my messages again. Let me quote myself: "You don't need atomics at all to write multithreaded code.", "Never use atomics in your code manually, it is just a sign of poor design". I don't know where you got the idea that I claimed atomics are absolutely necessary.
Brilliant.
Nice to have some ideas on the matter
41:44, "it has to assume that any opaque function call is a full barrier". I didn't exactly understand that. Does that mean any opaque function call (even if it does not lock anything internally) can be used as an atomic with release/SC semantics?
He refers to “software fences”, i.e. situations that prevent compiler reordering around opaque functions, where the function might access the variable location in question (an address-exposed variable). It is not related to hardware-level dynamic reordering.
@@ArmLiberty Yes. Still not clear. You merely seemed to rephrase my doubt.
@@sirnawaz I think he was saying you cannot use an opaque function call as an atomic, because only the *compiler* has to assume that the opaque function is a full barrier. The compiler is only the software part; the underlying hardware (processor and cache) will not treat it as a full barrier.
@@jiaweihe1244 I tried using an opaque function call, and the generated code does not contain any full-barrier instruction. So I'm unable to understand this part.
@@sirnawaz I think maybe a modern compiler is able to prove whether a function call contains a fence. Herb said in this talk that the compiler assumes an opaque function call is a full barrier if it cannot prove/deduce that it uses a fence inside.
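If it helps, here is a small sketch of how I read this thread (the function names are hypothetical). A call to a function the compiler cannot see into forces it to assume the callee might touch any address-exposed variable, so it cannot cache or reorder accesses to such variables across the call; but no fence instruction is emitted, so the hardware may still reorder, which is why an opaque call cannot stand in for an atomic:

```cpp
// 'opaque()' is defined in another translation unit, so the compiler
// must assume it may read or write 'g' (whose address is exposed).
// That makes the call a compiler-level barrier for 'g': the store
// cannot be sunk past it and the reload cannot be hoisted above it.
// No mfence / dmb / lock-prefixed instruction is emitted, though, so
// the CPU is still free to reorder - a "software fence" only.
extern void opaque();   // hypothetical, body unknown to this TU
int g = 0;
int* p = &g;            // the address of g escapes

int use() {
    g = 1;
    opaque();           // compiler barrier, not a hardware fence
    return g;           // must be re-read, cannot be assumed to be 1
}
```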
Herb Sutter mentioned at 1:19:50 that "I'm assuming that [...] int, on your particular hardware, platform and compiler, is a properly aligned, indivisibly readable and writeable variable", and then he mentions x86 as if that assertion were true there. Is it actually true that some "types" are usually "pseudo-atomic" (let's call it that)? Does this property or situation have a formal name, so I can google for it?
I think for operations to be atomic we need to use special instructions which operate on certain data types, say a 4-byte int or a 1-byte char.
Now suppose you have a struct whose last member variable is of type int and it is misaligned or isn't properly padded;
even if we use the atomic instruction on that int, it may operate in a non-atomic manner due to the alignment or padding.
I am not entirely sure about this, and I noticed you asked two years ago. I hope you have found the answer; if you do, please share it with us.
@@rohitahuja2782 It turns out that on x86 and x86_64, if a read or a write of word size or less is properly aligned, then the read/write is guaranteed to be atomic (without using special instructions), in the sense that no thread will ever see a half-completed write (e.g., a thread reading something while it's being written by another thread), but there are no guarantees about *when* the write will become visible. For example, a write can be made but the written value could live in cache without being committed to memory for a while; in either case, though, no half writes will ever be observed.
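To make the distinction concrete, a tiny sketch (the variable names are made up). On x86/x86_64 both stores below typically compile to the same single mov, which is why the aligned plain int never tears; but only the std::atomic version is guaranteed by the C++ standard on every platform, and only it stops the compiler from reordering or combining the accesses:

```cpp
#include <atomic>
#include <cstdint>

alignas(4) int32_t plain = 0;       // naturally aligned, word-sized
std::atomic<int32_t> guarded{0};

void writer() {
    plain = 1;                                    // single mov on x86, but a data race in C++ if read concurrently
    guarded.store(1, std::memory_order_relaxed);  // also a single mov on x86, and well-defined everywhere
}
```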
RIP Channel 9
25:25 what did he say? slash O D? /Od ?
Yes.
/Od disables optimizations (it's the default)
Never again? Quantum computing promises the return of the von Neumann machine.
As far as I know (not a big expert), even if quantum computing one day becomes available for the general public, it would merely be a complement to our regular transistor-based classical computing; basically a classical CPU that also happens to have a quantum chip for some applications. Not all algorithms can be run faster on quantum chips; in fact, almost everything a consumer PC or phone does is _significantly_ easier with classical computing.
In fact, I think quantum chips require a normal CPU to control them.
It should be a crime not to upload these talks in HD. A damn iPhone would've been better than the rotten potato used to film this. Hard to imagine this talk is for computer scientists and engineers.
Because of its age, I suspect this has been re-encoded N times. So while I agree it's not the nicest footage, it's probably much worse now than it was originally.
Herb gives great talks, except they don't lead to anything. It's the boy who cried "awesome", except no awesome ever came. Now I'm sick of his talks because they just get me excited and then I'm disappointed. He should stop giving these talks until there's something concrete behind them.