I looked at the title and thought "Oh! Everyone knows what a mutex is!" But I'm glad I watched because I'd never thought that acquiring a mutex either needs a mutex or needs to be atomic. ;) Great stuff as ever!
If your processor has a Read-Modify-Write instruction, you can implement it in one instruction, simple. If your processor lacks it, the simplest way is to disable interrupts for the read, modify and write instructions and then reenable interrupts. Though you have to be careful to _not_ reenable interrupts if they were initially disabled.
From my understanding both are synchronization primitives and both are used to solve critical section problems. Having said that, a mutex is a locking mechanism, whereas a semaphore is a signaling mechanism. The former goes to say that a single task can acquire the mutex, effectively owning it. This means that only the thread / process associated with that task can enter the critical section. The mutex needs to be released by the owner thread, and it can only do so upon exiting the critical section. At a high level, it can be described as a mutual exclusion object. A mutex object is either locked or unlocked. The latter pretty much says that if a thread is waiting on a semaphore it can still be signaled by another thread. In short, a semaphore can be used by several threads (based on priority). That is not the case with a mutex. There are two types of semaphores: binary and counting. At a high level, it can be described by means of an integer variable. Note that a binary semaphore is not equivalent to a mutex. As for pros / cons, I recommended looking those up online (there are some lists on blogs), or trying both out for yourself on a microcontroller.
Mutex is a Boolean lock/unlock state. Semaphore is a finite resource counter (E.g. only three processes can access some data at once). Note that a mutex is a semaphore - it’s a semaphore where the counter is set to one so that only one thing can access the resource at at time.
@@gregoryfenn1462 yes, so i ask pros and cons because if you can do the same and more with a semaphore like with mutexes, what is the reason for the existence of the mutexes?
@@gaborm4767 randombit answered too but the short answer in general is that since semaphores can do more than muticies, they won’t be as efficient or simple to use as a mutex. A tool that is can do lots of things will be less good at particular task compared to tools that are more specialised. So if you just need to lock or unlock a resource, a mutex is better than a semaphore.
This is a good video, though I would have used different instructions for the MuTex. Most ISAs have instructions that are intended for implementation of synchronization primitives like MutEx and Semaphores. The ARM Cortex-M provides LDREX and STREX. The first sets a reservation for the source memory address. If any other entity writes the location, the reservation is lost. STREX conditionally writes to the location, and will fail if the reservation has been lost. The entity that writes the memory and breaks the lock could be another thread on the same or other processor cores or DMA from an external master. Usually, a MutEx will not be set up in memory that is the target for DMA, but having multiple cores, even on microcontrollers, is becoming more common.
what about in distributied systems? are there other techniques that work with highly asymmetric systems? possibly other complete semantics and methodologies that don't require coordination in distributed multi-processor systems. please what can you tell me?
Mad respect as an argentinian for showing footage of the Super TC2000 motor racing series, one of (if not) the most technologically advanced series in Latin America. Thank you ✨
"Mutex in C" should be "Mutex with POSIX threads". Plenty of systems use mutex without pthreads. ESP32 Base code and FreeRTOS for example. Not sure why anyone would make yet another RTOS. FreeRTOS is good for small stuff, NuttX for larger stuff, Linux for big stuff, Any of the three +Gate array for obscure/awkward stuff !
02:29 lock doesn't make instruction atomic; the instructions the prefix has effect are already atomic read-modify-write operations. SO: "The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible." The effect of lock can be quite dramatic performance drop as "ownership of the appropriate cache line" means that on multi-core architecture the most recent writes into the line have to be propagated to every other core accessing the line (when the cores don't share the caches, the level of sharing varies between architectures and cache hierarchy levels but L1 is typically exclusive).
Nah. Cache coherency is a small part of memory consistency. What the lock guarantees is that load/store queues are flushed and executed the right way and that the observes in the specified domain see the same thing. But the first part of your message is correct, as in a single core system hardware locks are not necessary.
@@AlessioSangalli TL;DR - the instructions are already atomic and lock does nothing along those lines. Short version is that it gives exclusive access to the memory and what you describe is effect of that fact. It just happens to be a detail that it is more efficient to have the exclusivity per cache line rather than all of the available memory so that more shit can happen at the same time. Thanks for clarifications btw.
@@AlessioSangalli Indeed, I recognized that and thank you! I guess when you write too much past the comment in mind it gets tangled up ... :P anyways, cool stuff
OK im a little confused. The STM32 blue pill should have one CortexM3 core. As it is not a multiprocessor, are hardware atomic operations (as in with the correct memory barriers) even useful?
Atomic lock sounds like Atomic Clock. Imagine a processor running on an Atomic Clock using Atomic Locks for Mutexes, while the Program running is running Atomic Simulations that were written in the Atom editor.
Is probably quite simple, let's say two cores, you check the instruction on the other core if is not a mutex then execute normally, lock the memory block and continue, if by chance both cores are receiving the mutex in their pipelines at the same, then just choose one and lock that one, the other will wait in the loop until the lock is released... Pretty much is communication within the cores as you would in human time scales, like a core would say, I am locking mutex at address in the next cycle, is any of you going to lock at that address next cycle, if no then no collision, if yes just "hard-print" which core will lock and which one will wait... Is pretty much what is expected but at a cycle/instruction level... "Hard print" because processors at printed not coded lol
I was recently reading about quiescent consistency vs liniarliability. The document I was reading was: Quiescent Consistency: Defining and Verifying Relaxed Linearizability tl;dr you wait for all code to end execution (a point of quiescence) so there are no thread collisions. Linearizability is what you explain with the lock whereas quiescence would be programming the start and end of threads such that they wouldn't collide. The documentation calls quiescence soft and linearized hard. Cause calls separated by points of quiescence can not collide yet using a lock is used essentially to fake quiescence. Since you are writing an OS what is it that contributed to using the Linearized paradigm vs using times of quiescence to separator thread timing?
@250CC do NOT attempt to use volatile to implement atomic access. Your code is wrong and will fail as soon as you stop using x86 processors (free acqrel on every basic memory operation) Atomics also inhibit optimization, but you are told EXACTLY how optimization is inhibited instead of "I dunno it does something I guess" Relaxed atomics are free on ARM, just use those
I have seen exactly two valid use cases for C volatile: (1) POSIX signal handlers (2) Platform-specific mmio code. The C standard alone gives no useful behavior for volatile and you must combine it with additional guarantees for it to do anything at all
It should be a yield step not a sleep step, a sleep step would still waste the CPU, a yield step would allow the RTOS to execute other tasks whilst it is waiting. It is just an example though so what he showed here probably wont be what ends up in the final RTOS.
The context switching may be expensive and the other threads may not be able to do anything meaningful if you are locking for short time periods. What LLL described is a spinlock and it is actually the way to go when you are only locking for short amounts of time and do not want to pay the context switching cost.
This just really makes me hate all the assholes on programming forums I had to deal with all the more, back when I first started learning how to program in 2006. Just asking how anything in the standard library works would immediately get you attacked: "Why are you even asking that; just use the ones in the library" "What, you think you can do it better" Even if you managed to get thru to them that you just wanted to learn how they work and weren't trying to implement it yourself, you would suddenly be met with this wall of silence. Always dozens of people willing to criticize, but never anyone willing to actually answer the damn question. Have learned a lot since then, both about how the underlying mechanics of many programming languages and hardware, as well as that the most vocal people in a group often have little else to contribute that simply being the loudest, but it really made me reluctant to want to deal with other programmers unless it was face to face
For my part, I never understood why cpu maker create a “CAS” opcode (compare and set) for mutex. Compare operation require the ALU so are slow as opposed to an “XCHG” opcode (for i86 or TAS for 68k) which is much faster because it is only 1 read and 1 write to memory. From there, you build a spinlock mutex to make an OS mutex which can be used to make a full semaphore.
cas is for other things, not spinlocks. You are indeed right, atomic exchange one time and then atomic load in a loop is better for spinlock implementation.
Hum... I am working on the same thing, though I am targeting a few different chips... Sorta built-in portability. So... Lemme get your opinion. The OS needs to set up each process stack (and initial context frame), each process heap, schedule each process, and provide mechanism for IPC?
been using lock flags in gui apps even before having an idea what a mutex is. i think just calling it "mutex" without understanding how it works behind the scenes makes it unnecessarily mysterious.
Does this mean that there are times when there are non-blocking mutex lock functions, like where it'll just fail (silently or not) if it's already in use?
It's certainly possible. For example in Windows API there is a function, WaitForSingleObject(handle, timeout), which can return with an error code after a specified delay (which could be 0). Or if you're rolling your own assembly, you can just set a flag indicating whether the lock succeeded and not loop. I'm not sure if the POSIX standard says anything about that. If you're stuck with a blocking API, you could also use a native mutex to protect your own custom mutex variable, to get around the problem in the "bad mutex" example. That's not perfect, as the process scheduler could interrupt you at any time with the lock active, but it at least minimizes the time in the critical section, assuming that the real data you're trying to protect takes significantly longer to process.
The atomic hardware operations will take care to trigger the appropriate memory barriers, otherwise they would.be useless. In a non-multiproceasor environment atomics are not even necessary for mutexes
Atomic Operations are operations on the CPU that cannot be interrupted and are therefore immune to a race condition. What if there are 2 threads running on 2 separate cores of the cpu. Both threads can compare and swap at the exact same time. What happens then?
I'm not too familiar with x86 but I believe what happens is that if another processor attempts to access it, it will halt until the lock is released and then continue. So if I run the instruction lock inc byte ptr [1234h] then any other processor that tries to do the same will idle until the first processor finishes. Remember this is only really a thing for instructions that need to read, edit, and writeback. You actually can't LOCK any arbitrary instruction from what I remember. Only the ones where it matters
I didn't learn the word "mutex" in my computer engineering classes (mid 2000's) either, just semaphores. A mutex is basically a semaphore initialized to 1. But being able to implement that in user space with a spin lock gives certain advantages, I guess, like avoiding a context switch / system call. Sometimes you need to get the scheduler involved so you don't have worker threads eating up cpu cycles when there's nothing to do. But just ensuring exclusive access tends to burn cycles when things are already spun up anyway.
Think of two people which needs to use a bathroom. The people are the threads and they need to shit. If they shit, they lock the bathroom and if they're done, they unlocked it.
I knew the basic idea of a mutex already, but it didn't occur to me that a mutex itself could run into a race condition as well, into partway into your explanation on what they are lol. Still not a hundred percent on how this works (dunno any assembly) but now I know it's not as straightforward as I naively assumed.
So, the software mutex is actually done at a processor level 😂😂, makes sense if you think about it, because is a recursive problem if you try to solve it at a software level, so do it at processor (physical) level, is not quite clear from the video tho...
The spinlock is used in conjunction with a mutex variable - it is one possibility of waiting until the mutex var is no longer in the locked state. You can think of the spinlock as a while loop that checks, if the mutex var is currently in the unlocked state. If not, it checks again and again and again until the mutex finally IS unlocked. Thats kinda wasteful because it uses CPU cycles for the constant checking but very easy to implement.
A mutex usually puts the waiting thread to sleep, so the only CPU used is when the mutex is released and the waiting thread is told to wake up. A spinlock will have the CPU in a loop constantly testing a variable.
A spinlock puts the thread to spin on a loop while a mutex puts the thread to sleep, usually using a special OS system call that tells the OS the thread is waiting on a mutex so that the OS can optimize for that in a way it wouldnt be able otherwise. Generally, spinlocks are faster than mutexes because mutexes pay a rather expensive syscall cost and spinlocks dont. However Spinlocks do not release the threads so if the number of threads is greater than the number of cores in the system the OS shcheduler will probably go nuts switching between the waiting threads and that is bad. Also spinlocks are generally a bad idea if you are waiting for relatively long periods of time. So generally, use a mutex if you are waiting for long periods of time or you have a lot of threads and use a spinlock if you are locking and unlocking really quickly.
But what if you want something that isn't a mutex but also isn't a regular data type? something with a "write protect lock" that can be enabled or disabled on-the-fly? (but still allows multiple threads to read from it regardless of it's state) What do you suggest then?
Great, now I'm going to have to find a video how uninterruptable instructions are possible in multi-core CPU architecture. And you didn't even show how mutexes are released. Or how re-entrancy is handled (each thread has an ID)
I'm sorry but I don't think there was enough explanation on how this "global variable" actually "locks" some other piece of memory for one task. Let's say I have two tasks, A and B, and they both at some point are attempting to write to a global variable myVar. If within the code of each task you literally have myVar = something, then what? Where does this "global variable mutex" come in to play?
I looked at the title and thought "Oh! Everyone knows what a mutex is!" But I'm glad I watched because I'd never thought that acquiring a mutex either needs a mutex or needs to be atomic. ;)
Great stuff as ever!
Reminds me of the question, "who delivers the mailman's mail?"
@@tf_d his wife
God I love your channel, your videos are the best I've seen on embedded/assembly stuff. Really fascinating topics presented beautifully!
Damn, I’d love a video on a DIY RTOS. Love your videos man!!!!
+1 because like wasn't enough
If your processor has a Read-Modify-Write instruction, you can implement it in one instruction, simple. If your processor lacks it, the simplest way is to disable interrupts for the read, modify and write instructions and then reenable interrupts. Though you have to be careful to _not_ reenable interrupts if they were initially disabled.
What are the differences between mutex and semaphore and what are the pros and cons?
From my understanding both are synchronization primitives and both are used to solve critical section problems. Having said that, a mutex is a locking mechanism, whereas a semaphore is a signaling mechanism.
The former goes to say that a single task can acquire the mutex, effectively owning it. This means that only the thread / process associated with that task can enter the critical section. The mutex needs to be released by the owner thread, and it can only do so upon exiting the critical section. At a high level, it can be described as a mutual exclusion object. A mutex object is either locked or unlocked.
The latter pretty much says that if a thread is waiting on a semaphore it can still be signaled by another thread. In short, a semaphore can be used by several threads (based on priority). That is not the case with a mutex. There are two types of semaphores: binary and counting. At a high level, it can be described by means of an integer variable. Note that a binary semaphore is not equivalent to a mutex.
As for pros / cons, I recommended looking those up online (there are some lists on blogs), or trying both out for yourself on a microcontroller.
Mutex is a Boolean lock/unlock state. Semaphore is a finite resource counter (E.g. only three processes can access some data at once). Note that a mutex is a semaphore - it’s a semaphore where the counter is set to one so that only one thing can access the resource at at time.
@@gregoryfenn1462 yes, so i ask pros and cons because if you can do the same and more with a semaphore like with mutexes, what is the reason for the existence of the mutexes?
@@gaborm4767 randombit answered too but the short answer in general is that since semaphores can do more than muticies, they won’t be as efficient or simple to use as a mutex. A tool that is can do lots of things will be less good at particular task compared to tools that are more specialised. So if you just need to lock or unlock a resource, a mutex is better than a semaphore.
This is a good video, though I would have used different instructions for the MuTex.
Most ISAs have instructions that are intended for implementation of synchronization primitives like MutEx and Semaphores. The ARM Cortex-M provides LDREX and STREX. The first sets a reservation for the source memory address. If any other entity writes the location, the reservation is lost. STREX conditionally writes to the location, and will fail if the reservation has been lost. The entity that writes the memory and breaks the lock could be another thread on the same or other processor cores or DMA from an external master. Usually, a MutEx will not be set up in memory that is the target for DMA, but having multiple cores, even on microcontrollers, is becoming more common.
what about in distributied systems? are there other techniques that work with highly asymmetric systems? possibly other complete semantics and methodologies that don't require coordination in distributed multi-processor systems. please what can you tell me?
Mad respect as an argentinian for showing footage of the Super TC2000 motor racing series, one of (if not) the most technologically advanced series in Latin America. Thank you ✨
"Mutex in C" should be "Mutex with POSIX threads". Plenty of systems use mutex without pthreads. ESP32 Base code and FreeRTOS for example. Not sure why anyone would make yet another RTOS. FreeRTOS is good for small stuff, NuttX for larger stuff, Linux for big stuff, Any of the three +Gate array for obscure/awkward stuff !
I'm a simple man. I see a video of this guy, i watch and leave a like
02:29 lock doesn't make instruction atomic; the instructions the prefix has effect are already atomic read-modify-write operations.
SO: "The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible."
The effect of lock can be quite dramatic performance drop as "ownership of the appropriate cache line" means that on multi-core architecture the most recent writes into the line have to be propagated to every other core accessing the line (when the cores don't share the caches, the level of sharing varies between architectures and cache hierarchy levels but L1 is typically exclusive).
Nah. Cache coherency is a small part of memory consistency. What the lock guarantees is that load/store queues are flushed and executed the right way and that the observes in the specified domain see the same thing. But the first part of your message is correct, as in a single core system hardware locks are not necessary.
@@AlessioSangalli TL;DR - the instructions are already atomic and lock does nothing along those lines. Short version is that it gives exclusive access to the memory and what you describe is effect of that fact. It just happens to be a detail that it is more efficient to have the exclusivity per cache line rather than all of the available memory so that more shit can happen at the same time. Thanks for clarifications btw.
@@n00blamer yeah but what I wanted to clarify is that there are mechanisms that act even before the caches that play a role here. Thanks!
@@AlessioSangalli Indeed, I recognized that and thank you! I guess when you write too much past the comment in mind it gets tangled up ... :P anyways, cool stuff
absurdly amazing explanation!
great video bro. Very clear and to the point. TY!
Thank you :)
Nice, i always wonder how could a mutex gives the security of multithread, the Atomic operations Is the key, thanks Man, AND like
0:52 unexpected tc2000 appearance
OK im a little confused. The STM32 blue pill should have one CortexM3 core. As it is not a multiprocessor, are hardware atomic operations (as in with the correct memory barriers) even useful?
The example at 2:20 uses Intel assembly, that is more likely to have multiple cores
Atomic lock sounds like Atomic Clock.
Imagine a processor running on an Atomic Clock using Atomic Locks for Mutexes, while the Program running is running Atomic Simulations that were written in the Atom editor.
Everything was fast until the point u opened the atom editor
@@nimcompoo yess thanks to atom's electron js
I know that this is ment for humor, but timing between threads are critical and running on an atomic clock would be awesome
Great video. Now, how does lock work in silicon?
Is probably quite simple, let's say two cores, you check the instruction on the other core if is not a mutex then execute normally, lock the memory block and continue, if by chance both cores are receiving the mutex in their pipelines at the same, then just choose one and lock that one, the other will wait in the loop until the lock is released...
Pretty much is communication within the cores as you would in human time scales, like a core would say, I am locking mutex at address in the next cycle, is any of you going to lock at that address next cycle, if no then no collision, if yes just "hard-print" which core will lock and which one will wait...
Is pretty much what is expected but at a cycle/instruction level...
"Hard print" because processors at printed not coded lol
I was recently reading about quiescent consistency vs liniarliability.
The document I was reading was: Quiescent Consistency: Defining and Verifying Relaxed Linearizability
tl;dr you wait for all code to end execution (a point of quiescence) so there are no thread collisions.
Linearizability is what you explain with the lock whereas quiescence would be programming the start and end of threads such that they wouldn't collide.
The documentation calls quiescence soft and linearized hard. Cause calls separated by points of quiescence can not collide yet using a lock is used essentially to fake quiescence.
Since you are writing an OS what is it that contributed to using the Linearized paradigm vs using times of quiescence to separator thread timing?
Isn’t there a C directive to define a variable, in this case your mutex var, as atomic so you would not need to drop into assembly?
@250CC I never understood what volatile really means or does. I know it has something to do with interrupt routines. Thoughts?
@250CC do NOT attempt to use volatile to implement atomic access. Your code is wrong and will fail as soon as you stop using x86 processors (free acqrel on every basic memory operation)
Atomics also inhibit optimization, but you are told EXACTLY how optimization is inhibited instead of "I dunno it does something I guess"
Relaxed atomics are free on ARM, just use those
I have seen exactly two valid use cases for C volatile: (1) POSIX signal handlers (2) Platform-specific mmio code. The C standard alone gives no useful behavior for volatile and you must combine it with additional guarantees for it to do anything at all
@@kevy1yt If you do any development on game consoles you'll be using volatile to access I/O registers. It doesn't mean atomic though.
@@DFPercush what is its purpose with I/O in consoles?
this is why global state in frontend apps exists i get it now
May want to add a sleep step in there so your not wasting the cpu. Other threads could be wanting to run.
It should be a yield step not a sleep step, a sleep step would still waste the CPU, a yield step would allow the RTOS to execute other tasks whilst it is waiting. It is just an example though so what he showed here probably wont be what ends up in the final RTOS.
@@conorstewart2214 I do like RTOS. Need to be careful with thread priorities. Yield really only works on threads of same priority.
The context switching may be expensive and the other threads may not be able to do anything meaningful if you are locking for short time periods. What LLL described is a spinlock and it is actually the way to go when you are only locking for short amounts of time and do not want to pay the context switching cost.
Low Level Gang 😎
This just really makes me hate all the assholes on programming forums I had to deal with all the more, back when I first started learning how to program in 2006. Just asking how anything in the standard library works would immediately get you attacked:
"Why are you even asking that; just use the ones in the library"
"What, you think you can do it better"
Even if you managed to get thru to them that you just wanted to learn how they work and weren't trying to implement it yourself, you would suddenly be met with this wall of silence. Always dozens of people willing to criticize, but never anyone willing to actually answer the damn question.
Have learned a lot since then, both about how the underlying mechanics of many programming languages and hardware, as well as that the most vocal people in a group often have little else to contribute that simply being the loudest, but it really made me reluctant to want to deal with other programmers unless it was face to face
For my part, I never understood why cpu maker create a “CAS” opcode (compare and set) for mutex. Compare operation require the ALU so are slow as opposed to an “XCHG” opcode (for i86 or TAS for 68k) which is much faster because it is only 1 read and 1 write to memory. From there, you build a spinlock mutex to make an OS mutex which can be used to make a full semaphore.
cas is for other things, not spinlocks.
You are indeed right, atomic exchange one time and then atomic load in a loop is better for spinlock implementation.
LOW LEVEL GANGGGGG!
Hum... I am working on the same thing, though I am targeting a few different chips... Sorta built-in portability.
So... Lemme get your opinion. The OS needs to set up each process stack (and initial context frame), each process heap, schedule each process, and provide mechanism for IPC?
been using lock flags in gui apps even before having an idea what a mutex is. i think just calling it "mutex" without understanding how it works behind the scenes makes it unnecessarily mysterious.
This is not an mutex, this is a spinlock ;)
Does this mean that there are times when there are non-blocking mutex lock functions, like where it'll just fail (silently or not) if it's already in use?
It's certainly possible. For example in Windows API there is a function, WaitForSingleObject(handle, timeout), which can return with an error code after a specified delay (which could be 0). Or if you're rolling your own assembly, you can just set a flag indicating whether the lock succeeded and not loop. I'm not sure if the POSIX standard says anything about that. If you're stuck with a blocking API, you could also use a native mutex to protect your own custom mutex variable, to get around the problem in the "bad mutex" example. That's not perfect, as the process scheduler could interrupt you at any time with the lock active, but it at least minimizes the time in the critical section, assuming that the real data you're trying to protect takes significantly longer to process.
As another example, Go 1.18 added TryLock() to its various mutexes for exactly this case.
I heard that you also have to temporarily disable DMA before you perform the atomic operation; is that true on your chip?
Will this implementation work in a multi processor environment where cpu caches are used?
The atomic hardware operations will take care to trigger the appropriate memory barriers, otherwise they would.be useless. In a non-multiproceasor environment atomics are not even necessary for mutexes
Hey can you do a video on concurrent programming: threads, fibers, etc. maybe even job systems
Atomic Operations are operations on the CPU that cannot be interrupted and are therefore immune to a race condition.
What if there are 2 threads running on 2 separate cores of the cpu. Both threads can compare and swap at the exact same time. What happens then?
One thread will have to wait on the other for the memory access, thats the whole point of atomic operations and I think he mentioned it.
👏👏👏 great video..
so freakin helpful! 👏👏
Does it mean all the mutex implementations are actually powered by cpu support of lock commands?
got the source up anywhere?
But how does the assembly instruction work?
I'm not too familiar with x86 but I believe what happens is that if another processor attempts to access it, it will halt until the lock is released and then continue.
So if I run the instruction
lock inc byte ptr [1234h] then any other processor that tries to do the same will idle until the first processor finishes. Remember this is only really a thing for instructions that need to read, edit, and writeback. You actually can't LOCK any arbitrary instruction from what I remember. Only the ones where it matters
I may be wrong, but I believe we called this a Semaphore back in the early 1980s.
I didn't learn the word "mutex" in my computer engineering classes (mid 2000's) either, just semaphores. A mutex is basically a semaphore initialized to 1. But being able to implement that in user space with a spin lock gives certain advantages, I guess, like avoiding a context switch / system call. Sometimes you need to get the scheduler involved so you don't have worker threads eating up cpu cycles when there's nothing to do. But just ensuring exclusive access tends to burn cycles when things are already spun up anyway.
DIY RTOS video, please.
Think of two people which needs to use a bathroom. The people are the threads and they need to shit. If they shit, they lock the bathroom and if they're done, they unlocked it.
Wait, you wrote a RTOS, and didn’t make the main process the scheduler, which keeps track of these things?
Is your OS code on Github? It would be great to follow along for learning.
i learned something Thanks!
I knew the basic idea of a mutex already, but it didn't occur to me that a mutex itself could run into a race condition as well, into partway into your explanation on what they are lol. Still not a hundred percent on how this works (dunno any assembly) but now I know it's not as straightforward as I naively assumed.
Can you do a video on TSX (HLE/RTM)?
So, the software mutex is actually done at a processor level 😂😂, makes sense if you think about it, because is a recursive problem if you try to solve it at a software level, so do it at processor (physical) level, is not quite clear from the video tho...
Could you explain the difference between mutex and spinlock?
The spinlock is used in conjunction with a mutex variable - it is one possibility of waiting until the mutex var is no longer in the locked state. You can think of the spinlock as a while loop that checks, if the mutex var is currently in the unlocked state. If not, it checks again and again and again until the mutex finally IS unlocked. Thats kinda wasteful because it uses CPU cycles for the constant checking but very easy to implement.
A spin lock is one possible implementation of a mutex.
A mutex usually puts the waiting thread to sleep, so the only CPU used is when the mutex is released and the waiting thread is told to wake up. A spinlock will have the CPU in a loop constantly testing a variable.
A spinlock puts the thread to spin on a loop while a mutex puts the thread to sleep, usually using a special OS system call that tells the OS the thread is waiting on a mutex so that the OS can optimize for that in a way it wouldnt be able otherwise.
Generally, spinlocks are faster than mutexes because mutexes pay a rather expensive syscall cost and spinlocks dont. However Spinlocks do not release the threads so if the number of threads is greater than the number of cores in the system the OS shcheduler will probably go nuts switching between the waiting threads and that is bad. Also spinlocks are generally a bad idea if you are waiting for relatively long periods of time.
So generally, use a mutex if you are waiting for long periods of time or you have a lot of threads and use a spinlock if you are locking and unlocking really quickly.
You are a godsend
Are these only relevant to hardware level operations ?
So Linux' mutex can be improved with that way?
But what if you want something that isn't a mutex but also isn't a regular data type? something with a "write protect lock" that can be enabled or disabled on-the-fly? (but still allows multiple threads to read from it regardless of it's state)
What do you suggest then?
Can anyone give me resources for learning os? as I am a beginner.....
Yeah but how is it implemented in hardware>?
Great, now I'm going to have to find a video how uninterruptable instructions are possible in multi-core CPU architecture.
And you didn't even show how mutexes are released. Or how re-entrancy is handled (each thread has an ID)
Excellent.
Do semaphore next ;) then yielding (generators)
Ah I suspected locks need some amount of hardware support.
just a friendly reminder: templeos is still the greatest operating system of all time
Wait, wouldn't that be busy waiting? I'm taking a uni class on calculators, please educate me
It's not mutex , wrong tutorial , it's condvar
@@traveldiary1455 both work together
Back in the '80s we called it a semaphore. How is this "new" again?
Still confused but progress
Adelbert Square
// rawr
MuTexas?
Greatt
Advanced.
I'm sorry but I don't think there was enough explanation on how this "global variable" actually "locks" some other piece of memory for one task. Let's say I have two tasks, A and B, and they both at some point are attempting to write to a global variable myVar. If within the code of each task you literally have myVar = something, then what? Where does this "global variable mutex" come in to play?
🐧
Spinlock
To me the background music is very annoying. I had to watch on mute with CC.
Lmao
rawr owo