Regarding the cryptic naming conventions of older software, it's worth noting that the original ANSI C standard specifies that for external identifiers only the first 6 characters are considered "significant." So if a program had two identifiers with the same 6-character prefix (but which were otherwise different) the compiler's behavior was undefined(!). In practice this means compilers were free to ignore longer identifiers, so programmers were pretty much forced to come up with really short names for functions/variables. I think it's unfortunate that the convention seems to have stuck with C developers, but it's understandable why programmers did that sort of thing back in the day.
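A tiny made-up illustration of the kind of clash that rule allows (hypothetical names, just to show the idea): with only 6 significant characters, a linker was free to treat these two externals as the same symbol, because both start with "counte":

    int counter_reads;   /* first 6 chars: "counte" */
    int counter_writes;  /* first 6 chars: "counte" - could collide with the one above */

So people wrote things like cntrd and cntwr instead, and the habit stuck.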
5:14 "What's the CLINT? That's a good question!"
A very good question many men ask
This has to be the best systems programming channel on YouTube right now. Extraordinary work.
Earlier in my career, I used to work on the Unix kernel. Worked on some of the lower level boot related stuff, a buffer cache reimplementation, scheduler changes and pseudo-device driver for control of multi-processor load balancing, among other things. I really appreciate your videos on these subjects. For me it brings back memories, refreshing things stored in my brain I haven’t accessed in years.
Thank you for doing these videos. You do a great job of explaining things.
For developers, even if just writing applications, having insight of what’s happening under the hood can be enlightening. Kernel developers get exposed to just about every programming trick in the book. I also think my kernel development experience made my debugging skills better and served me well long after my kernel days.
I look forward to watching more of your videos.
Thank you for this comment Rick 🙏
what's your github man?
Very excited to see how printf works! I've always wondered but never looked too deep.
It's going to be fun - it's one of those things where the sheer simplicity of it is totally obvious (but only after the fact 😁)
I'm a CS student learning C, and this is a great channel to get into the deeper stuff
If you’re particularly interested in this low level OS sorta stuff, you may find yourself more interested in Computer Engineering, you should look into it!
Keep going! this series is amazing
These series are making the kernels very approachable, thank you!
Can't thank you enough for this series. This is incredibly useful.
A full walkthrough like this is how I learn best.
I can parse the video at my own speed and come back as needed.
Following the code from start to finish, function by function, is so helpful in the learning / understanding process.
This is certainly helping a plethora of developers get into kernel development, please, continue the awesome work!
Really good educational presentation - thanks for providing an overview of the order in which, and why, things are done. I did embedded RTOS system programming some years ago and I do have a good understanding of how things work - but it's nice to get a refresh on the general architecture and functionality of the various components - and some of it I never touched - so nice to get further info :)
Amazing work! It's a pleasure to watch your deep dives. Can't even tell when the time flies by!
What a great work, thanks for such dense but yet conceivable and understandable video!
I am really enjoying this deep dive into how a kernel works. Please keep this series going!
Wow, even though I'm sort of familiar with those contents and concepts, and work as embedded SW engineer, this is very fun to watch, you explain in a very organized and clear way! Well done!
Great work. Please continue.
I am reminded of my computer science lecture at the university. But here it is more condensed and directly explained on a live example. I like that. Not too theoretical.
Excellent explanation of the fence instruction, to keep the compiler *and* the CPU from reordering operations. As an EE and software hack, I've seen references to the fence instruction. Your explanation makes its usage (and why it's named 'fence') clear.
really love this series, I can't wait for the next episodes :D
I've got some good stuff in store for you 😁
Brilliantly explained. Thanks a million for this.
this is so cool!
i'm currently taking an OS class where we work with xv6, and your videos have been really helpful :)
right now we're implementing MLFQ scheduling, and watching this series is helping a ton in understanding how everything works.
greetings from Argentina!
What an excellent way of teaching low level.
So excellent. I have been writing code professionally for almost 30 years, but this is so well structured and explained, I'm learning a lot.
I’m really enjoying this series, keep up the great work!
Thought I'd weigh in - the historical reason for C names being short isn't just lack of storage, it's because one of the popular early compilers (I forget which, maybe indeed the K&R one) only compared the first 8 bytes of names to determine what's what, so your first 8 bytes had to count.
The first and now the second video were very interesting. Please keep it up.
Great Video. I love the mix of System Software and embedded hardware videos.
This is really cool! At work I deal with an RTOS and have access to the source and it's cool seeing the similarities in the steps many operating systems take to boot.
Just found this series and it's been fascinating to see how things work in-depth. Your explanations make it really easy to follow. Can't wait for the next video in the series!
These videos are great, love the content, thank you very much! ✨
Currently doing intro to computer architecture and next course is operating systems. I love this series. Allows me to come in with some sort of idea of what’s going on.
Really appreciate this source dive series! Exactly what I'm needing right now. I'm in an OS class and we're working with xv6 and this is seriously helpful! We're working with the x86 version of xv6, so it's interesting to see the differences.
Great Video. I enjoy watching it. Keep going, please.
Great video! Really easy to follow
Love your content. This is great work!
Please continue the series. You are doing such a good job. I was looking for a similar introduction to an operating system kernel source code for some time. Yours is especially good because it is based on a simpler kernel than, let's say, Linux. And I like the quality of your explanations a lot.
I adore source dives!
Excellent Video!
Such a cool series!
Looking forward for next episode.
I just recorded it today - it's going to be a fun one!
This series looks damn amazing, subscribed
very nice series
This is a great resource, thanks for doing this.
Amazing work well done really rare 👏👏👏
This is way better than my OS class I took back when I was in school
😂 ~0:28:00 when I was wondering about those cryptic variable names and told myself: “obviously a very prehistoric code base” … just about the same was mentioned in the vid itself.
PS: you missed the 1 hour mark this time 😮😉
Good stuff mate!
Awesome work, i hope u do more videos that i can learn from :)
Thanks, great video! You earned yourself a new subscriber
Looking forward for the 003. Thanks!
Great video! 🎉
amazing video, as always
the explanations are very clear
thanks for sharing, I'm following the series :) I'm crazy to learn more about virtual memory, I believe I just know it at a high level - TLB, MMU, pages, frames... - but I'm anxious to see it implemented!
We'll get there! Really glad you're enjoying it.
28:38. i paused the video before this bit and said exactly what you said: why is it so cryptic, and why does C source code in general prefer these cryptic names rather than descriptive names.
the principle i know is, code should be self documenting.
thanks for that historic context...
Thank you.
Great Work....thanks
Thank you so much! ❤
Great Content! One suggestion: run a de-esser over the audio track to reduce the spikes from "z","s",.. consonants
Fair point, thanks for the feedback!
The description for push_off isn't really correct, when you're talking about having "multiple locks" on the same CPU.
We've already seen that locks aren't re-entrant, as calling acquire twice before release gets a panic.
But the code IS allowing push_off calls to be re-entrant, meaning we can have a push_off followed by another push_off , then two pop_off calls.
If we did this, only one set of push_off/pop_off calls could ever be from acquire/release (since acquire isn't re-entrant)... if acquire was the only caller, none of that logic would be necessary.
In other words, push_off/pop_off has a bunch of logic and structures to make it re-entrant, but they're not related to locking.
All mentions of "another lock" in the description of these functions are instead referring to "another (non-locking related) push_off caller".
awesome video
Love the content, Looking forward to the next episode. Do you have a roadmap of where you want the series to go?
I'm planning on following the code more or less linearly as the kernel is setup, and rounding up with how it looks mechanically within the kernel as processes are scheduled across the various cores. That may change, but it's the plan as it stands!
What happens with incoming interrupts while they are disabled? Are they lost forever, or queued somewhere to be processed after interrupts are enabled again?
There are a couple of levels to this (literally). On risc-v, you have global interrupt enable/disable control, but you can also have individual types of interrupts enabled or disabled. These are both controlled by bits in a CSR. There is another CSR called the interrupt pending register, where a bit will be set for a given interrupt type when it occurs, even if the interrupt is not handled due to it being disabled by either of the two mechanisms. If a pending bit is set, AND the individual interrupt enable bit is set, AND the global interrupt enable bit is set, then the interrupt will happen. That means that if global interrupts were disabled for a time, but an interrupt did occur and set a pending bit, as soon as global interrupts were enabled again, the interrupt would be taken.
There is another thing to note, which is that interrupts are delivered to all cores at the same time. Whichever core happens to be the fastest in that moment will be the one to handle it. So if you have 8 cores and 7 are busy, you're still going to get the interrupt. The implication is that once those other cores enable interrupts again, they are also going to take an interrupt that might have already been handled - but this is actually by design, because between the CPU and the interrupt source sits another piece of hardware called an interrupt controller. The interrupt controller takes interrupts from the outside world and delivers them to the CPU, but also has a memory mapped interface which allows the CPU to determine what the cause of the interrupt was, and whether or not it is still valid.
A lot of details, but there is a lot hidden in your question! I hope some of that made sense
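If it helps, the "taken or not" decision described above can be modelled as a tiny function (just a conceptual sketch, not how you'd actually read CSRs; bit 3 of mstatus is the machine-mode global enable MIE):

    #include <stdint.h>

    // does machine-mode interrupt number i fire right now?
    int interrupt_taken(uint64_t mstatus, uint64_t mie, uint64_t mip, int i)
    {
      int global_enable = (mstatus >> 3) & 1;  // mstatus.MIE - global on/off switch
      int this_enabled  = (mie >> i) & 1;      // per-type enable bit
      int this_pending  = (mip >> i) & 1;      // latched pending bit, set even while disabled
      return global_enable && this_enabled && this_pending;
    }

The pending bit is the part that answers the original question: the interrupt isn't lost, it just sits there until all three conditions become true.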
I am trying to understand this, but I can't find the clint.
Took me a while to realise that this was the next video after "How does an OS boot? //Source Dive// 001", because I was looking for "How does an OS boot? //Source Dive// 002"
Glad you found them anyway!
Excellent work! Can you explain things like memory barriers and instruction fences? It kinda seems like a more efficient way of doing things involving parallelism and concurrency (in general we try to get the compiler out of our way when talking about locks, but this lets the compiler understand our intent more clearly, so maybe it's more efficient?). There are many good explanations on the web, but your videos are more entertaining and more people might learn about it. 👍
I talk about memory barriers at around 46:30
@@LowByteProductions I know, I finished the video. I was looking for more in depth explanation, maybe in the future perhaps?
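In the meantime, one concrete place the barrier shows up in this code is release() - roughly this, going from memory (check spinlock.c for the real thing):

    void release(struct spinlock *lk)
    {
      if(!holding(lk))
        panic("release");
      lk->cpu = 0;
      __sync_synchronize();             // fence: stores made inside the critical section
                                        // must be visible before the lock is given up
      __sync_lock_release(&lk->locked); // atomic store of 0 into lk->locked
      pop_off();
    }

Without that fence, the compiler or CPU could legally move writes to the protected data past the unlock, and another core could grab the lock and still see stale values.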
While I watched, I had some thoughts (I don't know a lot about this low-level stuff). I saw that before locking the mutex it disables interrupts (I'm not sure if it disables them for the particular core that tries to acquire the mutex or for the whole CPU). But why can't we process the interrupt on another core that isn't waiting for the mutex to be released, and if the interrupt belongs (data or code) to the core that waits, "reschedule" it again after the mutex is released? It might be stupid, but I'm curious if that would be possible.
I was writing a cpu scheduler for an OS back in the day. I gave up because I had that mental problem back then that if I couldn't come up with a perfect solution, I refused to do it 😂 I was struggling with how to write the perfect prioritized list and ended up with all kinds of schedulers where I tried to keep statistics of triggered tasks to alter their priorities on the fly and so on. It just went over my head.
31:54 "Critical sections" (not "critical paths") are called critical sections because they are the sections that must be protected by mutual exclusion (to avoid possible errors/race conditions), not because they must finish quickly.
You're right, I misspoke on the terminology there. I do think you missed my point in that particular comment however. I covered that the reason for locking was resource protection. However, spinlocks in particular are a concurrency primitive where you need to aim to keep the critical section as short as possible, because your exclusive access blocks all other CPUs in the system.
are spin locks the only thing that can turn off interrupts? if not, then I see some issues with the implementation of push_off and pop_off (in regards to intena)
Thanks so much!
24:55 Question: initlock() is passed a pointer to a string (“cons” for example) with type (char *) to retain that information for debugging. But that points to a string in the caller’s data. Can you be sure that the string persists? If it is ephemeral, the pointer would no longer be valid.
Do you have to do anything in the code calling initlock() to ensure that the string can always be accessed with the original pointer value?
(And yes, this series of videos is amazing. Instant subscription. I’m retired so I no longer write code for embedded CPUs, but your explanations are a delight to follow.)
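For what it's worth, the callers in xv6 pass string literals ("cons", "uart", ...), and string literals in C have static storage duration, so the pointer stays valid for the life of the program. initlock() itself just stashes the pointer - roughly this, from memory:

    void initlock(struct spinlock *lk, char *name)
    {
      lk->name = name;   // fine for a literal like "cons"; it would be dangerous if a
                         // caller passed a pointer to a stack buffer instead
      lk->locked = 0;
      lk->cpu = 0;
    }

So the only rule the caller has to follow is: don't pass something ephemeral.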
couldn't a context switch happen between intr_get() and intr_off()? or would that be okay anyways?
I probably understood 10 percent of the material because I have low skill, but it was really interesting. The main insight is that looking into the source of an OS is not so terrifying)
Seems like push_off() and pop_off() aren't the only calls in the kernel that can toggle interrupts? Otherwise why is intena necessary?
But this seems to contradict pop_off() panicking when interrupts are enabled, if something else can also toggle interrupts?
There's just something sexy (and a bit terrifying) about those short and confusing C function names for low-level operations. I can imagine if it were Java, we'd have to deal with AsynchronousSupervisorInterruptHandlerDelegateFactory.
😂
Would you consider using some drawing program to illustrate what you're talking about? there are some points where the CS background needed goes way up and you go into a 5 minute tangent. Visuals could help alleviate that
There will definitely be some diagrams in the upcoming videos!
I'm used to a program that burns cycles just to wait for something being called a "busy sleep". Is it any different from "spinning"?
I would guess it is referring to the same thing indeed. Just be aware that sleeping usually refers to a different kind of lock waiting mechanism, where the kernel can schedule other useful work
First of all, great Video, I’ve been searching for a good explanation of the ins and outs of operating systems for ages! But finally I have one more question regarding the process of acquiring the lock 43:40. I understand how this can work on systems with multiple threads and one core. What I don’t understand is how an operation can be “atomic across cores”. In my mind it would be possible for multiple cores to run the operation to acquire the lock at the same time with no way of preventing this. I have done some googling and from my understanding the solution to this problem is that the memory controller makes sure that no two cores can access the same memory at the same time and thus the swap operation will always be “delayed” on one core. Is my understanding correct or am I missing something? Thanks in advance!
You are right. What makes it easier to grok imo is if you keep in mind that all the cores are part of a single cpu. That cpu as a whole can actually orchestrate things like explicit atomic memory operations (which occur with specific atomic instructions like amoswap). Since the cpu itself is guarding the gateway to memory so to speak, it can force a single memory access at a time.
I believe it's about the CPU cache's behavior in the implementation. Before anything is actually read from memory into the destination register, it has to pass through the cache levels, a sort of in-between step between RAM and the core's registers. Atomic instructions let only a single core pull that address; if anyone else requires the value at that address at that moment, the cache gives a mismatch and the other core must wait for the address to be re-read. The CPU cache works like a ladder - there's L1, L2, maybe L3 - before a given core's ALU and register set get their hands on the data. The CPU reads memory in chunks and executes in batches so it works faster while slower memory and IO devices do their job in peace. Fences, barriers - these sorts of instructions govern the rules for CPU caches and core synchronization. Fascinating stuff. Btw I'm talking off the top of my head; check and verify the info, it may very well be implementation dependent on the CPU family and manufacturer. Look at this image, it will make more sense: jenkov.com/images/java-concurrency/cpu-cache-coherence-and-java-concurrency-1.png
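Putting the replies above into code form, this is roughly xv6's acquire() (paraphrased from spinlock.c; on RISC-V the test-and-set builtin compiles down to an amoswap.w.aq instruction):

    void acquire(struct spinlock *lk)
    {
      push_off();                 // disable interrupts on this core to avoid deadlock
      if(holding(lk))
        panic("acquire");

      // atomic swap: write 1 into lk->locked and get the old value back, as one
      // indivisible operation. Only the core that sees the old value 0 owns the lock.
      while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
        ;

      __sync_synchronize();       // memory fence - loads/stores in the critical
                                  // section can't be reordered before this point
      lk->cpu = mycpu();          // record the holder for holding()/debugging
    }

The hardware (cache coherence plus the atomic memory operation) guarantees that even if several cores issue the swap "at the same time", they get serialized and exactly one of them sees the old value 0.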
I don't understand the condition on cpuid() ==0. Is main() invoked once for each core??
Yes, every core runs the same program, but will start behaving differently depending on the core ID
@@LowByteProductions That sounds really weird to me. Hope you cover that in a bit more detail later! Thanks for the videos.
Such advanced topics. Wow. But what about beginners? The Tanenbaum book on operating systems?
You are PRO BRO
Great video. It's an amazing source of low level info. Thank you for it.
Is it possible to lock your camera focus?
Yes, it is and I have for the next videos 😁 it also drove me crazy when I watched the video back
Is there an alternative to "spinning" during these locks? Like stop feeding the CPUs instructions and wake them up by injecting the atomic read of the lock into their instruction queue whenever some other cpu lets go of the lock?
The RISC-V architecture has an instruction that could help you eliminate busy waiting on locks: the WFI (Wait For Interrupt) instruction. The CPU goes into low power mode and wakes when it gets a hardware interrupt. You could implement it so that when one hart unlocks the variable it interrupts one of the harts that is waiting on the lock (by writing to the MSIP register that belongs to that hart), making it continue execution. This would have the same effect as spinning on the lock, but would draw less power. However spinlocks are the preferred method since they are way easier to implement, and because the spec allows the WFI instruction to be implemented as a no-operation, which would reduce the locking mechanism to a spinlock.
To add to what's already been said about WFI, usually spinlocks are used only in very critical, short-lived circumstances in the kernel. On the user process level, a different type of lock is used, which basically puts that process to sleep until it's ready to be woken up. During sleep, other processes can run on the CPU.
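Very rough sketch of the MSIP idea mentioned above (assuming QEMU virt's CLINT layout, where hart i's msip register sits at CLINT + 4*i - verify that against your platform, and a real version would also need to enable and clear software interrupts properly):

    #define CLINT 0x2000000L
    #define CLINT_MSIP(hart) ((volatile unsigned int *)(CLINT + 4*(hart)))

    // waiter: doze instead of burning cycles, re-check the lock after each wake-up
    while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
      asm volatile("wfi");

    // releaser: after freeing the lock, poke a software interrupt at the waiting hart
    *CLINT_MSIP(waiting_hart) = 1;

As noted, WFI is allowed to behave as a plain nop, so this can only ever be a power optimization on top of the spinning, never a replacement for it.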
Hi, thx for this interesting content!
A few questions about the locks and the way they are used.
1) I am not sure if I understand the purpose of the noff pointer. It basically counts the number of processes that are interested in a lock? This is done to make sure that all waiting processes will get some time on the/any CPU before interrupts are enabled again? Preventing the situation that interrupts are processed while a process is spinning for the lock?
2) What interrupts are disabled during the lock acquire phase? I understand why no process switch should happen, but why would one prevent other interrupts that don't affect the process?
3) The lock is per cpu. Does this mean a process starts and stays on a single CPU? In most operating systems that I know, processes can jump between CPUs. For locks, is only the "current" CPU of a process of importance?
thank you!
Keep up the great work!
1. It's keeping track of the number of critical code paths that require uninterrupted execution. This can be spinlocks that are protecting some values in memory, but there are also some other areas of the code that tap into this mechanism.
2. All interrupts for the active core are disabled. This includes timers, external interrupts, and software interrupts. The reason is that an interrupt causes the flow of execution to move to some other area of the code. Getting an interrupt would break the model of a spinlock, which needs to guarantee that while holding the lock, it has exclusive access. Think about this: You acquire a lock for some resource, and some kind of interrupt diverts execution to some new pathway, that immediately tries to acquire the same spinlock. It can't, so it spins. The original code now doesn't have a chance to finish what it's doing and release the lock, so the CPU gets into a deadlock situation.
3. The lock itself is global, since it is actually something that lives in memory. It is true that any CPU attempting to acquire a lock has interrupts disabled until it is done with the resource - even when it needs to spin. That said, processes are not bound to a core. We'll look at the scheduler in later videos, but basically the thing that drives process switching is timer interrupts. So if a timer interrupt comes in, and a core is not busy with a lock, that process is marked as sleeping, and another process is picked up. At some point, when a timer interrupt arrives on some other core, it can pick up that same sleeping process and run it.
I hope that answers at least some of your questions!
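To make the deadlock in point 2 concrete, here's the shape of the failure if acquire() did NOT turn interrupts off (a hypothetical trace, using the console lock as the example):

    acquire(&cons.lock);        // kernel code takes the console lock on core 0
    // ... a console/UART interrupt arrives on core 0 before release() ...
    //     the interrupt handler eventually calls acquire(&cons.lock) too,
    //     and spins forever: the original holder can never run again on this
    //     core to call release(), so core 0 is stuck.

Turning interrupts off for the duration of the hold makes that interleaving impossible on the holding core.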
Wonderful video! I love your vscode theme, may I ask which one it is?
Thank you :) The theme is "dracula"
Thanks 🙏
Sorry if this should be obvious but I don't really understand how `started` is shared between cores here with them each having their own stack at this point? I probably don't understand all the implications of `volatile static` but I thought the value would live on the stack.
No worries! started is a global variable, so it has a fixed spot in ram
@@LowByteProductions ahh, duh, it should have a spot in bss. These are great!
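For reference, the relevant bit of main.c is roughly this (from memory) - started is a file-scope global, so every core sees the same memory location, not a per-stack copy:

    volatile static int started = 0;   // lives in .bss, shared by all harts

    void main()
    {
      if(cpuid() == 0){
        // ... core 0 does all the one-time initialization ...
        __sync_synchronize();
        started = 1;                   // let the other harts out of their wait loop
      } else {
        while(started == 0)            // volatile: re-read memory every iteration
          ;
        __sync_synchronize();
        // ... per-core initialization ...
      }
    }

volatile here is about the compiler (don't cache the value in a register), not about where the variable lives.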
You mentioned that the names are shortened to reduce the memory usage of the code. However, if we add comments, wouldn't that be a net increase in storage required? Were the comments added recently to help people, and weren't part of the original codebase? Or, I guess the comment is a one-time trade-off cost for a variable used like 100 times.
I think it was a trade off they had to think about constantly. Certainly folks were expected to study the code hard to understand how it worked, but knowing the context of how those people would be reading it is also important. Now we're all able to read behind our screens in the comfort of our own homes, but back then computers were only available at large institutions like universities and bigger companies. If you were in the position to be reading the code, chances are you were in a team with some smart folks that already understood it pretty well, so knowledge permeated that way too.
Amazing video! I would really appreciate it if you could talk about differences between qemu and real HW in future episodes. Would the code do the same on real HW or would some of the things be done by bootloaders? Would the OS even run in the highest privilege machine mode? So in general, more information on the different privilege levels and what they are made for would be interesting. Also interesting for me would be how hypervisors and VMs work, and how the native kernel interacts with the hypervisor compared to a VM's kernel? I don't know if such stuff is available in any OS for risc v. I mean stuff like WSL2, virtual box, Wine or docker containers. I guess they just run on Operating Systems running on x86 and ARM.
Difference between emulated environment and hardware is definitely something I'll be covering at some point!
Why do we run a function, then acquire a lock and wait, blocking that CPU instead of saying "hey, I'd like this resource, and when it's available I want to run this code" essentially making locks non-blocking
Great question! That does actually happen at the process level - the kernel can kind of take over and schedule another process to run instead. In the kernel it's trickier though - what can you switch to? There is an instruction called wfi (wait for interrupt), which in some cases will block until an interrupt is delivered to the cpu - but there are no interrupts on lock releases. I could definitely imagine an architecture where such a thing could exist though.
In future we'll take a look at sleeplocks, which let a process go to sleep until some event has completed.
What's kind of confusing to me is knowing what data is shared among the cores, and what's not.
For example, struct { .. } cons appears to be shared among all the cores.
All data is shared among all cores. It kind of has to be that way, because at the time the code is compiled, it can't know how many cores the cpu it is running on will have. This is the main reason that the stack0 array from the first video is set up to have enough memory for all cores.
This is why the spinlock is so important as well - it is shared between cores, and through atomic operations is able to sequence operations on the fields it protects in the structure.
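The concrete example of that from the first video, roughly as it appears in start.c:

    // one 4096-byte stack per hart, carved out of a single shared array;
    // entry.S points each hart's sp at its own slice: stack0 + (hartid+1)*4096
    __attribute__ ((aligned (16))) char stack0[4096 * NCPU];

So even the "per-core" stacks are just regions of the one shared address space that the cores agree not to step on; truly shared structures like cons additionally get a spinlock to coordinate access.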
Request: "how does riscv handle the situation where a hardware interrupt comes in while some atomic operation is executing, and interrupts are temporarily turned off?"
###
assistant: When a hardware interrupt occurs while an atomic operation is executing and interrupts are temporarily turned off, RISC-V handles the situation by ensuring that the atomic operation completes before servicing the interrupt.
RISC-V processors have a mechanism called "interrupt disable" that allows the software to temporarily disable interrupts. When interrupts are disabled, the processor will not respond to any interrupt requests until they are re-enabled.
During an atomic operation, interrupts are typically disabled to ensure that the operation is executed atomically, without any interruption. If a hardware interrupt occurs while interrupts are disabled, the interrupt request is still registered by the processor, but it is not serviced immediately.
Once the atomic operation completes and interrupts are re-enabled, the processor will check if any interrupt requests were registered during the disabled interrupt state. If there are pending interrupts, the processor will handle them in the order they were received, following the interrupt priority scheme defined by the system.
In summary, RISC-V ensures that interrupts are not serviced immediately when they occur during an atomic operation with interrupts disabled. Instead, the interrupts are queued and serviced once the atomic operation completes and interrupts are re-enabled.
>>> so there is a kind of interrupt queue for this? Can it overflow? what would happen in that case if it did?
Yes, RISC-V processors typically have an interrupt queue to handle pending interrupts. This queue can store multiple interrupt requests that occurred while interrupts were disabled.
The size of the interrupt queue is implementation-dependent and can vary between different RISC-V processors. In some cases, the interrupt queue may have a fixed size, while in others it may be dynamically allocated.
If the interrupt queue becomes full and a new interrupt request arrives, it can lead to an overflow condition. The behavior in such a case depends on the specific implementation of the RISC-V processor.
In some implementations, when the interrupt queue overflows, the oldest pending interrupt may be dropped or discarded to make space for the new interrupt. This means that the dropped interrupt will not be serviced until the next occurrence of that interrupt.
Other implementations may have mechanisms to handle overflow conditions more gracefully. For example, they may provide additional storage or buffers to accommodate a larger number of pending interrupts, or they may prioritize certain types of interrupts over others.
It's important to note that the behavior in case of an interrupt queue overflow is implementation-specific and may vary between different RISC-V processors. System designers and software developers need to consider the interrupt handling capabilities and limitations of the specific RISC-V processor they are working with to ensure proper interrupt management and avoid potential issues caused by overflow conditions.
Now I just want to fork the OS and rewrite it in C++ just for fun
I would love to see it if you do.
Do it!
I recommend reading Oleg Kutkov's blog post "C++ in Linux kernel" first. There are some issues which prevent using a pure C++ approach. Another thing is, C++ does not provide anything useful to kernel code. You can't actually use many of its features in the kernel, and on the other hand, the kernel is not really suitable for an object-oriented approach. I've written many network drivers for Windows in C and there hasn't been a single case where C++ would have simplified things. And compiling C++ does not bring any magical performance boost to the machine code - instead I would say the opposite. I think old C64 basic is more suitable for a kernel than C++ :)
Actually I think you'd stumble into a multitude of problems when you need to call C++ from the mandatory C code, and that is really annoying (at least to me) because of how the C++ compiler mangles the names of the symbols. Your C++ code ends up with lots of "extern" directives.
But as a disclaimer, it's been decades since I've been working with kernel code so I might be totally wrong.
Just rename files to .cpp
Done. Port made.
@@vikinggeorge7007 not what I meant... That's not even a port
Cool
Hello, maybe someone can answer my question: How can mycpu()->noff ever be more than 1? What am I missing here? Great series though :)
Imagine you make a call to kernel printf. You acquire the printf lock. Then printf calls into a device specific console write function, which acquires its own lock (by this point, you already have two levels). Finally, the console device calls into the uart driver code, which of course, has its own lock. As soon as you release the uart lock, you don't want interrupts enabled because you're still holding the other two locks.
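In code-ish form, that nesting looks like this (a hypothetical call chain to match the example above, not the literal xv6 call path):

    acquire(&pr.lock);          // push_off: noff 0 -> 1, intena records the old interrupt state
      acquire(&cons.lock);      // push_off: noff 1 -> 2
        acquire(&uart.lock);    // push_off: noff 2 -> 3
        // ... touch the uart ...
        release(&uart.lock);    // pop_off: noff 3 -> 2, interrupts stay OFF
      release(&cons.lock);      // pop_off: noff 2 -> 1, still off
    release(&pr.lock);          // pop_off: noff 1 -> 0, interrupts back on (if intena was set)

Only when the count falls back to zero does pop_off() consider re-enabling interrupts.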
i know kung fu. what do you think about getting into the serenity os code?
I'd definitely be open to it. Such a cool project
Francis, I don't think the unix style of naming has anything to do with space restrictions, even though that might have been a "pro" for its time. If you read programs or even languages from the time, names aren't consistently short - take algol64 for example, which uses keywords rather than symbols, or a bit later the literate programming style; those take more space. In the modern age, where space is not a problem at all for source code, if you read Ken's (or other ex-unix labs researchers', like Russ Cox, one of the authors of xv6) public code, like the Plan 9 operating system or more recently the Go programming language, he consistently still uses this style. IMO naming is just one more unix simplification that has been lost.
In my own project I try to use this style; when you start to name things rather than describe them I feel more productive, but I have no measure for this - maybe it's my conscience confirming my bias. Best regards.
The unix style of naming did indeed have to do with very early system limitations of some of the systems Thompson and Ritchie worked with. Some of the linkers on those systems had a limit of 6 characters, so external names in early Unix code were limited to 6 characters. They didn't _have to_ limit the names for Unix system calls to 6 letters, but they wanted to remain fully interoperable with fortran, or at least use the same restrictions.
This push_off function doesn't make sense to me. Now interrupts are disabled every single time this function is executed, even when they are already disabled. In my opinion intr_off() should be placed inside the if(mycpu()->noff == 0).
When dealing with interrupts, you need the on or off operation to be atomic, which is why it's always just a single off no matter what
26:55 why the names are so bad: early compilers (for standards like C89, which this is using) only guaranteed unique names at something like 6 characters for functions, so that's why we have strlen instead of string_length.
Great video! However, would it be possible to break the youtube video down somewhat into chapters? With longer videos like this it can be harder to switch in and out of watching it.
One question about atomic operations: is there a way that the system prevents two CPU cores from performing the swap instruction at the exact same moment? If two cores happened to run the swap at the same moment, could they not both believe that they acquired the lock? Or is it built into the hardware that only one atomic operation can possibly be performed on any one cpu at any one time? Really enjoying the series, by the way!
Yes, exactly what you're getting at: the CPU forces synchronous access for atomic operations, so barring hardware level bugs, the value seen by an atomic swap will always be consistent across cores.