9 years later and this is the best resource on the topic. Really great.
Very well made, you made the world a better place.
so many levels of abstraction. By the time people are clicking on their GUIs, it is a symphony of perfectly timed and executed processes, but listening to this I can really imagine, year after year, problems developing and more complicated solutions coming into play, each stepping up a level of abstraction. I mean, 60 years: it's so impressive to see how far we've come, from logic gates and MOSFETs to EEPROMs to insane clock speeds to RISC to now, but it all started with someone feeding an electronic impulse into a JK flip-flop and trapping that high or low impulse.... it is truly baffling!!!!
It's mind-blowing but also magical to see how it all comes down to some tiny switches.
High quality presentation and commentary. We get such a long, interesting and informative video for the great low price of free. Thank you so much.
I have loved every second of your videos, especially the Unix system call series. Could you please do an advanced series on Linux internals? I know it is a lot to ask, but you are obviously the right person to do it. I have never seen anyone else describe things so clearly and nicely. Thanks a million.
i studied this in my engineering class and it took them almost 4 months to teach us this... this 45-minute lecture made it so easy and simple!!! In uni they stretched it out so much that you'd forget it and start questioning everything again every lecture
thx for the upload
what Uni?
@@arrikd8358 ESPRIT in Tunisia
My head is spinning. This is the meatiest video I've ever watched; you are an unsung hero.
At 31:50 you talk about environment variables. However, there are some mistakes worth correcting for future viewers. First, although the environment variables are stored in the process' memory, they are stored as zero-terminated strings, not as one big string separated by newline characters. The environment is also not stored on the heap, nor is there a global variable in the data section pointing to it. It is actually stored entirely on the stack, as part of the initial process stack that is set up before the program starts running. The first value on the stack is the argument count, followed by an array of the addresses of the individual arguments, with address 0 marking the end of the argument array. Right after that comes a second array of addresses, each pointing to a zero-terminated string holding one environment variable; this array is also terminated by address 0. There is actually a third array of auxiliary vectors, and after that an unspecified number of bytes before the information block starts. It's generally inside this block that the actual string values of the command-line arguments and environment variables are stored. You can confirm this by dumping the stack of pretty much any program: you will typically find all the environment variables at the very end (the highest memory addresses). On Linux you can do this by first reading the '/proc//maps' file for any process (just replace with that process' PID). This file lists the memory ranges mapped into the process and what they are mapped to. Near the bottom you'll see a line with the range mapped to [stack]. Take note of the start address and calculate the size of the range in bytes. Then run 'sudo xxd -s -l /proc//mem', for example 'sudo xxd -s 0x7fff182bd000 -l 0x22000 /proc/14950/mem'. The environment variables should be printed together with their hex values and addresses.
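To illustrate this further, I've written a small C program that prints all the environment variables using the argv array pointer. As you can see, the environment variable pointers are stored right after the NULL that terminates argv.
    #include <stdio.h>

    /* Relies on the x86/x86_64 initial stack layout: argv[argc] is NULL and the
       environment pointers start at argv[argc + 1]. Not guaranteed by ISO C. */
    int main(int argc, char **argv)
    {
        for (int i = argc + 1; argv[i] != NULL; i++)
        {
            printf("%s\n", argv[i]);
        }
        return 0;
    }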
You can of course make it less stupid by using the full version of main which includes a pointer to the first element in the environment pointer array.
    #include <stdio.h>

    /* The three-argument form of main is a common Unix extension, not ISO C. */
    int main(int argc, char **argv, char **envp)
    {
        for (int i = 0; envp[i] != NULL; i++)
        {
            printf("%s\n", envp[i]);
        }
        return 0;
    }
This is all defined as part of the ABI (application binary interface) for both the x86 and x86_64 architectures, i.e. 32- and 64-bit desktop computers.
tl;dr: The environment is not a single long string separated by new-line characters. The environment variables and the pointers to them are both stored on the stack or just before it.
Wow man! Thank you :) Both the video and your comment. Amazing stuff!
Thanks. This comment should be pinned!
Thank you sir, I just tried your code in an online C IDE and it runs exactly as you described. Other than that, the video is great!
Incredibly informative. Thank you for commenting this.
While not part of the kernel ABI, as you point out, glibc provides a global variable (char ** environ) pointing to the environment, and may relocate the environment to the heap in setenv() as needed.
I've been watching this video series every day, for the last 3 days, and I learn a little more from it each time, lol. Thanks man. This is one of the most professional lessons I've ever found on YouTube. I can tell you know what you're talking about.
The best explanation of system calls I could find on the internet.
Thanks to Brian Will
This is a very informative Linux/Unix System Calls series.
Applications are ever more security aware, and one caveat with malloc(3) and mmap(2) is to determine what is sensitive in what has been allocated (e.g., storage for passphrases, cached encryption keys, etc.), and that should be zeroed before calls to free(3) or munmap(2). There may be no guarantee by the OS that newly malloc'ed or mmapped regions have been thus scrubbed, so it's up to the process, as best as it can, to sanitize such regions before handing them back.
Seems like there should be a system call to handle sensitive data. I wonder if it would be possible to somehow fill the memory with memory requests to just scan for random strings. In what context, though, do you mean apps are more security aware? Are you speaking of just this?
One should always consult the manual page for the system call you want to use, and the documentation for the system you are programming for. This guy obviously had to remain generic to cover SysV, Linux, *BSD, OS X, etc., but each system can have other restrictions or features. For example, if you want to write a utility for an SELinux system, you will have contexts to deal with, and these add further checks to system calls (e.g., I think child processes after a fork(2) also inherit SELinux contexts).
_I wonder if it would be possible to somehow fill the memory with memory requests to just scan for random strings. In what context, though, do you mean apps are more security aware?_
Exactly that, Adam Outler. I just wanted people to start thinking more securely if they want to program at the system call level. It's worth looking at a particular OS's manpages or equivalent to see whether such conditions are specified, such as [s]brk(2) (upon which malloc(3) is based) zeroing memory pages before they're returned to the process.
In fact, because a process can be killed at any time, it is wise in more security-minded apps to scrub storage (whether variables/RAM or parts of files) as soon as it is no longer needed. To a certain extent, you can control this by locking the pages into RAM, if the process has enough privilege, so that such RAM will never be written to the swap partition. Swap represents another potential security threat: the superuser (or anyone with enough access to the underlying device node) sifting through the swap space for such nuggets.
It's a fantastic tutorial! I'd been baffled by kernel space and user space for quite a long time and mistakenly thought that a system call incurs a context switch between the user process and a kernel process, until I watched this video... million thanks!
This is a truly excellent, informative and well laid out video. Thank you so much. I've been coding for 15+ years and got a lot out of this, especially having mostly dealt in interpreted dynamic languages and not having to ever manage memory myself
More Unix videos please! You're an excellent teacher, and the slides are very well done. :))))))))
2:00 kinda sus abbreviation you got there
I'm no expert on this issue, but my investigation at the time concluded that brk/sbrk are actually archaic, as the concept of a data segment break is outmoded in paged-memory environments. Yes, mapping /dev/mem is not the way to go, but mmap in modern Unixes can do 'anonymous mapping,' which maps to swap-backed memory pages rather than any file. The Wikipedia entry on mmap mentions this. I believe this is what most allocation routines use today, not brk/sbrk.
I am seeing this after 9 years. Nice content
Your content is Gold. Sad to see you inactive........
I just discovered your channel and can't stop watching your videos! They're incredibly helpful and clear.
Just wanted to say it seems to me this and the next video should be added to your Operating Systems playlist.
Sir, you may be the reason I get my dream job
I’ve searched for a video like yours for a long time, thank you so much for this work!
I have a class on Operating Systems and this has been very helpful! thank you
One of the most solid videos I've seen on Linux. Great job. Thanks.
Your channel looks awesome. Thank you so much, very valuable lessons. I can't believe I have access to these!
Good video, just a small point. Linux is a kernel; Debian and others are the Unix-like distributions that use the Linux kernel.
Great video Brian
Thanks for making this gem of a video. Your content is lucid and enriching at the same time
Yeah, I should have phrased this better. To my understanding, user groups were created solely with actual groupings of humans in mind, a use case of diminished importance today in most settings.
Wow, where have you been, this is a treasure!
Very informative. I needed this to better understand low-level details for the program I am currently writing in C++ and LLVM. Thanks
Thank you for taking the time to make this video. A thumbs up 👍
I'll be viewing more of your videos 🙂 including part 2 of this one
Using this to study up for my interview as a production engineer. Best video resource I’ve found besides certain books. Thank you. Maybe you could come up with a practice-problem series?
Came here through codeschool.org while googling for a system call and i'm really enjoying the rest of the content.
When I first learned about permissions on directories, I was told that 'x' allows one to cd into a directory. Even if that might not be the most precise explanation, I think it's a good enough proxy to give users the intuition.
0:00 ~ UNIX-like systems
1:46 ~ UNIX standards
2:57 ~ System calls
5:21 ~ Process states
Just stating the obvious, 90% of why i'm watching this particular video on the plane is "speaker's voice does not make me ashamed to be an engineer", good job
BSD actually stands for Berkeley Software Distribution
The depth of the information you cover is what scared me away from Computer Science. I call myself a programmer, but obviously I am more of a Code Groupie. Your video is very nice, and still relevant so many years after it was made. I wonder if the architecture will ever change significantly?
laughed at "we have a process that is forking itself"
you are the real Guru, thanks a lot, really appreciate your help and videos. :)
Brian, I’m learning to program on Research Unix V6 in C, so I love your tutorial. I have a small nitpick: plurals of nouns ending in “s” are pronounced “es” like any other plural ending, unlike (I think) the only exception, nouns ending in “x”, whose plurals have a different ending with the pronunciation you are using. It seems to be a recent US thing, particularly in IT. Love the tutorials, thanks!
It says this is one part of a larger series. What is the larger series?
Actually it's Berkeley SOFTWARE Distribution.
-Nice first pic though gotta love Jurassic Park.
Wow! That's one helpful, easy-to-understand lecture on YouTube.
It is an excellent video which covers a hell of a lot of things with great clarity in a short time. Thanks for the tutorial.
How did it know 13 years ago that Android/iOS would be the two dominant mobile phone OSes? o.o
Great Work Brian!
What an excellent and clear explanation. thanks a lot for the upload!
Bill - great work here. Very helpful for me in understanding issues I'm dealing with on some servers at work. Thanks
37:14, that is why UNIX systems are secure and safe
Thanks a ton. This is very nice video on system calls and process address space.
great content with awesome sidenotes to give you the big picture - thank you!
System call:
Generate luminous element!
Discharge!
i gEt tHaT RefeREnCe
While watching SAO i kept thinking how insecure the system was.
Technically, mmap maps memory at a specified address, while malloc finds free address space and allocates there. Of course, you know this, but I'll just note the correct usage in the comments.
mmap is how a userspace program requests memory from the kernel (with an anonymous mapping, and it can, but does not have to specify the address it gets mapped at, it can leave it up to the kernel to choose).
`malloc` is not a system call, but a libc function that will use `mmap` under the hood.
Aaron Miller: malloc can be implemented using an anonymous mmap, but it can also be implemented with sbrk.
I do see your point, but simplifying mmap as just “the memory allocator” is a bit of a misrepresentation.
I thought malloc uses sbrk(), or is mmap a recent change? Thanks.
Garrick He: it depends on your libc/malloc. Recent ones mostly use mmap, I believe, but when in doubt, strace and find out :)
Thanks for this insightful playlist.
Btw: Can you recommend a good book (or lecture notes) on this topic?
Fine job, Brian!
Thanks, this is good to know. Still think it's a bit too in depth at this point. I'm already glossing over a lot here, though I don't think I say anything out-and-out false. Do you think there's something misleading?
No such thing as too in depth mate. Share as much as you know!
Thank you Sir. Greatly explained.
seems the site is down ?
Great content, thanks for sharing the knowledge.
Thank you Brian.
This is a very high-quality tutorial. Thanks a lot!
So a system call is some service that the operating system makes available from hardware and application/processes use these services to function...?
The init process be like "I just wanna fork"
this is great, and i'm all for the smaller chunks however, I feel this video has some volume normalization issues.. That could be something to double check before creating your final version. :D Still Very Awesome
💯💯💯❤❤ great playlist
great video, Thank you very much
How does this compare to modern Windows Systems? Would be interesting to see a comparison video.
This is amazing!
thank you so much for the great explanation
what is your presentation app? these slides are perfect.
Thank you so much Brian. Very useful video.
Unix system calls casually explained
This is a superb playlist! Though this is 11 years old, can you please tell me where I can find the subsequent videos? I think the link provided in the description is outdated... Please tell me where I can get the subsequent videos of this series, or upload them to the same playlist on YouTube. This is a very kind request of mine... Thank you though for whatever you have put on the channel for free!!!
idk if you still need it but: ua-cam.com/video/2DrjQBL5FMU/v-deo.html
good work, very straightforward.
Interesting to note: I have found that trying to access a deallocated C++ object crashes with a segfault, but accessing a deallocated C string in C code runs with no complaints.
(Why I did? Well, it was a challenge experiment, not something I'd ever do in a real program.)
I assume this C string was a literal in code, right? String-literal strings get put in a permanent memory area that exists at program start and never goes away. I'm curious what platform you did this on? I believe the behavior of free'ing addresses that aren't valid alloc addresses is undefined. Perhaps on your platform invalid free calls just fail silently.
Good job man !
This is incredible, thank you.
Excellent explanation, thank you!
An amazingly helpful video on the subject. Thank you very much.
Thank you so much you helped me a lot
Where is the larger series? I'd like to watch it! Thanks
Hello Sir, thank you for your knowledge, all the videos are really great and understandable, Sir I wanted to know name of the book from where I can get this information, thank you
Allah bless you! As a self-taught software monkey, I have now evolved into a software homo sapiens
Excellent info, though i had to reduce it to 0.75x speed. Thank you for the explanation 👍 Can you give info on the relation between smaps and mmap, and about virtual memory and resident set size?
Thanks a lot for the video.
However, I should say that the captions seem to work exceptionally well. :D
Open source is not necessarily free software.
Ye but it is practically free if you compile it
Great Video
The part about syscall using the user stack of the process is blatantly wrong
Thank you very much
I came here because YouTube knows I need to understand asyncio without a CS degree. This is adequate. Haha
awesome! keep up the good work
I swear, Im gonna ace this course (OS) without attending it:D
you are crazy good!
BSD -Berkeley Software Distribution
Great video!
GREAT VIDEO, THANKS FOR MAKING THIS
Teacher: You need to learn system calls
me: but who will teach me?
Teacher: Brian Will
Fantastic!