Thanks! After 3 days debugging my program, your explanation captures beautifully how the destructive select() method works beyond beej and my class lecture combined. Submitted the assignment with 100/100 on 3-way handshake
Where did you go to school? My school also teaches with beej’s books
A note that Jacob didn't address: select() is slow. It takes time linear in the number of sockets to monitor. That's a limitation of the kernel, so there's nothing we can do about it.
That's why there are more modern replacements (epoll, kqueue), which are much more efficient, in addition to offering more event types than select does.
@Demitri Swan Well, no, he didn't touch on select()'s performance problem. Jacob merely said that the number of file descriptors increases, not that the call to select() (the syscall) is slow to run in kernel mode, which is where performance takes the hit.
A larger number of fd in userspace is a small problem because it's manageable.
@Demitri Swan you're confused.
The problem with select is twofold:
1) the biggest problem is the linear time and resources it takes in the kernel
2) the smaller one is the array that needs to be iterated.
The 1st one is where the crux of the slowdown lies; the second one is minor, especially because it can be somewhat circumvented in software (like Jacob did).
Jacob only speaks about the second one, not the first one.
@Demitri Swan No offense buddy, but you're the one who argued with me without having enough knowledge. Good luck.
@@unperrier5998 You're arguing for no reason. One is a consequence of the other. The problem you're describing is specifically mentioned in the video; he mentions it, but doesn't go into detail about the exact consequences. Maybe, I don't know, just maybe, this might have something to do with the fact that this is a video targeted at people who are learning to program with C, rather than those who already know everything.
@@eivind261093 No, they are two separate problems because they act on different data: a list of FDs in the kernel, and another list of FDs in userspace.
They're both a consequence of having too many files, but they're not linked in any way. The proof is that in userspace you can get away with the heuristic (hack) that Jacob implemented... but in the kernel you can't do anything about the O(N) behaviour.
Nice! Beautifully explained. Please continue on select() to handle multiple clients and how to read() multiple times.
this is in the realm of personal lessons. kids... u better pray you have a teacher this good. u sir are awesum.
Thanks, Axl.
thank you for this, was able to use it to monitor serial ports and not hang on a read function.
who could have possibly disliked this gem?
I have made the switch to Linux, so now I am watching all of your videos again.
That's what they're there for. I hope the switch goes well.
I really like your videos. They are short and informative. You get right to the point.
Hi Jacob
I think there is an error. The first argument to select() should be max_socket_so_far + 1: if(select(max_socket_so_far + 1, &ready_sockets, NULL, NULL, NULL) < 0).
I'm currently implementing a voice assistant application using PI (school project).
Thanks for all your efforts; you saved me a lot of time handling multiple clients 👍👍👍.
@You Tube admit this ratio
Thanks for the video; it really helped me understand how to use select.
I think you made a small error though: select(max_socket_so_far, ...); -> select(max_socket_so_far +1, ...);
(If you have set two file descriptors "4" and "17", nfds should not be "2", but rather "17 + 1" or "18".) ~man 2 select
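Both corrections can be sketched in one small helper. This is a minimal sketch, not the video's code: it keeps the names max_socket_so_far and ready_sockets from the comments above, while current_sockets, wait_for_ready, and the timeout parameter are assumed for illustration. It also shows why select() is called "destructive": it overwrites the set it is given, so the loop hands it a copy of the persistent set each time.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Waits until at least one descriptor in *current_sockets is readable.
   current_sockets: persistent set of every watched fd (hypothetical name).
   max_socket_so_far: highest fd ever added to that set.
   timeout: NULL waits forever, {0, 0} just checks and returns.
   Returns select()'s result; *ready_sockets then holds only the ready fds. */
int wait_for_ready(const fd_set *current_sockets, int max_socket_so_far,
                   fd_set *ready_sockets, struct timeval *timeout)
{
    /* select() is destructive: it clears the bits of fds that are not ready,
       so give it a copy and keep the master set intact for the next loop. */
    *ready_sockets = *current_sockets;

    /* nfds is the highest fd + 1, not the number of fds:
       with fds 4 and 17 in the set it must be 18. */
    return select(max_socket_so_far + 1, ready_sockets, NULL, NULL, timeout);
}
```

In the server loop, max_socket_so_far would be raised whenever accept() returns a larger descriptor, and the readiness check then loops over fds 0 to max_socket_so_far.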
Love watching your videos because you're an expert on everything you talk about
I have a humble request: it would be really awesome to have a playlist from beginner to advanced level, or a paid course in C. Please create a course on C 🙏🙏
Hey, surprised to see you using select() instead of poll() and not even mentioning poll() and its advantages over select()
As you can see select is not very scalable (and not only because of the fd limit), also, its API is a bit cumbersome. It's true that select is widely implemented, but nowadays you'll be able to use poll() on pretty much all machines as well, and if it's a Linux machine then chances are you can use improved versions of poll. Overall poll gets the same job done better and easier. Many argue that select should not be used at all as it's outdated.
Descriptor multiplexers aside, I'd like to thank you for your multithreading and socket videos; they helped me tremendously while I was building a server in C for my uni project. The project was such a success that I was asked to give a lecture about it.
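For comparison, here is a rough poll()-based version of a client read loop, assuming the caller keeps the watched descriptors in a struct pollfd array; poll_once is a hypothetical helper name, not from the video. poll() takes a plain count instead of highest fd + 1, reports results in each entry's revents field instead of overwriting the request, and skips entries whose fd is negative, which gives a cheap way to retire a closed client.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* One pass of a poll()-based read loop (a sketch, not the video's code).
   Entries whose fd is negative are ignored by poll() itself, so a closed
   client is retired by setting its fd to -1 rather than rebuilding a set.
   Returns the number of bytes read in this pass, or -1 if poll() failed. */
long poll_once(struct pollfd *pfds, size_t count, int timeout_ms)
{
    for (size_t i = 0; i < count; i++)
        pfds[i].events = POLLIN;              /* we only care about input */

    int ready = poll(pfds, (nfds_t)count, timeout_ms);
    if (ready < 0)
        return -1;

    long total = 0;
    char buf[512];
    for (size_t i = 0; i < count && ready > 0; i++) {
        if (pfds[i].fd < 0 || pfds[i].revents == 0)
            continue;
        ready--;
        ssize_t n = read(pfds[i].fd, buf, sizeof buf);
        if (n > 0) {
            total += n;                       /* real code would process buf */
        } else {                              /* 0 = EOF, < 0 = error */
            close(pfds[i].fd);
            pfds[i].fd = -1;                  /* poll() will skip it from now on */
        }
    }
    return total;
}
```

A real server would also keep the listening socket in the array and call accept() when it becomes readable; this sketch only covers reading from already-connected clients.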
Thanks! Glad you're finding the videos helpful. Yeah, poll/kqueue/epoll are all on the docket for future videos. I just started with select, since it's the old standard. More to come.
@@JacobSorber Thanks, really looking forward to an epoll tutorial.
@@JacobSorber Note that poll() has been standardized by POSIX since 2008, but epoll() and kqueue() are still not standardized by POSIX to this day.
Love this tutorial. Would be nice to get an update on this video; you did say you would do improvements in future videos :D
I'd like to see the future videos you mentioned making in this one.
Any chance on more networking videos? I really like seeing these type of videos. Not discussed as often as other C/C++ topics.
Yeah, there's definitely more than a chance. Any specific topics or questions you'd like to see addressed?
I like the fact that you are working without APIs, and I'm personally interested in techniques that could handle more than the fd_set max size (FD_SETSIZE) while being safe from DDoS attacks.
@@JacobSorber I wouldn't mind seeing how to set up a WSS connection from scratch. Everyone shows how to do things with APIs; I like to get low level and I can't seem to find much on the topic.
Love watching your videos... However could you dig deeper on the functionality differences between poll() and select().
Hey, Sieber, could you please create a video about "static" in C: its importance and usage? That would be very helpful for beginners.
ua-cam.com/video/3E-r4GfvWOI/v-deo.html
Who's Sieber?
Very nicely explained.
Would love to see the continuation of this. Perhaps using epoll??
I would love to see this also. The major problem I have with poll/epoll is that the examples we can find always use a predefined MAX_EVENTS (or whatever they call it), which is the maximum number of clients connected at the same time.
Jacob, if you read this and have time, would it be possible to make it work with a variable maximum number of client connections?
Thanks anyway if you make a new video about network programming.
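One way to avoid a fixed client limit with poll() is to keep the struct pollfd entries in a heap array that grows as clients connect. The fd_list type and its functions below are hypothetical names for a sketch, not a library API. With epoll the limit is less of a problem than it looks: the maxevents argument to epoll_wait() only caps how many events a single call returns, not how many descriptors the epoll instance can watch.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical growable list of watched descriptors: there is no
   compile-time client limit; capacity doubles whenever the array is full. */
struct fd_list {
    struct pollfd *pfds;
    size_t count;
    size_t cap;
};

/* Appends fd, watching it for input. Returns 0 on success, -1 if out of memory. */
int fd_list_add(struct fd_list *l, int fd)
{
    if (l->count == l->cap) {
        size_t new_cap = l->cap ? l->cap * 2 : 8;
        struct pollfd *p = realloc(l->pfds, new_cap * sizeof *p);
        if (p == NULL)
            return -1;                 /* the old array is still valid */
        l->pfds = p;
        l->cap = new_cap;
    }
    l->pfds[l->count].fd = fd;
    l->pfds[l->count].events = POLLIN;
    l->pfds[l->count].revents = 0;
    l->count++;
    return 0;
}

/* Removes entry i in O(1) by moving the last entry into its slot.
   Order is not preserved, which poll() does not care about. */
void fd_list_remove(struct fd_list *l, size_t i)
{
    l->pfds[i] = l->pfds[l->count - 1];
    l->count--;
}

/* Releases the array; the list can be reused afterwards. */
void fd_list_free(struct fd_list *l)
{
    free(l->pfds);
    l->pfds = NULL;
    l->count = l->cap = 0;
}
```

The loop then calls poll(l.pfds, l.count, timeout). When removing while iterating, don't advance the index after fd_list_remove(), because the former last entry now sits in slot i and still needs to be checked.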
Great video, thank you...do you have any videos on epoll , (configured for edge triggered operation)? Cheers!
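Edge-triggered epoll isn't covered in the video, but a Linux-only sketch might look like this; watch_edge_triggered and wait_and_drain are hypothetical helpers. The key difference from level-triggered mode (and from select()/poll()) is that EPOLLET only reports a descriptor when new data arrives, so every watched descriptor has to be non-blocking and drained until read() fails with EAGAIN, or leftover data can sit unnoticed.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Makes fd non-blocking and registers it for edge-triggered input events.
   Returns 0 on success, -1 on error. */
int watch_edge_triggered(int epfd, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Waits once and drains every ready fd completely. The 16 below caps how
   many events this call returns, NOT how many fds epfd can watch.
   Returns total bytes read, or -1 if epoll_wait() failed. */
long wait_and_drain(int epfd, int timeout_ms)
{
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    if (n < 0)
        return -1;

    long total = 0;
    char buf[256];
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        for (;;) {                          /* drain: no second notification */
            ssize_t r = read(fd, buf, sizeof buf);
            if (r > 0) {
                total += r;                 /* real code would process buf */
                continue;
            }
            if (r < 0 && errno == EINTR)
                continue;
            break;  /* EAGAIN (drained), 0 (EOF: real code would close) or error */
        }
    }
    return total;
}
```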
very instructive and precise, thank you !
You are welcome!
Your videos are really good!! 👍 When can I see the improvements to this server (I mean event-driven programming and asynchronous I/O)? 🙂
Looking forward to next video
Hey, great! I wanted to read about select(2)/kqueue(2) but was just too lazy to go through the docs. Work made simple and fast!
Glad I could help.
Thanks!
Ever considered showing how a dumb/small/simple(I know the useful ones aren't) database back-end works in C? I'd really love to get a feel for what happens "behind the scenes" of SQL-calls and all that. You know, without going through at least a quarter of a million lines of source code...
I'm not 100% sure what you're asking. Are you asking for a dumb/small/simple database implementation (like with an SQL interpreter)? Or are you looking for a web-app back-end written in C that accesses a database?
@@JacobSorber The first one. Exactly as you described.
@@jace4002 I'll think about it. It wouldn't be a million lines of code, but that would still be a SERIOUS tutorial. Maybe if we used a subset of the language, and made it into a multi-video series, including a part about parser generators (like YACC)...I'll see what I can do. :)
@@JacobSorber That's all I can ask. Thanks!
Hello there, Jacob!
One simple question: what if two clients were to send a message at the very same time?
Is select() invulnerable to this kind of problem? Because threads are not. Or are they?
Kind regards!
The description is a bit ambiguous - do you mean two separate connections to the clients (as in TCP or any other connection-oriented protocol), or a connection-less sending to the same socket (as in UDP)?
Case 1: two connections means two separate file descriptors - they both will be marked as readable, and after doing select()/poll() you choose which order you go to read them - most likely you'd loop through all descriptors from 0 to N and check which ones are readable so the lower FD number probably gets it first.
Case 2 (two datagrams/packets arriving at the same connection-less socket): the messages will be placed in the kernel's internal queue for the file descriptor, and it will become readable. You may choose to read only one packet and call select() again - it will instantly return with the same FD marked readable, and you can read it again to get the second packet. OR: if the socket is non-blocking, you can read the first packet and then keep reading the same FD to get all the packets accumulated in the queue, until recv() returns -1 with errno=EAGAIN, which means no more data is available. Then you can return to your main event loop and call select() again. BTW the same goes for stream sockets (TCP) - your first read might return only part of the available data, so you can read again to get the rest.
Whatever order the packets are placed in the queue is the kernel's job; it has its own threads and does proper synchronization between them, so the data in the same queue won't get garbled even if something truly arrives at the same time.
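The "read until EAGAIN" approach described above can be sketched like this, assuming the descriptor has been made non-blocking with fcntl() first (on a blocking socket the last read would simply wait instead of returning EAGAIN). set_nonblocking and drain are hypothetical names; the sketch uses read(), which on a socket behaves like recv() with flags set to 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Puts fd into non-blocking mode so a read with nothing queued fails
   with EAGAIN instead of stalling the whole event loop. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Reads everything currently queued on a non-blocking fd into a scratch
   buffer (real code would process each chunk). Stops at EAGAIN/EWOULDBLOCK
   ("nothing more for now") or at EOF. Returns bytes consumed, or -1 on a
   real error; *eof is set when the peer has closed its side. */
long drain(int fd, bool *eof)
{
    char buf[256];
    long total = 0;
    *eof = false;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            *eof = true;             /* peer closed: no more data will come */
            return total;
        } else if (errno == EINTR) {
            continue;                /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;            /* queue empty: back to select()/poll() */
        } else {
            return -1;
        }
    }
}
```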
@@dorquemadagaming3938 Nice explanation. I ran into a weird scenario where a TCP socket is reported "read ready" by select() even though the data has already been read. I mean select() reports the socket ready again and again in a loop even after it has been read. This behaviour is rectified by setting the socket's O_NONBLOCK flag using fcntl(). Do you have any ideas to spare?
@@kathiravankathir3089 One possibility, and a very frequent one: the peer has sent some data and then shut down the connection on its side - i.e. it has nothing more to say. So, after reading all the data on your side, the socket will become readable again, but if you try to read, you receive 0 bytes - that means EOF, no more data will come. OS separates these events, to make it possible to distinguish them for the usermode application. What do your subsequent reads return as result and errno? If you don't close the connection it will keep signaling you EOF on the file descriptor.
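If EOF is the cause, the usual fix is to treat a read of 0 bytes as "peer is done": stop watching the descriptor and close it, or select() will keep reporting it readable. A sketch for a select()-based loop, where service_client is a hypothetical helper and watched is the persistent set that the loop copies before each select() call:

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Handles one descriptor that select() reported as readable.
   read() == 0 means the peer shut down its sending side; select() will keep
   reporting such an fd as readable, so it is removed and closed here.
   Returns 1 if the fd was retired, 0 if it should stay in the set. */
int service_client(int fd, fd_set *watched)
{
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        return 0;                 /* got data; real code would handle buf */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;                 /* transient: try again after the next select() */

    /* n == 0 (EOF) or a real error: stop watching this fd. */
    FD_CLR(fd, watched);
    close(fd);
    return 1;
}
```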
By the way, shutting down the connection is not the same as closing it, as it can be done unidirectionally. Read on shutdown() vs. close() functions. This is a typical behavior of the one-request HTTP clients (e.g. REST-clients) - they send one request and then do shutdown(fd, SHUT_WR) on their socket - this way the server receives an EOF notification - no more requests expected, but it still can send the reply, because the server-to-client direction is still open.
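Here is a small sketch of that one-request pattern over a connected stream socket; send_request and serve_one are hypothetical names. For a quick local test an AF_UNIX socketpair() behaves the same way as a TCP connection here.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Client side: send the whole request, then shut down only the
   client-to-server direction. The socket can still be read from. */
int send_request(int fd, const char *req)
{
    size_t len = strlen(req);
    while (len > 0) {
        ssize_t n = write(fd, req, len);
        if (n < 0)
            return -1;
        req += n;
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);   /* half-close, not close() */
}

/* Server side: read until EOF (read() == 0), then reply on the direction
   that is still open. Returns the request size, or -1 on error. */
long serve_one(int fd, const char *reply)
{
    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    if (n < 0)
        return -1;
    if (write(fd, reply, strlen(reply)) < 0)
        return -1;
    return total;
}
```

serve_one() blocks until the client half-closes, which is fine in a sketch but would need to be driven by select()/poll() readiness in a real multi-client server.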
Hi, could you make a video about some newer apis like epoll and io_uring? Thanks.
Hi Jacob,
Hope you're doing great.
Do you have any suggestions for understanding multithreading and its synchronisation issues?
I tried a mutex with a condition variable for signalling, but now I am stuck with a scheduling problem.
Please suggest some links or books for a better understanding of threads, mutexes, and semaphores.
Jacob has the face of a happy little boy
Hello Jacob, I have a question outside of the video topic. In your past videos you posted an email address to send you code to review and comment on. Is that offer still available? I would love to hear your comments.
@Jacob Sorber I have sent an email now.
Great video !
Thanks.
Is it possible to use select() with UDP socket? I am trying to use one server to listen to 2 different ports - i.e. 2 sockets with the same IP but different ports. I have UDP client sockets sending messages to each of the ports but I can't seem to read anything from both ports from my UDP server socket, but I can read from a single port when I removed the code for the other.
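select() does work with UDP sockets. A minimal sketch for two sockets on different ports, assuming IPv4 on the loopback address; udp_bind_loopback and recv_from_either are hypothetical helpers. The comments mark the two details that most often break a two-socket loop: the set must be rebuilt before every call, and nfds must be the highest fd + 1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a UDP socket bound to 127.0.0.1:port (port 0 = kernel picks one).
   Returns the fd, or -1 on error. */
int udp_bind_loopback(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Waits on BOTH sockets in one select() call and receives one datagram from
   whichever is ready. Returns bytes received and sets *from_fd, or -1. */
ssize_t recv_from_either(int a, int b, char *buf, size_t cap, int *from_fd)
{
    fd_set ready;
    FD_ZERO(&ready);                 /* rebuilt every call: select() clears bits */
    FD_SET(a, &ready);
    FD_SET(b, &ready);
    int nfds = (a > b ? a : b) + 1;  /* highest fd + 1, not "2" */
    if (select(nfds, &ready, NULL, NULL, NULL) < 0)
        return -1;
    *from_fd = FD_ISSET(a, &ready) ? a : b;
    return recv(*from_fd, buf, cap, 0);
}
```

Calling recv_from_either() in a loop returns one datagram per call from whichever port had data.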
Some errors: the first parameter of select is "The number of socket descriptors to be checked", so it should be select(max_socket_so_far + 1...), and in the for loop: for(int i =0; i
Thank you for walking me through my networking class haha
You're welcome. Glad I could help.
The intro music woke up some people in my house.
I'm assuming the future "improvements" will make use of poll instead since it sounds like select just uses that under the hood
0:56 Is it really necessary to test both if A is zero and not zero?
Besides portability, especially because I intend to run my server socket in a Linux server, what would be a good reason to use "select"? I ask because I'm looking at "poll" and it's simpler to use.
poll and select are almost identical (just slightly different interfaces). In fact, in some systems, poll is just a wrapper around select. I started with select because it's the old standard, and I hope to address the newer alternatives as I can get to it.
In my experience, in newer systems, select is a wrapper around poll. This is also true in Windows. This is done for the obvious reason of preserving backward compatibility while improving performance.
The original implementation of select is too old to still be included in modern libc flavours anyway.
Can you please create a video explanation on D-BUS in LINUX
I'll add it to the list and see what I can do.
Please share the source code link in the description section too.
pretty awesome :D thanks
Is there any way to find the destination port number of an incoming UDP packet from a socket?
Thanks.
Yeah.
Using the getsockname() syscall.
Check out its man page for details.
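A sketch of that lookup, assuming an IPv4 socket that has already been bound; local_port is a hypothetical helper built on the standard getsockname() call. Because the kernel only delivers datagrams addressed to the port a UDP socket is bound to, the socket's own port is the destination port of every packet it receives. Recovering the destination IP address on a socket bound to INADDR_ANY needs more, for example the IP_PKTINFO socket option on Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns the local port a bound IPv4 socket is using, or -1 on error.
   For a UDP socket this is also the destination port of every datagram
   it receives. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    if (addr.sin_family != AF_INET)
        return -1;                /* this sketch only handles IPv4 */
    return ntohs(addr.sin_port);  /* port is stored in network byte order */
}
```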
MOAR!!!
Make a video on poll and epoll :)
shoutout cse130!
Where at? UCSD?
Jacob Sorber No, at UCSC; we're making a load balancer.
I cannot get this example working. It keeps seg faulting before main even starts.
Couldn't we store our clients' fds in a dynamic array and iterate over that instead of up to FD_SETSIZE?
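That approach works. A sketch, assuming the program keeps its open client descriptors in an int array; build_read_set and collect_ready are hypothetical names. The array only removes the userspace cost of scanning every possible descriptor; select() itself still cannot accept descriptor values of FD_SETSIZE or higher, so the range check stays.

```c
#include <stddef.h>
#include <sys/select.h>

/* Fills *set from a caller-maintained array of open fds and returns the
   nfds value for select() (highest fd + 1). Returns -1 if any fd cannot be
   stored in an fd_set, since FD_SET() on such a value is undefined. */
int build_read_set(const int *fds, size_t n, fd_set *set)
{
    int max_fd = -1;
    FD_ZERO(set);
    for (size_t i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], set);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    return max_fd + 1;
}

/* After select() returns, visit only the tracked fds instead of every
   value up to FD_SETSIZE. Copies the ready ones into out[] (which must
   hold n entries) and returns how many there were. */
size_t collect_ready(const int *fds, size_t n, fd_set *set, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (FD_ISSET(fds[i], set))
            out[k++] = fds[i];
    return k;
}
```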
When do you close the server-side listening file descriptor? This program is currently leaking resources.
On second thought, you'd most likely close the listening file descriptor after the server is finished servicing the client.
What files did you #include in the header?
Did you get the answer for this??
Help me clarify something... is his shirt blue or green? (don't tell me turquoise)
Green.
Is epoll better?
CSC209 A4 GANG
Welcome. Toronto?
@@JacobSorber yes sir!
hmmm... hmm hmm hmm hmmmm... hmm hmm hmm hmmmm
Why would you charge money on Patreon for a simple college class socket C program ?