1:41 I was a little surprised by the results, I expected the approach on the right to be faster lol
maybe has something to do with Python's GIL
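A quick sketch of how the GIL behaves here, assuming CPython (time.sleep stands in for a network wait; like a blocking socket read, it releases the GIL so the waits overlap):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(i):
    # time.sleep releases the GIL, just like a blocking network read,
    # so all 16 one-second "requests" can wait at the same time
    time.sleep(1)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(fake_fetch, range(16)))
print(f"16 threads, 16 x 1s waits: {time.perf_counter() - start:.2f}s")
# prints roughly 1s on CPython, not 16s, because the GIL is
# dropped whenever a thread blocks on I/O (or sleep)
```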
My first assumption would be the difference in workers: in the threading case there are 16 and they're not limited (I mean by something like semaphores), whereas in the multiprocessing case the number of workers equals cpu_count(), which I assume is definitely less than 16.
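Roughly this asymmetry, if the video used the standard library pools (a sketch; fetch_page and the URLs are placeholders, not the video's actual code):

```python
import os
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch_page(url):
    ...  # placeholder for the actual HTTP request

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(16)]  # placeholder

    # threading side: 16 workers, no matter how many cores exist
    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(fetch_page, urls))

    # multiprocessing side: ProcessPoolExecutor() defaults to
    # os.cpu_count() workers, which may well be fewer than 16
    with ProcessPoolExecutor() as pool:
        list(pool.map(fetch_page, urls))
    print("cores:", os.cpu_count())
```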
@blanky_nap oooooooooh, thanks for your input, I could see that being the case!
@blanky_nap Very good point, so if we had multiple processes each running multiple threads inside, it would be a lot faster.
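A rough sketch of that hybrid (fetch and fetch_chunk are made-up helpers), though for purely I/O-bound fetches the extra processes probably wouldn't buy much over threads alone:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch(url):
    ...  # placeholder for the real HTTP request

def fetch_chunk(chunk):
    # each worker process runs its own thread pool, so the
    # I/O waits overlap inside every process simultaneously
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(fetch, chunk))

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(64)]  # placeholder
    chunks = [urls[i::4] for i in range(4)]  # spread across 4 processes
    with ProcessPoolExecutor(max_workers=4) as pool:
        pages = [p for chunk in pool.map(fetch_chunk, chunks) for p in chunk]
```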
I was also surprised after seeing the results but got the answer here
I thought I was watching a Core Dumped channel video
Oh
The animation is either "heavily inspired" or straight-up copied from Core Dumped's video.
Parallelism is good for CPU-bound tasks and concurrency is good for I/O-bound tasks, FYI.
I hope you also explain why concurrency is faster than parallelism, because as I understood it, in parallelism there is always the possibility of multiple tasks being performed at the same instant since dedicated cores have been assigned, whereas in concurrency only one task is being worked on at any instant.
I think it could be because there are 16 threads, but the parallelism example uses one worker per core, which could be fewer than 16. Thus the threads can start fetching more pages at once.
Thank you for the illustration. I thought parallelism would be faster than concurrency, but the last example shows the opposite. Is it due to selecting 16 workers in threading, which could be more than the number of cores? I'm a little confused and need to check it in detail!!
I'm confused too
If concurrency means only one worker working at a time, how can it be faster than parallelism?
Because it's being used for an I/O-bound operation, not a CPU-bound one. Concurrency would be much, much slower if the example in the video were processing data instead of fetching it from an external source.
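You can see that with a made-up CPU-bound version of the test: under CPython's GIL only one thread executes Python bytecode at a time, so 16 threads crunching numbers are no faster than one:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def crunch(n):
    return sum(range(n))  # pure computation, nothing external to wait on

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(crunch, [5_000_000] * 16))
    # takes about as long as doing the 16 sums one after another,
    # because the GIL serializes threads on CPU-bound work
    print(f"{time.perf_counter() - start:.2f}s with 16 threads")
```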
This is actually a great example of why these computing paradigms exist in the first place, and how they are utilized to solve different problems.
Concurrency is generally faster and more efficient for "I/O bound" operations, where there is an external dependency that needs to be waited on before a computation can be completed (a server sending over some requested data, as in the video, a user input, etc.). Parallelism, on the other hand, is faster for "CPU bound" operations, where there is no external dependency and all data is already locally accessible (summing up an array of a billion integers in RAM, for example). The fundamental difference is in identifying where the bottleneck lies.
Concurrency is faster in the video because a single shared CPU core starts the fetch() call for the first link in the array, then immediately context switches to a new thread to make the second fetch() call, and so on, until all N fetch() calls have been dispatched across N threads. The dispatched fetch() calls can resolve at any point during this process, and the shared core is free to return the result of a call once it switches back to a resolved fetch() thread. The timer stops once all fetch() calls have resolved and returned the requested data, which is almost completely dependent on the I/O of external systems and not the local CPU.
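In code, that threaded dispatch looks roughly like this (a sketch; the urls list and the use of urllib are my assumptions, not the video's exact code):

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com"] * 16  # placeholder links

def fetch(url):
    with urllib.request.urlopen(url) as resp:  # blocks, releasing the GIL
        return resp.read()

with ThreadPoolExecutor(max_workers=16) as pool:
    futures = [pool.submit(fetch, u) for u in urls]  # all 16 start right away
    for future in as_completed(futures):
        data = future.result()  # results arrive in whatever order they resolve
```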
The parallel processing solution was slower because the number of fetch() calls that can be in flight at once is limited by the size of the process pool that .map() distributes work over, which by default is capped at the machine's CPU core count. This means we can make at most N fetch() calls at the same time, and we need to wait for one of those calls to resolve before we can start another, since each worker process handles one fetch() call at a time and we only have N workers in total.
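Versus something like this on the parallel side (a sketch assuming multiprocessing.Pool; fetch is a placeholder):

```python
import multiprocessing

def fetch(url):
    ...  # placeholder for the real HTTP request

if __name__ == "__main__":
    urls = ["https://example.com"] * 16  # placeholder links
    # Pool() defaults to os.cpu_count() workers; with e.g. 8 cores only
    # 8 fetches are in flight at once, and each worker must finish its
    # current call before it picks up the next URL
    with multiprocessing.Pool() as pool:
        pages = pool.map(fetch, urls)
```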
Once you understand these distinctions, the results of this video shouldn’t come as a surprise!
What an excellent video.
Thanks for your support
This video is a copy of Core Dumped's video ua-cam.com/video/5sw9XJokAqw/v-deo.htmlsi=JuoE2ufzXhMCNR4r
While you need multiple cores/thread execution engines to achieve parallelism, concurrently executing threads are effectively parallel execution.
This is untrue. If it were, there wouldn’t be any use for parallelism.
Processing a large array of integers concurrently with an arbitrary number of threads is the same as processing it with a single thread, and saves zero CPU time. Whereas, processing the same array but slicing it into N pieces through parallel processing does save CPU time, resulting in a quicker computation and end result.
This is why there is a distinction.
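A rough sketch of that comparison (the array size and chunking are made up, and this ignores the cost of shipping the slices to the workers):

```python
import os
from concurrent.futures import ProcessPoolExecutor

def sum_chunk(chunk):
    return sum(chunk)  # pure CPU work on one slice

if __name__ == "__main__":
    data = list(range(10_000_000))
    n = os.cpu_count() or 1
    size = -(-len(data) // n)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # each slice is summed on its own core; threads couldn't do this
    # under the GIL, and a single thread would sum everything serially
    with ProcessPoolExecutor(max_workers=n) as pool:
        total = sum(pool.map(sum_chunk, chunks))
    assert total == sum(data)
```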
Aaaaaaah, like the stroboscopic effect!!!
This animation is copied from Core Dumped
You should have used ProcessPoolExecutor to have the exact same code on both sides.
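i.e. something like this (a sketch with a placeholder fetch and urls), where only the pool class changes between the two sides:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch(url):
    ...  # placeholder for the real HTTP request

if __name__ == "__main__":
    urls = ["https://example.com"] * 16  # placeholder links
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        # identical code on both sides; only the executor type differs
        with pool_cls(max_workers=16) as pool:
            pages = list(pool.map(fetch, urls))
```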
Copied from Core Dumped
Cool animation
Thanks
Not his animation