The Fastest Way to Loop in Python - An Unfortunate Truth

mCoding

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 3 лют 2025

КОМЕНТАРІ • 2,2 тис.

@AlexPadula 3 роки тому ⁺⁷⁷¹⁷
You know, if the time difference between a while loop and a for loop matters to your application, Python might not be the most appropriate language.
@abebuckingham8198 3 роки тому ⁺³³⁷
Comparisons are expensive and iterating is cheap. That's true in assembly, C and Python.
@firstname4337 3 роки тому ⁺¹⁰²
WRONG -- he just showed that numpy works fine
@HiltownJoe 3 роки тому ⁺⁶³⁴
@@firstname4337 Yes, because if you use numpy you use C. That is the whole point of numpy, to not use python for operations that are repeated millions of times.
Alex' point stands, because in the numpy case we use Python for everything that is not time sensitive, but for the thing that is time sensitive we use a small C monster wrapped in a thin fluffy Python blanket.
@boredhero 3 роки тому ⁺⁷
@@HiltownJoe Have you heard of PyPy. I ran his source file in pypy3 and got less than a quarter of a second for all methods except sum generator and sum list comp (0.72 seconds for those)
@coreylazy 3 роки тому ⁺¹⁹²
Clearly not a protip. A professional understands the constraints of their environment and the project they're working on. The choice of programming language is not (always) up to the individual writing the application.
@nathandecena7694 3 роки тому ⁺⁵⁸³⁹
"The Fastest way to Loop in Python is not to Loop in Python" Great sentence.
@karx11erx 3 роки тому ⁺³¹
Talk about understanding the letters, but not the meaning.
@samuelhulme8347 3 роки тому ⁺⁸²
To loop,or no to loop that is the question
@nathandecena7694 3 роки тому ⁺⁹
@@samuelhulme8347 Hahaha DEEP
@mohamedfilali977 3 роки тому ⁺¹
@@samuelhulme8347 ok, nice job, now im lost ..lool
@Nerketur 3 роки тому ⁺²²
"a strange language. the only winning way is not to use it.
how about a nice game of chess?"
@daifee9174 3 роки тому ⁺⁸⁰⁴⁸
pro tip how to boost python: use C
@gutoguto0873 3 роки тому ⁺⁴⁹⁹
or C++, either choice is good.
@embeddor2230 3 роки тому ⁺³⁶⁹
C++ should be the better option here, cause of the OOP support.
@mCoding 3 роки тому ⁺¹⁴⁴⁸
Or any compiled language, really.
@Jonas_Meyer 3 роки тому ⁺⁷⁰⁶
Assembler the only real choice.
@leopoldbroom7679 3 роки тому ⁺³³⁴
@@Jonas_Meyer You're not a real programmer until you can program in assembly
@peterwilson8039 2 роки тому ⁺¹²⁴²
I'm a physicist and my experience is that when you spend the majority of your time writing the code, and running it takes only a few seconds, reducing the development time is much more important than reducing the run-time, and that's why I like Python.
@mCoding 2 роки тому ⁺²⁶¹
So true, and something many eager-optimizers need to hear! They will certainly learn from experience as they discover where they spend their time.
@yutubl 2 роки тому ⁺²¹
Hi Peter, well I've been a software developer delievering a (Basic) scriptable technical measurement data PC-application NI DIAdem, and we put most of our development effort into giving our users like you (engineers & physicists) optimized script commands, which behave similar to that of Python. So using already implemented loop command may give you better performance than own loops as described here.
@FlanPoirot 2 роки тому ⁺²⁴
that's the thing tho, they're not mutually exclusive, you can have a very easy to write and a decently performant language (look at julia)
why python is slow is not bc it can't go faster or bc it's a sacrifice for expressiveness (as there are python implementations that are way way faster but that lack total compatibility with libs)
The reason why python is slow is because it's poorly designed. It uses linked lists for lists instead of vectors (log of n vs log of 1), it compiles to an intermediate language but it leaves it at that, there's no JIT in Cpython, the global interpreter lock impedes people from writing super performant code, etc, etc
Python is slow bc of the choices they've made over the years, they're consistently made choices that affect performance, they never took the time to think about it and see how they could implement abstractions that would not take a big toll on performance, they just added things on top as they saw fit.
But that's not a big surprise once you see that python was a pet project of Guido and he just made it bc he could and people happened to pick it up and spread it (even tho it's performance and features were worse back then)
@CottidaeSEA 2 роки тому ⁺³⁸
You're using Python for simpler scripts, like what it was made for. That's a valid use case. Making large systems in Python is just stupid for the simple reason that it will be expensive to run.
@spl420 2 роки тому ⁺³
That's depends on task, but when we talking about science - it's 100% true.
@kitrodriguez992 3 роки тому ⁺⁴⁵⁰
I'm in my 3rd year in Computer Science and trust me, I've watched SO many youtubers that do this and you're by far the smallest for an unknown reason. You explain things really well, you don't beat around the bush and overall just make quality content. :D I'm now subbed and I'll be coming back for more :D
@mCoding 3 роки тому ⁺³³
Thanks so much for the kind words!
@deedewald1707 3 роки тому ⁺¹
Excellent content and testing summaries !
@krele14 3 роки тому ⁺³
Then again, he's the only one that would place "just know your result ahead of time lul" as a viable suggestion.
@FaranAiki 3 роки тому ⁺⁷⁵⁸
What I did not know about this video, is, 100_000_000.
I did not know if python has a feature to add underscore between number.
Thanks, anyway :)
@mCoding 3 роки тому ⁺³⁴⁶
I add secret lessons to all my videos!
@lawrencedoliveiro9104 3 роки тому ⁺⁵⁵
Added in Python 3.6.
@monojitchatterjee3185 3 роки тому ⁺¹⁵
Same! I didn't know this too!
@loutragetadk453 3 роки тому ⁺⁶
Same
@rogervanbommel1086 3 роки тому ⁺⁴
Same
@tomaskot9278 3 роки тому ⁺¹¹⁰⁸
Fun fact: A good C compiler would detect this common pattern (summing sequential numbers) and replace the for loop with the direct calculation of the result. And if even the number of elements was hard-coded, the compiler would even simply replace the whole code with the hard-coded numeric answer.
@Czeckie 3 роки тому ⁺³³
I always wondered what's the point of this? Are series we have summation formulas for really coming up in a real software? As far as I can tell the only application is to show it in a presentation about optimizing compilers. And frankly, this is even a wrong showcase of optimization, cause the useful techniques are general and pliable enough for compiler to reason about. This is just a hardcoded nonsense.
@vinno97 3 роки тому ⁺¹⁵⁰
@@Czeckie this exact problem isn't common. But good compilers can recognize loads of (small) patterns in a codebase that, when combined, can amount to a large speedup. Sometimes some optimized chunks can again be combined into a new, smaller and/or faster, chunk.
@joestevenson5568 3 роки тому ⁺⁶⁵
@@oj0024 Yes, if you write shitty enough code the compiler cannot fix it for you because it can’t make assumptions about what you’re trying to accomplish. This is not a revelation or even a shortcoming, its a feature.
@oj0024 3 роки тому ⁺³⁰
@Matt Murphy He is right that compilers can do it, but if we look at the two most modern compilers right now only one of them can detect it and only if you write the code in a non-obvious way.
The second part showed that compilers don't do it with constants either, they just execute the code internally. I'm not saying that compilers aren't doing great optimizations, just that the claim that "A good C compiler would detect this common pattern" is wrong.
Especially since such optimizations don't gain anything (I'm talking about the pattern transformation, not the constant evaluation). They can only be done in very few cases, and you usually assume that the programmers know what they are doing. (e.g. the only reason to implement the summation code would probably be for benchmarking purposes)
@oj0024 3 роки тому ⁺¹⁷
@@joestevenson5568 clang didn't detect the pattern when using a normal for loop, it only detected it if you write the non obvious and usually not used while(n--) loop.
@julianmahler2388 3 роки тому ⁺²¹⁰
5:08 It's not about NumPy being primarily written in C (so is CPython), but about Python' dynamic vs. NumPy's static typing. In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is). If you do this one million times in a row (e.g. in a loop), the same checks will be performed over and over again. It's like asking "Is a still an integer? ... What about b? ... Do I know how to add two integers?" one million times in a row. And of course all of that happens at runtime. That's why Python is so damn slow in that regard.
@dariuszspiewak5624 2 роки тому ⁺¹⁵
Yeah... that's the price you pay for flexibility. But there are ways to overcome this. They all require some kind of hinting that the structures you use contain only elements of the same type.
@sumnerhayes3411 2 роки тому ⁺³⁹
> In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is).
A dynamically typed language implementation doesn't *have* to do things this way, that's just the naïve approach (albeit the easiest approach out of the box for new implementations). A smarter specializing JIT can create precompiled versions of functions and expressions for commonly used types (e.g. integers), and can determine that they are invariant and avoid having to re-check the types each time you hit the expression.
The default CPython implementation does things the way you describe, but alternate implementations like psyco and PyPy take the specializing approach in their JITs.
Not surprisingly, PyPy executes these for and while loops on the order of 50-100 times faster than CPython does on my machine (and ~10 times faster than numpy).
$ python3 benchmark.py
while: 5.254076757992152
for: 2.9640753349813167
sum_range: 1.1016407490242273
sum_numpy: 0.6697274879843462
$ pypy3 benchmark.py
while: 0.05818561598425731
for: 0.07444041498820297
sum_range: 0.07667819698690437
@comrade_marshal 2 роки тому
I hope this problem can be partly solved if people declare data types of nearly all the variables as they need to do in C before hand like I do most of the time
@kylehart8829 2 роки тому ⁺²
@@sumnerhayes3411 There are tons of other reasons why dynamic typing is pointless and bad. It saves you having to *gasp* tell the program the type of a variable one single time; and it costs you dearly. All python projects should enforce static typing, even though that doesn't gain performance. All languages should be statically typed, period, because it is infinitely more readable and provides no development benefits at all.
@grawss Рік тому ⁺²
@@kylehart8829 I would have disagreed prior to learning Python, back when it was just dogma in my head. Interesting how that works.
However, the learning process was made easier by removing a step. I think a better solution would be to make it extremely clear how to toggle Python into enforcing static typing, and why that's a huge benefit.
@opticalmoose8091 2 роки тому ⁺⁷⁶
Sometimes it’s just impossible to express something in numpy - iterative algorithms, for instance. I had a very hot piece of code, which implemented an A* pathfinding algorithm, which was slowing my entire program down.
I looked for solutions and found a package called Numba - it allows you to decorate a single function and it would try to compile it the best it can. There are various optimization settings to consider, and in order to get *significant* improvements in performance (like, order of magnitude and more) you basically had to rewrite your program as if you were writing in C, but using Python’s syntax - so, no dynamic typing, using very limited subset of built-in types and refrain from using custom classes as much and possible.
But it didn’t require too much work, and it was a great benefit in the end - I ended up with a blazing fast function without needing to write it in actual C (not very hard) and then figure out how to couple this with Python (quite finicky if you’ve never done it). So yeah, I’d recommend looking at Numba if you need a more general solution.
@NJ-wb1cz 2 роки тому ⁺¹
Or just ditch python and go for java or other jvm languages or go or rust etc
@pavelperina7629 2 роки тому ⁺⁴
@@NJ-wb1cz Eh, rust is likely one of the hardest languages to learn (after "modern" C++ with template metaprogramming). Julia seems quite easy to learn and pretty fast (if strongly typed it's close to C, maybe twice or three times slower, but not two orders of magnitude like Python). On the other hand I'm not sure how to structure code which is too complex for a single file and too simple to put it into package. It was basically developed for data crunching.
Also year or two ago Rust was lacking some libraries, especially for plotting graphs (C++ lacks them too), painting (Cairo) and even exporting data as images (no support for writting of 16bit PNG files)
@julians.2597 11 місяців тому ⁺³
@@NJ-wb1cz right, who would do something as stupid as just adding another library as dependency when they could add an entirely different development stack just to optimise one function.
@NJ-wb1cz 11 місяців тому
@@julians.2597 why is adding libraries stupid? That's a strange stance. Sure, moving to another language is an option, but you'll likely use libraries there as well.
@igorthelight 4 місяці тому
Before ditching Python - try those two things:
* PyPy
* Nuitka
Both are still Python and both don't require you to change any code ;-)
@DeuxisWasTaken Рік тому ⁺⁸
Very good video. Note: NumPy isn't just C, it's highly optimised C with SIMD capabilities. The only way you can go faster than that is with GPGPU like OpenCL, CUDA, Vulkan or Metal. (And even that isn't assured since GPGPU has initial overhead.)
@SteinGauslaaStrindhaug 3 роки тому ⁺²⁹¹
Half through the video I thought: actually for this simple sum you don't even need to loop just the formula. And you didn't disappoint.
While this might seem like a very synthetic example which won't often be possible in real life, you'd be surprised how many times I (a programmer who is not even very good at maths) have been able to replace large loops and delete large chunks of code and replace it with a simple formula.
Not usually this clean and simple, but quite often you find that the code is looping through a lot of data calculating stuff and then filtering out stuff, and if you simply rearrange things you might be able to do the filtering before the loop which might even reduce the loop to a few statements without a loop. Very often this is not obvious in the code but very obvious when sketching out the desired results on paper.
Pencil and paper is definitely not obsolete.
@mCoding 3 роки тому ⁺¹⁰²
Indeed! Never underestimate math! The feeling of deleting 30 lines of scratch and replacing it with 3 lines of math is indescribable.
@OatmealTheCrazy 3 роки тому ⁺⁴⁴
@@mCoding it's the moments when you're lying in bed and it kinda hits you out of nowhere that are the best
Of course, it is kinda a let down when the divine revelations are actually wrong though
@ttt69420 3 роки тому ⁺¹³
Doing it on paper with a pencil probably triggers a different part of your thinking process from all the years of doing math that way.
@delqyrus2619 3 роки тому ⁺⁸
You can actually be even faster than with math, if you know how CPUs work. If you instead of calculating (n * (n -1)) // 2 just calculate (n * (n -1)) >> 1 it should (in theory) be a few clockcycles faster. While most math operations are pretty trivial for the CPU(+, -, *) division is a pretty annoying task, so you might spare some time with just bitshifting.
@timewave02012 3 роки тому ⁺⁹
@@delqyrus2619 Bit shifting is math too. No different than multiplying and dividing by powers of ten being easy in decimal. A good compiler will optimize it though, so it's probably best not to bother.
@saivardhanchowdary7918 3 роки тому ⁺³⁰⁶
The UA-cam algorithm got you mate. All the best!
@mCoding 3 роки тому ⁺²⁷
Thanks for the support!
@NickByers-og9cx 3 роки тому ⁺⁸⁶⁹
You want to be careful when timing functions, your processor likely speeds up or down based on system load and ambient temperature changes. Although you can still capture order of magnitude changes very easily like you did. If you're on a laptop though where cooling and power are limited, take extra care, and run multiple simulations in different orders at random times!
@mCoding 3 роки тому ⁺⁴⁹¹
I cannot stress this enough! Benchmarking is such a difficult topic even though it seems so simple.
@luziferius3687 3 роки тому ⁺⁹¹
One thing you can do is to disable CPU boost clocks during benchbarks. This way, you get stable clocks regardless of load created by the tested function. For example on Linux, you can just write 0 to /sys/devices/system/cpu/cpufreq/boost to disable the boost or write 1 to enable it again. This’ll get you much more stable results that are comparable even if ambient temperature changes.
@NickByers-og9cx 3 роки тому ⁺²⁹
@@luziferius3687 that's a good approach! Myself I usually try to do perf stat whenever possible, and count the actual CPU cycles it takes. Run each command say 3 or 5 times and take the median!
@Teriton 3 роки тому ⁺⁴⁷
@@mCoding Obviously the best way is to time it yourself with a stopwatch
@MisterAssasine 3 роки тому ⁺⁴
imo it would have been better to use a smaller n in the function and a higher number of runs
@fillfreakin2245 2 роки тому ⁺¹
If you're using a counter within the while loop, then yes that counter is slower than the for loop. If you don't, and you use the while loop as intended and doing the same stuff inside the loop, then it is essentially the same speed.
def while_loop(n=100_000_000):
s = 0
while s < n:
s += 1
return s
takes about the same time as:
def for_loop(n=100_000_000):
s = 0
for i in range(n):
s += 1
return s
Use the right loop for the right task. If you have to put an incremental counter in your while loop then you're using the wrong loop.
@ssfjor 9 місяців тому
Comparing the sum with the counter limit certainly won't give the correct result in s: We are calculating the sum of all integers and analyzing various ways to do that. I. e., if the limit where 10, with 1,2,3,4 we reach it, way before reaching the correct sum 55.
@sploofmcsterra4786 2 роки тому ⁺⁴
This was a valuable lesson to learn when I was coding a neural network. I wasn't aware of this fact and was trying to parallelise, before realising numpy automatically did that. Turning everything into vector operations led to a speed increase that was crazy.
3 роки тому ⁺²²⁵
This is the reason why python is such a good programing language. You can write your program with easy and pretty syntax and use packages wrapping C code to do the complex and time consuming tasks. This just shows that you can use python and have fast running code.
I would also add that if your program's only focus is speed and performance then don't use python, simple as that.
@mCoding 3 роки тому ⁺¹⁵⁹
People too often leave out the speed of development in their calculation of speed!
@waldolemmer 3 роки тому ⁺¹³
Python isn't special in this regard
@smallcatgirl 2 роки тому ⁺⁴
People oftentimes forget lua.
@HydratedBeans 2 роки тому ⁺⁸
A faster processor is cheaper than another developer to speed up development. That’s why I use python.
@NJ-wb1cz 2 роки тому ⁺¹
@@HydratedBeans unless you depend on oython-specific libraries I don't think the development speed is noticably greater compared to, say, kotlin. And a loosely typed language will always have more mistakes and harder to statically check
@hemanthkotagiri8865 3 роки тому ⁺⁵²
To the point, quality content. Subscribed!
@mCoding 3 роки тому ⁺⁶
Thanks for the support!
@GuilhermeFArantesSouza 3 роки тому ⁺²⁵⁴
"The best way to loop in python is to not loop", hahaha, loved it.
Also got curious about doing the numpy sum but with the built-in range (because of the memory problem), how much would that impact the performance?
@mCoding 3 роки тому ⁺³⁵
In my experience, I've never hit a RAM limit because of this, but in memory constrained systems this is definitely something to watch out for. I wish numpy has something like std::ranges::iota to use as an array-like.
@aquilesviza5550 3 роки тому ⁺⁶
@@mCoding There is a library called dask, is used in distributed computing and has support for numpy arrays. working with blocks consume less memory but might be a bit slower.
@tonyflury2291 2 роки тому
i think numpy.arange(...) creates a potential memory issue (it allocates the a large array in this case), but the Python3 builtin range(...) doesn't allocate a large array - it is a lazy iterator that simply tracks the last value generated so it knows what to generate next.
@npip99 3 роки тому ⁺⁹
Fun fact: If you write this in C, gcc/clang will automatically optimize the for-loop sum into n(n-1)/2 for you.
@underfilho 3 роки тому
really?
@tharfagreinir 2 роки тому ⁺⁴⁷
There's a fun story about that summation formula. Apparently mathematician Carl Friedrich Gauss came up with it at a young age when his teacher set his class the task of adding up all the numbers from 1 to 100 and was very surprised by Gauss' solution. Regardless of the veracity of the story, the formula is indeed often referred to as the Gauss summation formula. Mind you, a lot of things in math are named after Gauss as he was extremely prolific.
@magicmulder 2 роки тому ⁺¹⁴
You left out one important aspect; the teacher wanted to keep the kids busy for a while and Gauss came up with the solution within minutes.
@thomas-sk3cr Рік тому ⁺⁹
Another important aspect, the formula is actually (n+1)*n/2.
@MattDunlapCO Рік тому ⁺¹
@@thomas-sk3cr Exactly. I was looking for this correction.
@adrianmizen5070 Рік тому ⁺¹
@@thomas-sk3cr which actually illustrates a pitfall of using external math formulas instead of letting the computer do the work, that they are going to be less clear about what the code is supposed to do. getting the wrong answer really fast is worse than getting the right answer slowly
@thomas-sk3cr Рік тому ⁺²
@@adrianmizen5070 not sure if I understand you correctly. I do think that it's definitely better to use a formula if there is one. But it should be properly documented of course and you have to make sure that it's the right one. Maybe we can agree on that? I don't think you should reimplement well-known formulas with loops just because it's more clear to the reader at first glance :)
@GuRuGeorge03 3 роки тому ⁺⁶⁶
Oh this is an amazing recommendation by youtube. I came to the same conclusion as you while having massive performance issues in my machine learning models in non obvious parts of the code. Once I noticed that it is the loops that are awfully slow I tried refactoring them as much as possible and ended up using a mixture of things u've shown here. I didn't even know or speculate that it's because of C but now everything is much clearer in my head. Thanks!
@mCoding 3 роки тому ⁺¹³
You should also check out Jax, Numba, and Cython for potentially enormous speedup for little effort if you are doing ML.
@robertmielewczyk4219 3 роки тому ⁺⁸
you're not supposed to do ML with for loops in theory you can and it's very undestandable if you do it that way but any solution using matrices/tensorflow will be faster. Like he says in the video numpy is fast
@deemon710 2 роки тому ⁺³⁵
Oh snap! I thought I was saving on resources by not using additional modules like numpy but now I see it's ridiculously faster. I wasn't aware pre-built functions were that much better either. Thanks for the reveal!
@eric_d 2 роки тому ⁺¹³
He actually did say in the video that Numpy uses a lot more resources, and with a bigger number, you may not have enough memory. Your original thought was correct, that you do save resources by not using additional modules. You may save time, but not resources. Also, if you compile your code, those additional modules will need to be compiled in, which will make the executable larger, and require more resources as well. As we are quickly approaching the limits of how far we can go with hardware (CPU technology is pretty much capped out, unless someone figures out how to make atoms smaller), we need to be more mindful of the amount of resources we use. Programmers need to stop being lazy. They have to stop thinking that they can bloat their code with hundreds of modules because CPU's will be faster by the time their code is in use.
Just because speed and bandwidth are available, doesn't mean they should be used. For example, I know of some websites that include 50-100MB of Javascript code for each page view. Sure, dial-up is a thing of the past in most parts of the world, and people have faster internet connections than ever before. The page loads pretty fast on a gigabit connection. What about the people who have much slower connections? Is it really necessary to design your website where every single page view uses an entire day's worth of bandwidth on a dial-up connection? If you have several pages of the same website open in different browser tabs, your memory utilization goes through the roof! I've seen a single browser tab on one of those websites using over 2.5GB of RAM! I haven't even checked how much unnecessary Javascript is downloaded when using something like Facebook, but I've seen a single tab with Facebook open using over _5GB RAM_!!!
This kind of waste needs to stop. Even if it does save a second or two here and there, it could cost several more seconds (or longer) in other areas. I've always hated importing modules in any language I've ever programmed in. I would much rather write my own code to do what I need done, whenever possible. I realize that's not always possible, but it's always better than adding bloat to a program that's not needed. Look at what's known as the "demo scene". It's something that started LONG ago, where people try to make the most exotic program possible in assembly language that fits within a 64KB executable. People have fit 10 minute long full screen videos with music into 64KB! They have created complete first-person-shooter games in 64KB! These programs run plenty fast on hardware from over 30 years ago! If more of todays programmers took the time to write good code, we wouldn't even need computers as fast as we have.
Sorry, that started off as a 1 line reply, and ended up going 100 directions. There's still soo much more I want to say about the topic, but who is going to read it anyway?
@moustafael-shaboury2659 2 роки тому ⁺³
@@eric_d I will! I'm very new to programming and I really appreciated reading that. Thank you for the reminder :)
@shingo371 3 роки тому ⁺⁷¹
Awesome content, got very surprised when I saw your low sub count
@mCoding 3 роки тому ⁺¹⁰
I appreciate that!
@anb1142 3 роки тому ⁺¹
me too
@aidanwelch4763 Рік тому ⁺¹
Thanks for this, I was trying to explain to people in JS that essentially always calling a native function is faster than writing a more optimized(in terms of Big-O) way of doing something. They were asking Big-O proof and didn't believe me that it's just not relevant. Looping through a string 10 times with a native regex check is faster than iterating through it once in pure JS.
@cottawalla 3 роки тому ⁺⁶
It was the one addition statement executed many times that slowed the python "for" loop. The conclusion is that looping can be very fast but executing python statements anywhere in the code is very slow.
@dddbra1748 3 роки тому ⁺⁴⁶
I think your channel deserves more subscribers.
@mCoding 3 роки тому ⁺⁵
I appreciate that!
@kenhaley4 3 роки тому ⁺²⁰
Another reason the numpy approach is faster: The elements of a numpy array must all be of the same type, and there's no overhead to check (and convert to numeric, if necessary) the type of each array element.This probably contributes much more to the increase in speed than the language difference (C vs. Python.)
@vishnukumar4531 2 роки тому
Other than that, the sum itself can be computed parallelly and fused together later, thus we are not forced to iterate serially element by element!
@dmitripogosian5084 2 роки тому
In compiled languages loops are often optimized on compilation stage, such as being unrolled where necessary, with particular emphasis on efficient cache/prefetch from memory utilization. One should look on original Fortran code of LAPACK routines where some loop optimizations were done 'by hand'
@shinobuoshino5066 Рік тому
@@dmitripogosian5084 loop unrolling is never efficient when cache and prefetch is the topic unless you explicitly wrote "from 0 to 4" just to avoid copy pasting code 4 times by yourself, and even that may not be worth unrolling depending on body of the for loop which most compilers can decide by themselves if you do profile guided optimization pass. There's a reason why -O2 more performant for most software than -O3 and unrolling is one culprit of that.
@CodeDisease 3 роки тому ⁺⁴¹⁷
“Python is an incredibly slow language”
You just insulted my entire race of people...
but yes.
@CodeDisease 3 роки тому ⁺⁴
I guess I’ll be first then...
@suryanshthakur5820 3 роки тому ⁺⁶
Your guess right
@soupnoodles 3 роки тому ⁺⁸
Its not incredebily slow, compared to others its slower yes
but it can acomplish the tasks its meant for with no issues
Machine learning, Automation, Data Science and Cyber Security
@CodeDisease 3 роки тому ⁺⁴
@@soupnoodles Yes but you can do all of that much faster with C, you're better off using C.
@CodeDisease 3 роки тому ⁺¹
@@soupnoodles Also why did you like your own comment?
@fanoflinoa6109 2 роки тому ⁺¹⁵
Great comparative video
The math solution if you include n in the addition is ((n+1)*n)/2
Because you used range() and the while loop was set to < and not
@aioia3885 2 роки тому ⁺¹
Flooring the division is very much recommended, because otherwise python will convert the result to a floating point number which may cause you to lose precision
@fanoflinoa6109 Рік тому
@kurogami2771 it does in this case as we are adding up a series of numbers from 0 to n, which by definition excludes negative values.
As for flooring, it is true that we should avoid using floats when not needed even for just one operation so we can either floor or int() the results.
@abebuckingham8198 9 місяців тому
@@aioia3885 Either n or n-1 is even so it's never a float. Using floor isn't required.
@aioia3885 9 місяців тому ⁺¹
@@abebuckingham8198
edit: "either n or n-1 is even so it's never a float" is not correct because in python division with / will always result in float conversion (see type(1/1)). the rest of this reply is just trying to clarify what I now realize was not necessary to clarify
in my comment I said "flooring the division is very much recommended" but what I really meant was not using the floor operator on the result but instead using the integer division operator. in python if you do a/b the result will always be a float bu if you do a//b then the result will be an integer if both a and b are as well. doing a//b is basically like doing int(a/b) except that there will be no conversion to a float and so no precision will be lost. for example, int((10**18+2)/2) evaluates to 5e17, because the division was a floating point one, but (10**18+2)//2 does result in the expected value of 5*10**17+1
@supergamerfr 3 роки тому ⁺⁴
This channel is gonna blow up great work
@LeDiop 3 роки тому ⁺³⁶
That's why Gauss exists : simple use of (N-1)N/2 in this case
@mCoding 3 роки тому ⁺¹⁸
And he came up with this at such a young age too!
@bFix 3 роки тому ⁺⁸
wait isn't it: (N+1)*N/2?
@LeDiop 3 роки тому ⁺⁷
@@bFix He takes sum to N-1 in the example
@Samsam-kl2lk 3 роки тому ⁺²
@@LeDiop Yes, so it is n*(n-1)/2
@LeDiop 3 роки тому
@@Samsam-kl2lk yup
@minghaoliang4311 3 роки тому ⁺⁶⁹
Can you do a comparison of recursion over loop?
@mCoding 3 роки тому ⁺³³
I have many video ideas in the works, this is definitely a contender.
@ikickss 3 роки тому ⁺⁵
Usually recursion is slower and takes up more memory. And many "big" problems cannot be coded recursively due to stack size limitations. But if compiler/interpreter can do tail recusion optimization, it's a very nice option. Unfortunately, not all problems are applicable to it.
@shukterhousejive 3 роки тому ⁺³
The CPython interpreter doesn't do tail recursion so even if you write it out correctly it's still faster to loop, save the recursion for when you forget how to convert your code into a loop correctly
@sbwlearning1372 3 роки тому
@Sec Coder Check out Lisp or Prolog. The recursion is conceptual in terms of the problem or model.
Imagine a family tree. The definitions and relationships are recursive. Mom and Dad produce offspring each of which in turn can become Mom's or Dads who produce offspring.
Comes into its own when these Relationships are highly complex ( Imagine every leaf on a real tree was in fact another tree and so on and so forth. The Mantlebrot set is recursive )
Essentially they become huge lists or recursive data structures.
Those languages became prioritized and optimised around lists and recursive data structures.( With a few constructs and language primitives not found in the likes of C or Pascal such as "is a type of "
"is connected to" "is related to", "is derived from" etc etc )
Pythons modelling capabilities Tuples Lists etc etc Means most of it can now at least be modelled easily before being translated to something else for speed.
Ps for fun check out Occam . It's editor was indent sensitive just like Python
It also let you mix sequential and parallel programming .!!!! Oh lordy 😁
@fat_pigeon 3 роки тому
@Sec Coder It lets you express common algorithms in a purely functional way, i.e. without needing to mutate variables. Doing so makes your code easier to analyze mathematically and makes it more composable.
@chrisray1567 3 роки тому ⁺⁵
Great video. I’ve always found list comprehensions to be faster than the equivalent for loop or while loop.
@simonmultiverse6349 3 роки тому ⁺³
There is a language called Lython, so you don't try to loop in Python; instead you poop in Lython.
@FallinPython 3 роки тому ⁺²
Thanks for the tips!
The comparison with NumPy is not fair because it is made for array computing and hold an unique data type.
By the way, you deserve far more subscribers.
@mCoding 3 роки тому ⁺¹
Thanks!
@Connor-of6mu 3 роки тому ⁺¹
I really like your channel! These are very interesting insights into the Python language in small videos which makes them easy to digest. Keep up the good work!
@mCoding 3 роки тому
Awesome, thank you!
@BartMassey-PO8 3 роки тому ⁺⁸
Rust with 64-bit numbers looks to be about 50× faster than numpy on my box (once I convinced the compiler to stop solving the problem at compile-time without losing optimization). Rust with BigUint bignums looks to be about 5x faster than numpy. The obvious for loop with pypy3 is about 12x faster than numpy. So there's still plenty of room for speedup.
@brianchandler3346 3 роки тому ⁺⁸
Expertly presented. I never would have thought to look at it where C was used behind the scenes. I will definitely try and make the better to worst progression habit when writing Python.
@nvmcomrade 3 роки тому ⁺¹³
Half way through the video I thought to my self why even loop to find an answer to a simple arithmetic progression and then I was surprised that this was addressed as well. Wow, impressive.
@mCoding 3 роки тому ⁺¹
Thanks!
@lepepito 3 роки тому
It serves the purpose to explain the problem of loops and also the process of finding different solutions to a problem
It’s a nice exemple
@maxch.9135 3 роки тому
you are the only programmer that i pressed liked, subscribed and the bell icon :)
@daveys Рік тому
I’m a Python novice and I found this to be really interesting and useful. Many thanks!
@ColinBroderickMaths 3 роки тому ⁺⁵
Just wanted to note that using an iteration index in a while loop isn't really good practice anyway, since that alone would almost always mean a for loop is the correct pattern. So it's not really surprising that the while loop was slower in this experiment.
I'd also like to warn people that numpy is only faster if you are operating on large sets or large arrays of numbers. If you apply a numpy function to a single number, it's almost always much slower than the equivalent function in Python, presumably due to some overhead.
@rursus8354 2 роки тому
That isn't the point. The point is the efficiency of raw Python.
@zf0666 3 роки тому ⁺³
Good example of how being good at math and algorithms beats all micro optimizations of the code itself :)
@mCoding 3 роки тому ⁺³
Exactly!
@spacelem 2 роки тому ⁺³³
My own results were 6.06 seconds for the while loop, 4.25 seconds for the for loop, and 569 ms for the numpy loop.
I tried it in Julia (almost identical code except for the end tags), and the results were 1.3 ns for both the for and the while loop. Checking the lowered code it turned out it had optimised the loop away completely, so we got sum_math for free! So instead to force it to do the actual sum, I modified it to s += i * (rand() < 0.5), and it came in at 576 ms. So even with a call to rand() and a branch, Julia was still nearly 10x faster than raw Python, and was about on par with NumPy (which didn't have a call to rand() ).
If I force Julia to construct the range object with collect(1:n), then I don't need so that it matches NumPy, then it gets even faster at 290 ms, but now it allocates 763 MiB of memory, so clearly there are pros and cons of using range objects, although I'd stick with the range object for most usages.
So, if you like Python but you want C performance (and you like to use arrays a lot and don't like to add all the extra numpy syntax just to do that), maybe check out Julia.
@incremental_failure 2 роки тому ⁺¹
Most like myself use Python for the libraries, nothing else comes close except for maybe C++. It would take a lifetime to recreate all the libraries in Julia.
@spacelem 2 роки тому
@@incremental_failure there has a lot of ongoing work to make native Julia libraries (mostly in the scientific and statistical communities), but Julia can also call C, C++, and Fortran code, and the PyCall package already allows you to embed Python in Julia, so you're never left in the lurch if you choose to use Julia.
I've already seen Julia be hugely faster and more flexible when doing a lot of ODE modelling than when using R and calling deSolve which is written in Fortran.
@TARS.. 3 роки тому
This channel is a gold mine, thank you so much for these. Keep them up!
@mCoding 3 роки тому
Thanks, will do!
@Jcewazhere 3 роки тому ⁺²
In my one of my programming classes we learned about loop unrolling. This is much more handy to know about. :)
@micirei 3 роки тому ⁺¹⁰
Your channel is great! I suggest you use a high pass filter for your voice to cut off the low frequencies that make your voice sound muddy, great job so far.
@mCoding 3 роки тому ⁺⁵
Thanks for the tip! I have no idea what I'm doing video editing so tips like this can really help improve my video quality!
@emilemil1 3 роки тому ⁺⁴
It would be a good idea to show examples that does a bit more work within the loop, to show that if your loop body is costly then how you loop is largely insignificant.
@K9Megahertz 2 роки тому
Yeap, I was thinking the same thing. These tests really don't tell you anything meaningful.
@Generlc_Human 3 роки тому ⁺⁸
when you dont know how to code but you know english: "i know some of these words"
@entropyz5242 3 роки тому ⁺¹
I like this video very much. Subscribed.
Also, interestingly enough, I am taking a class about functional approach to engineering problems and the professor repeats every class to avoid looping like you would avoid the Black Plague. Lol
@braulioacosta6816 3 роки тому
what's the name of that course?
@entropyz5242 3 роки тому ⁺¹
@@braulioacosta6816 functional approach to engineering problems lol
@seboll13 2 роки тому
I was so much expecting this formula to come up at the end :p
@andreabradpitto 3 роки тому ⁺⁶
This is very informative and well presented. Also, it seems like you are very talented in explaining stuff, but this is only the first video of yours I have watched, so I'll have to do some further benchmarking ;)
Thank you!
@jetison333 3 роки тому ⁺⁴
Very informative video, keep at it and im sure you'll grow a ton.
@mCoding 3 роки тому
I appreciate that!
@reprC 3 роки тому ⁺²²
Cool video. I’d be curious how `numba` stacks up in this, both the JIT and AOT variants
@youuuuuuuuuuutube 3 роки тому
Numba would make it closer to the unoptimized C/C++ code. I used it to render fractals and I got it to run 50x faster than unoptimized Python, and almost the same speed as the C code. The thing is ... that you can make the loop a lot faster in C too because you can use SIMD intrinsics, so you could do 4 or 8 or even more operations in a single instruction.
@tobiasgrun5116 3 роки тому
Just tested.. The JIT execution time is similar to the math version. To all Numba / LLVM developers: Great job!
@christianhill8651 3 роки тому ⁺²
You just convinced me to learn algorithms and functions. This was an amazing illustration.
@roxannemoore9694 2 роки тому ⁺¹
Loved the video, certainly gives me some ideas for my own code optimisation.
However, I think your maths is off? In the arithmetic summation formula Sn=n/2(2a(n-1)d) being simplified where a=n, you've forgotten that the common difference will be -1 instead of +1 as we're counting down, not up. This should simplify to (n*(n+1))/2 instead of (n*(n-1))/2? Tested with the numbers 1 through 10 yields 45 on your formula, 55 for the correct answer.
@dwarftoad 2 роки тому ⁺³
I think numpy also will generally, if it can, use parallel CPU operations (SIMD) to process the array, which is probably why it's faster than standard sum() (and also why it makes sense to create the array of data with arange() even if each value could be calculated in the loop. But definitely good to be aware of this memory requirement.)
@mrtnsnp 3 роки тому ⁺¹⁸
The numba package may be useful in some cases.
@mCoding 3 роки тому ⁺³
Absolutely, if you can use Numba to @jit your problem, them it will very likely speed up the solution! I'll probably cover this in a future video.
@mrtnsnp 3 роки тому ⁺⁴
@@mCoding I did a small test.
from numba import njit
@njit
def int_sum(n=1_000_000_000):
s: int = 0
for i in range(n):
s += i
return s
timeit.timeit(int_sum, number=1)
5.373000021791086e-06
timeit.timeit(int_sum_p, number=1)
44.33714248199999
int_sum_p is the same function, without decorator. The result is very close to C speed. But obviously, the best improvement comes from doing the maths properly.
@dmccallie 3 роки тому
Numba is amazing, if you can limit the code in the @jit’ed routine to simple loops and math and/or numpy operations. You can’t use things like lists or dicts
@gregorymorse8423 3 роки тому ⁺¹
Should do all the examples with numba this is also interesting. Cython also. Masterclass in squeezing loop performance, could also show sum(x for x in range(n)). Ideally use matplotlib to show the data for different sum sizes and prove the obvious that it is linear
@mrtnsnp 3 роки тому ⁺¹
@@gregorymorse8423 When trying to make a comparison with compiled C++ code, I discovered that clang removes the loop altogether, applying the maths that the programmer should have done in the first place. And I suspect that numba applies the same optimisation, otherwise the total time does not make sense with the clockspeed I have.
@logicprojectsspeed2023 3 роки тому ⁺¹¹
Great video. Commenting mostly to help the algorithm push this. Would have loved to see a comparison with straight C thrown in there given that C code was the source of the speedup. What's the cost of the python to C transitions?
@mCoding 3 роки тому ⁺⁵
A C compiler would probably compute the answer at compile time so there is nothing to compare! A Python to C transition would be a huge undertaking even for a small project.
@oj0024 3 роки тому ⁺²⁰
Here are the results on my pc:
python with for loop: 8.74s
c with gcc and no compiler optimizations: 0.235734s
c with gcc and full compiler optimizations: 0.048770s (gcc doesn't detect the pattern and actually executes the entire code)
c with clang and full compiler optimizations: 0.000001s (clang detects the pattern and optimizes it using the Gaussian formula)
@alejandromorales8784 2 роки тому
@@oj0024 wow, there is absolutely nothing to compare. I knew python was slow and C was fast, but I didn't know that the difference is to this extent
@tonyflury2291 2 роки тому
@@mCoding Nuitka will take Python code and transpile to C (which uses the Python standard Library to do the object iterations) - on some applications it is 2x faster than normal Python (and it is a work in progress).
@More-right-rudder 3 роки тому ⁺²
This is probably the first coding channel that I looked up just for fun. Great work. Can you cover advance pythonic syntaxes and topics?
@mCoding 3 роки тому
I'm always open for suggestion! What do you want to see?
@More-right-rudder 3 роки тому
@@mCoding Nothing in particular just any fancy pythonic things which help improve performance and code readability.
Also, one thing you can do is make a mess of a code and start refactoring it. That would be amazing to see.
@dragonsage6909 2 роки тому ⁺²
Wow .. I learned about 5 new things from this.. using timeit is a great idea.. thank you!
@SillyLittleMe 2 роки тому ⁺¹²
If you are writing pure and iterative heavy Python code, then PyPy is your best bet for performance. I ran the same test as mentioned in the video on Python 3.10 and PyPy 3.9 and my god the difference was staggering! Here are my results :
Python 3.10:
for loop : 6.6 secs
while loop : 11 secs
PyPy3.9:
for loop : 0.22 secs
while loop : 0.36 secs
PyPy is really meant for long running code and real world results show that.
@TheParkitny 3 роки тому ⁺³
There is another option. Often you can use the numba library to get a speed boost over numpy.
@supermariobrosz8 3 роки тому ⁺⁵
Great video! Another idea is to time yield in functions and generator expressions in comparison to the traditional loops
@mCoding 3 роки тому
Great suggestion!
@ferociousfeind8538 3 роки тому
yes, graph the size of n against the time it takes different strategies to compute the sum >:3
make the computer dance for my entertainment!
@thelazyrabbit4220 3 роки тому
Dayumn... TIL something new as well as some new trick. You definitely earned my sub!
@mordream Рік тому
I like how he showed us a clever way to implement loops and NOT how to think outside the loop and just find the answer with math
@edwinjonah 3 роки тому ⁺⁵
Awesome video! A few days ago I was comparing the execution time on using a loop vs. the Gaussian formula, it's nice to see some other options in between.
@mCoding 3 роки тому ⁺¹
Thanks for the support!
@lb2040 3 роки тому ⁺¹
@@mCoding Very interesting video indeed, something I never really thought about … but the math-guy in me needs to say it: You got the formula wrong… 😬
@sergeyshchelkunov5762 3 роки тому ⁺⁵
Very informative. Excellent insight into Py inner workings. Great incentive to employ Numpy (and Multiprocessing) whener possible. Give a simple yet splendid example of why knowing math is important for any programmer who tries to use Py for data science type applications.
@yolamontalvan9502 3 роки тому ⁺⁶⁰
That’s why I would use C to program a missile.
@mCoding 3 роки тому ⁺⁴⁸
Watch out for those null pointers though, wouldn't want your missile program to CRASH!
@bamberghh1691 3 роки тому ⁺⁹
Usually Ada or SPARK are used in high reliability systems like aircraft instead of c since they are much safer
@mCoding 3 роки тому ⁺⁷
I had heard about this before, thanks for sharing, it makes a lot of sense in safety critical applications. I know NASA still uses C for (as least some?) space shuttles!
@mwanikimwaniki6801 3 роки тому ⁺¹
@@bamberghh1691 I knew ADA is used for missiles... But is there a specific reason why.
@arpandhatt6011 3 роки тому ⁺⁹
@@mwanikimwaniki6801 Ada has been specifically designed to catch as many mistakes at compile time. It’s static analysis tools are extremely thorough, and in general, it’s designed to prevent undefined behavior, from its type system, to its syntax. It’s an interesting language.
@tamoregi Рік тому
If I recall correctly there was an optimisation in LLVM/Clang that was able to detect looping over elements in order to add them and inject the n(n-1)/2 directly into the assembler. :)
@AeryelleCat 3 роки тому ⁺¹
That is ridiculously clear, it blew me away. Straight to the point, extremely clear explanation, very methodical, and everything is shown perfectly. Just purely amazing tbh.
@geeshta 3 роки тому ⁺³
def recursive_loop(n=100_000_000):
s = 0
def loop(i):
nonlocal s
s += i
i += 1
return loop(i) if i < n else s
return loop(0)
> *RecursionError: maximum recursion depth exceeded in comparison*
import sys
sys.setrecursionlimit(100_000_000)
> *segmentation fault (core dumped)*
😂😂😂
@mCoding 3 роки тому ⁺²
:) nice try A+ for effort 👌
@bartkl 3 роки тому ⁺⁸
Nice!
Question: if you increment i inside the for loop, won't that affect the amount of iterations (skip every other one)?
@mCoding 3 роки тому ⁺⁷
It actually does not affect the number of iterations. The iterator of range keeps its own internal counter that python uses to assign to i each iteration.
@incremental_failure 2 роки тому
The range is built before executing the loop contents.
@OMGclueless 3 роки тому ⁺³¹
It's also worth taking a second to remember that even in its slowest form, the python code is adding up 100 million numbers in 20 seconds. It's slow, but "slow" is a relative term.
@mCoding 3 роки тому ⁺⁶
This is a good point that everyone should keep in mind!
@carmelo5991 Рік тому
Is slower than other languages but still enough because computers are very fast
@edwinontiveros8701 Рік тому ⁺¹
On any CPU capable of many teraflops, 20 seconds for 100 million numbers means eons. It's like walking and taking a step only every 5 seconds, short distances It's not that bad, but will badly scale once you need to travel farthest.
@qorbanimaq 3 роки тому
This is the first time I subscribe to a channel based on seeing the first video I watch from that channel. Great job!
@mCoding 3 роки тому
Such an honor! Thank you!
@qorbanimaq 3 роки тому
@@mCoding THANK YOU for creating such quality content.
@bva0 3 роки тому ⁺²
Cool video. I would add that numba and cython are also super fast alternatives.
@davidbellamy1388 3 роки тому ⁺³
Your lessons are awesome. They have a Corey Schafer feel to the presentation in their brevity and clarity.
@mCoding 3 роки тому ⁺¹
Thank you! I've seen many of Corey's videos, so I'm glad to hear that!
@smort123 3 роки тому ⁺⁵
"Is there anything o beat numpy?"
Try compiling numpy top native machine code using numba:
@nb.njit(cache=True, parallel=True)
def numba_sum():
return np.sum(np.arange(100_000_000))
> 0.00064 seconds
Or suing your GPU (Cupy):
def cupy_sum():
return cp.sum(cp.arange(100_000_000))
> 0.001747 seconds
Cupy should be an order of magnitude faster than you CPU, but only if you do some more complex computation than calculating the sum of the array.
@mCoding 3 роки тому ⁺¹
Good point! Maybe some day I'll do a comparison of which C library for Python is the fastest... so many choices though.
@smort123 3 роки тому
@@mCoding Would be interesting to watch.
Good video btw. I did not expect the "i = i+1" to have such a huge impact when iterating.
@MrGryph78 3 роки тому ⁺⁷
Pretty certain the numpy functions are built using SSE/AVX intrinsics, hence their massive speed up compared to scalar only methods.
@krishgarg2806 2 роки тому
for people more interested in the formulae, it's for sum of n terms of an AP
the formula is:
(n/2)[2a+(n-1)d]
where,
a = starting term
d = common difference
n = till nth term
@TheSandkastenverbot 3 роки тому ⁺²
Thanks for the information! It's really helpfull to get a glance into the Python interpreter's inner workings
@comradepeter87 3 роки тому ⁺⁵
This seems insane to me that summation took more than a second! What is the record for the same in raw C/C++?
@brandonkirincich8181 3 роки тому ⁺²
It would definitely be much faster. Especially since this is a special optimization case. It might not even generate a loop.
@mCoding 3 роки тому ⁺¹
100_000_000 is a big number! Agreed, C/C++ would likely optimize it away.
@toebs_ 3 роки тому ⁺²
@@mCoding I think you should be more clear regarding the difference between language and implementation. It was definitely an interesting video, but this is not a problem inherent to the language itself, but rather a problem of (probably?) CPython. Also, C doesn’t optimize anything, gcc does. At least if you don’t tell it otherwise.
@xrafter 3 роки тому
@@toebs_
Oh yes you need to enable the optimization flag -On n for random number .
@andreiiaz2097 3 роки тому ⁺⁴⁶
Python is kinda like the doctor:
It's the last place you want to be, but you go when you need it, because otherwise things might get more complicated
@akshaybodla163 3 роки тому ⁺⁸
I personally like to start with python. Its vast libraries and simple structure is easy for me to test programs. Later when I decide on a specific implementation, i'll code that in C++.
@andreiiaz2097 3 роки тому ⁺¹
@@akshaybodla163 Yeah, I can see that, it's nice, but how do you transfer a python library to C++?
@antonliakhovitch8306 3 роки тому ⁺¹
@@andreiiaz2097 A lot of C/C++ libraries have Python bindings so you can just use the same ones a lot of the time.
@maythesciencebewithyou 3 роки тому ⁺²
In reality, python is the first thing most people will use because it is easy to use.
Python is only usefull because if works as a glue for other languages. Easy access to libraries written in more difficult languages.
@jaredgreathouse3672 3 роки тому
Such a fact. I do statistics research using Stata. My rule is Stata where I can, Python where I must.
@PKua007 3 роки тому ⁺⁴
Actually any modern C or C++ compiler with optimization turned on (-O3 or similar) would by itself compute the value of the sum compile-time and just inline the result. I am shocked that Python interpreter does not even try to for example throw out unused instruction during compilation. There must be a way to enable this. Or maybe this instructions actually have a side effect because of its weird namespace policy and cannot be thrown away without potentially spoiling the code. I am not sure...
@shinobuoshino5066 Рік тому
Both C and C++ compilers such as GCC and LLVM are written in C++.
Python is written in C.
JavaScript is main Python's competitor in easy to use scripting language... And it's V8 is written in C++ and its JIT is incredibly fast, the only thing that can compare is LuaJIT which is practically abandonware because one dev who made it has stopped caring and noone else is intelligent enough to pick it up and maintain it properly. Still, the difference is that LuaJIT is very fast... But it can't stream code and execute it in my browser as fast as possible, most projects don't even use lua for one reason or another.
Just shows that C is obsolete language and all good programmers use C++.
There of course will still be C zealots that will pretend otherwise, but they can't even write their own compiler and have to depend on C++ programmers who they hate more than they hate C++ for some reason.
@yutubl 2 роки тому
Yes I strongly agree: 1.) optimized algorithm over loops, 2.) optimized (not runtime interpreted) loop-code, 3.) if none of the former is possible slow interpreted loop directly in Python. Where (3) is ok for best flexibility expirience when learning or developing & debugging inside new algorithms.
I've been a staff software engineer (development & quality assurance) delievering a (Basic) scriptable technical measurement data PC-application NI DIAdem, concentrating much effort into giving app-poewer users as script-programmers (engineers & physicists) optimized script commands, which performance expirience behave similar to that of Python. So using already implemented looping command may give you better performance than own loops as described here.
@danielluiscosta2883 3 роки тому ⁺²
that is why you don't do intensive processing in Python itself, but rather use packages that implement the functionality you desire directly on C
that said, the best way to loop in Python should always be to use the simplest construct that fits your use case, since Python gets more value from readability then from scavenging microseconds of processing
it's also always good to import this
@rosiefay7283 2 роки тому ⁺¹
Well said. An indictment of Python.
@LeopoldoGhielmetti 3 роки тому ⁺³
It's not really a fastest way to do it, it's always faster to use a math formula to compute a thing, if a formula exists.
The purpose of using Python (or any other language) is not to replace math formulas but implement an algorithm to solve problems in an easy way (whatever easy means).
Another way to be faster, is to implement the operations you need in assembly, but doing so you miss the point.
Math is faster than assembly, assembly is faster that C, C is faster than Python and Python is faster than bash. But they are different languages and need different skills.
The correct answer is: Prefer "for" over "while" to loop in Python because it's faster.
@tannerted 3 роки тому ⁺⁸
4:05 "Is there anything we can do to go faster?"
Yes. Write the code in anything but Python.
@gregorymorse8423 3 роки тому ⁺³
Did you run these tests with Cython? Probably bringing everything to C would have interesting results. The async model on the for iterator for example should not be faster than a while loop unless the async code were optimized out.
@Zooiest 2 роки тому
The point about math is incredibly true: I optimized my VarInt bytesize computation algorithm from iteration to bit operations and lookups, and it now runs (at 2^32 loops) in ~2s vs ~60s
@leobozkir5425 3 роки тому
WHOA 45K?! The first time I saw your channel you had 12K subs. C0ngrats and keep it up!
@atvarsgr 3 роки тому ⁺⁴
while loop 6.429943797
for pure 4.258419519
sum range 1.4508647020000005
numpy sum 0.2784187069999984
math sum 1.868000000015968e-06
iMac i7-10700K, Python 3.9.2, macOS Big Sur
@mCoding 3 роки тому
Thanks for sharing. Other people reading this, let us know what your numbers are too!
@jensdanbolt6953 3 роки тому ⁺¹
while loop 5.056872800000
for pure 3.253743700000
sum range 2.379359199999
numpy sum 0.138200499999
math sum 6.299999999015e-06
Desktop PC with AMD R5 3600, Python 3.9.2
@jensdanbolt6953 3 роки тому
Strange that my math sum is so slower than the Intel iMac while my other operations are faster. If I was to guess, there's something in Windows that adds a little delay before or after the operation runs. Might be that it takes a bit longer to call the function, but short enough that it is drowned out in the longer operations where my CPU runs faster.
@atvarsgr 3 роки тому
@@jensdanbolt6953
while loop 4.5906675
for pure 2.9198796
sum range 1.9685457999999993
numpy sum 0.10591909999999949
math sum 2.799999998970293e-06
the same iMac, i7-10700K, Python 3.9.2, Win10Pro (Boot Camp)
@oj0024 3 роки тому
@@atvarsgr While we are at it here is the contrast to a c implementation:
python with for loop: 8.74s
c with gcc and no compiler optimizations: 0.235734s
c with gcc and full compiler optimizations: 0.048770s (gcc doesn't detect the pattern and actually executes the entire code)
c with clang and full compiler optimizations: 0.000001s
@alexotenko6597 2 роки тому ⁺³
Let's also add to this that it's not just the speed that matters. Choose the right abstractions. If your language can only say "while" and "for", that's a lot to write and a lot to comprehend to work out the meaning from low-level statements like "increment" and "compare". The code is much clearer if it literally says "sum a range". The more iteration abstractions you have, the clearer the intent of the code.
@KonradKeck 3 роки тому ⁺⁴
Thanks for that! I had no idea about the ratios. Two ideas though:
I'm doing statistics and ETL jobs with pandas. I would've been interested in other methods as well like recursion or doing the same with pandas instead of numpy (whether it's any slower than pure numpy), pandas + apply + lambda, etc.
Second, going one step further and optimising performance for if-elif-else / mapping function would be great for another video.
@artar4363 3 роки тому ⁺¹
Didn't knew that Python is SO slow with those loops! Good video!
@chiragsingla. 3 роки тому
btw it's doing 100M iterations so it won't affect too much in small iterations
@maxteer2800 3 роки тому
Great stuff! I never considered that while loops would be slower than for loops, but even as you started I was guessing it would!
@mCoding 3 роки тому
Glad you liked it!
@rasowa2958 2 роки тому ⁺³
The conclusion: "maybe you didn't need that loop in the first place". I like it.
Indeed loops are the piece of code that is really worth to think through. Maybe part of computations is the same in each iteration, then calculate it before the loop, then in the loop just use the result.
If you can speed up a critical loop at the expense of code readability, do it, but add comments why and how it's done.
@heinrichskies 3 роки тому ⁺³
Today's lesson:
*Pay attention in math class*
@lakshayakula8892 3 роки тому ⁺⁵
Great video! Really informative. I'm surprised by how much faster numpy sum is over the built-in sum.
As an aside, the fastest solution would be to use the sum(1...n) = n(n+1)/2 formula. The fastest loop is no loop at all :)
@RagHelen 3 роки тому
I got these results:
while loop: 3.0520888
for loop: 2.9422460999999998
By removing the unnecessary i variable in the while loop:
def while_loop(n=100_000_000):
s = 0
while s < n:
s += 1
return s
That i has to be type checked by the interpreter in every loop.
You pointed out the range factor yourself, which also contributes immensely to something not essential to the speed of the loops.
@rickp2164 3 роки тому ⁺¹
I just beat numpy with a standard C++ program using a register ...
while loop took 2.5 seconds to complete, numpy took 123ms approx. My solution took 56ms, compiled with g++, 65ms with gcc.
Without register flag on the int, time jumps up to 66ms and 72ms respectively.
int main(){
register int rbx=0;
for(int rax; rax

Наступне

Автоматичне відтворення

Unlocking your CPU cores in Python (multiprocessing)