🎯 Key Takeaways for quick navigation:
00:00 🚀 Leveraging Built-in Functions for Speed
- Using built-in functions from the standard library boosts Python's speed.
- Comparison of a custom sorting algorithm against the built-in sorted function.
- Built-in functions are often faster due to being written in C.
01:10 🔄 Laziness and Generators in Python
- Embracing laziness as a virtue in coding for efficiency.
- Utilizing the yield keyword to create generator functions.
- Generators help with large data sets, avoiding expensive memory allocation (a minimal sketch follows this list).
02:32 ⚙️ Enhancing Performance with Concurrency
- Introducing concurrency using the multiprocessing library.
- Exploring the concept of embarrassingly parallel problems.
- Demonstrating the efficiency gain through concurrent image processing.
03:14 🛠️ Code Compilation with Cython for Optimization
- Using Cython to compile Python-like code to C for performance improvement.
- Optimizing specific parts of the codebase, not a full replacement for Python.
- Significant performance improvement demonstrated with a factorial calculation.
03:42 📚 Harnessing Compiled Libraries and Frameworks
- Leveraging compiled libraries and frameworks like NumPy, Pandas, and Pillow.
- Exploring performance benefits by tapping into their C implementations.
- Enhancing code readability while maintaining performance through these frameworks.
04:10 🚀 Boosting Speed with PyPy Interpreter
- Introducing PyPy as an alternative Python interpreter for improved speed.
- Just-in-time (JIT) compilation compiles frequently executed code paths as the program runs.
- Consider benchmarking code with both PyPy and CPython for optimal performance.
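To illustrate the generator technique from the 01:10 takeaway, a minimal sketch (the file name and the line-length filter here are made up for illustration, not from the video):

```python
def read_large_file(path):
    """Yield one stripped line at a time instead of loading the whole file."""
    with open(path) as f:
        for line in f:
            yield line.strip()

# Lines are produced lazily, so memory stays flat even for huge files.
long_lines = sum(1 for line in read_large_file("big.log") if len(line) > 80)
```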
Numba is really fast! The problem is it doesn't support objects, but it's really good for small numerical functions.
I should look into Numba! That's a great tip, thank you!
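For reference, a minimal Numba sketch (the function and workload are made up for illustration):

```python
import numba

@numba.njit  # compiles this function to machine code on its first call
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

sum_of_squares(10)                    # first call pays the compilation cost
result = sum_of_squares(10_000_000)   # later calls run at native speed
```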
This might be a bit nitpicky, but concurrency != parallelism. If you're using the multiprocessing library you're executing your code in a truly parallel fashion since it spawns new processes each with their own interpreter running, but you don't have to do that to run code concurrently. Asyncio and the threading library will both allow you to make your code concurrent, without needing new processes. If your tasks are largely io bound, asyncio or threads are usually a better choice, while multiprocessing is better for CPU bound processes (generalizing of course).
Multiprocessing isn't always faster either. Depending on the number of threads and complexity of the problem, it might not be worth incurring the additional overhead to spawn new processes.
And on top of all that you're adding complexity to your code. Concurrency/parallelism aren't easy. So all that is to say, it's a nuanced topic and might not be the best example of an easy or effective way to improve the performance of your code.
"Concurrency is all about managing many tasks at once
Parallelism is about doing many tasks at once"
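To make that distinction concrete, a rough sketch (the URLs and the work function are placeholders; asyncio.sleep stands in for real network IO):

```python
import asyncio
import math
from multiprocessing import Pool

# IO-bound: tasks spend most of their time waiting, so one thread
# interleaving them with asyncio is enough -- no new processes needed.
async def fetch_all(urls):
    async def fetch(url):
        await asyncio.sleep(1)  # stand-in for a network call
        return url
    return await asyncio.gather(*(fetch(u) for u in urls))

# CPU-bound: the work saturates a core, so spreading it across processes
# (each with its own interpreter and GIL) gives real parallelism.
def heavy(n):
    return math.factorial(n)

if __name__ == "__main__":
    results = asyncio.run(fetch_all(["a", "b", "c"]))
    with Pool() as pool:
        totals = pool.map(heavy, [20_000, 21_000, 22_000])
```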
Once, one of my Python programs took 35 minutes to process; a colleague ported it to a Rust-based Python library and the script ran in less than a second.
Polars? 😂
Remember, folks: The secret to making Python run fast is to use as little Python as possible.
I use it for API. The speed is not at all an issue for me.
If Nim had more community support, it'd be a drop-in replacement for Python.
@@Cybergazer-n9o What is Nim?
EXACTLY! I use it for the GUI with Qt because it is very simple and speed is not needed, but the core stuff I always send to compiled libraries.
Kids, the language itself isn't slow. The thing is, the optimization is done by an interpreter or compiler, and the official Python interpreter (CPython) is slow. You can find alternatives like NumPy, Cython, Numba, Jython, and so on.
Great ideas! I've got some code that runs multiple dd commands and is dog slow. I'll try some of these, like PyPy and Cython, to see if they increase the speed.
One thing that I do: instead of adding characters to a string, I append items to a list and use "".join(list_in) to create the string at the end.
Ex. if you're using st1 = st1 + new_char to build an 80-character line, you'll allocate about 3,240 characters' worth of immutable intermediate strings (1 + 2 + ... + 80), because every concatenation creates a new string.
By using lst_st1.append(new_char) with a "".join(lst_st1) at the end, you allocate only 160 characters: the 80 one-character strings plus the final 80-character result.
And even better if you can generate the list by comprehension instead of appending.
See also io.StringIO. I haven't done a direct comparison, but it always seems quick.
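A small sketch comparing the three approaches mentioned above (note the empty-string join, since we're joining characters rather than words):

```python
import io

chars = list("some line of roughly eighty characters ...")

# Quadratic: each += builds a brand-new string.
s1 = ""
for c in chars:
    s1 = s1 + c

# Linear: accumulate in a list, join once at the end.
parts = []
for c in chars:
    parts.append(c)
s2 = "".join(parts)

# io.StringIO: an in-memory text buffer, also linear.
buf = io.StringIO()
for c in chars:
    buf.write(c)
s3 = buf.getvalue()

assert s1 == s2 == s3
```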
You have to be careful with multiprocessing though. Since Python is interpreted and e.g. CPython only offers a single interpreter instance, multiprocessed code can actually be slower, because only a single interpreter performs the operations and the context switches in between take time (see the Global Interpreter Lock). It works well for IO-bound operations though, like you showed in the video :)
I believe you're correct for Python threads, but multiprocessing actually forks new processes, each with its own GIL (Global Interpreter Lock). It can still be slower due to the overhead of creating more OS processes and having to serialize and deserialize the data sent to them, so it's definitely worthwhile to check which is more performant!
Nice video. Nice Vim editor too. Thanks!
0:30 - it seems like you're creating new arrays for each partition (quicksort), but that defeats the purpose of quicksort, which is supposed to be an in-place sorting algorithm 😢
So he actually is bad at writing code. Welp.
Excellent video. I like a slow running environment during the development process because I want to learn how to make faster algorithms and code. Python is perfect for me just like BASIC and JavaScript. If my interpreted code can run efficiently then that’s a good indicator it will run well when compiled.
If you have code generating code in a build step, pickling it is sometimes much faster when you load it at runtime. I think it skips having to serialize it all again.
1:06 The amount of times I've decided to spend a few minutes writing a script to automate something, only to enter a fever dream of ideas and end up wasting an hour on optimizing and refactoring it for no productive reason :D
Python 3.13 now has an experimental JIT compiler. Maybe it will be fully supported in 3.14?
I think everyone hopes so
Not that I've looked too deeply into it, but the generators example seems wrong; it looks like the time saving comes from not using .split in the generator version.
Make use of set and dict, which use internal hashing to improve lookups insanely!
(I used to accumulate stuff in lists and loop over them ... that was SO bad!)
I actually have a video about this coming out!
@@dreamsofcode Still planning to release it someday?
@@pingmetal Haha I actually still have this video in my backlog. It's evolved somewhat since the original conception however.
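A quick sketch of the difference (the sizes and lookup key are arbitrary):

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)

# List membership scans every element: O(n) per lookup.
print(timeit.timeit(lambda: 99_999 in items_list, number=1_000))

# Set membership hashes the key: O(1) on average per lookup.
print(timeit.timeit(lambda: 99_999 in items_set, number=1_000))
```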
0:30 Don’t want to burst your bubble, buddy, but that’s GPT-3 levels of QS…
How does Qt generate the binaries?
Possibly a dumb question, but is it not possible to just compile Python like you would any other language? Pre-interpret it, if you will? Speed isn't my biggest concern, as long as the machine does it faster than I can (I'm not a big, professional developer), so I've never really thought about it until now.
"like you would any other language". The approach varies wildly between languages. The stages from the language we implement it in, to the machine code that the CPU (or GPU) executes is rarely the same between languages, even seemingly similar languages like C and C++. There's stage after stage after stage (especially for C++), until it reaches a point that it can be translated to some flavor of assembly and then to machine code.
The "easiest" way to compile python would be to transpile it to a language that you can compile. And then Python is just a frontend to that language. The second easiest would probably be to implement a way to translate Python into some flavor of LLVM intermediate representation (IR) and then compile it to machine code from there. That is an approach many languages take, such as Rust, Zig and dozens of esoteric languages. Now you need to define a Python standard that describes how you go from arbitrary valid Python to LLVM IR. It would probably require Python4, otherwise I assume it would already exist.
You can use Cython 3.1.
Should I replace asyncio with multiprocessing?
You can also use concurrent.futures which is very easy to use.
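For example, a minimal concurrent.futures sketch (the work function is a placeholder):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == "__main__":
    # Same Pool-style map, but with a context-managed executor and an
    # interface shared with ThreadPoolExecutor for IO-bound work.
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(square, range(10)))
```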
I find it much easier to use joblib in embarassingly parallel problems than multiprocessing
Which is the fastest among these?
- Yield generators
- Inline generators
- List comprehensions
Great question!
- List comprehensions aren't lazily evaluated to my knowledge, so they'll likely not be as performant.
- I'm unsure about inline generators, so I'd need to look those up to give a concrete answer.
List comprehensions are the fastest in general, but it may vary based on your code, workflow, and objects.
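One way to see the trade-off yourself, in a rough sketch (numbers will vary by machine):

```python
import sys

# List comprehension: materializes everything up front -- fast to iterate,
# but pays the full memory cost immediately.
squares_list = [n * n for n in range(1_000_000)]

# Generator expression ("inline generator"): lazy, near-zero memory,
# but each item costs a resume of the generator frame.
squares_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(squares_list))  # ~8 MB for the pointer array alone
print(sys.getsizeof(squares_gen))   # a couple hundred bytes

# If you consume every element anyway, the list version usually wins on speed;
# if you only need part of the stream, or it won't fit in memory, use the generator.
total = sum(squares_gen)
```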
As expected: make Python fast by using other languages.
which is one of the core concepts of the language
If I am scripting in python for something slow I will just add python bindings to my old c or rust code.
Using mmap to create a hash from a file is not much faster than the approach with a buffer.
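For reference, the two approaches being compared might look like this (the hash algorithm and chunk size are illustrative choices):

```python
import hashlib
import mmap

def hash_mmap(path):
    # Map the file into memory and feed the whole mapping to the hash.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return hashlib.sha256(mm).hexdigest()

def hash_buffered(path, chunk_size=1 << 20):
    # Read the file in fixed-size chunks and update the hash incrementally.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```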
How about using maturin and Rust for those intense operations
Don't tempt me. I've got a video on making Python faster with Rust 😅
I like the representation
I don't care about the speed; I just don't like the syntax and the way the language works.
Still waiting for PyPy
The best method to improve speed of Python code is to use C++.
Hahah, or C. There are also some other languages one can use 😉
@@dreamsofcode eg Rust! 😉
nimpy and nimporter for using Nim with Python is also good.
Secret number one: use c++
Hahaha
0:45
PyPy made my code slower. Although my code was running in only a few hundredths of a second, so... maybe that was my fault for expecting it to be faster...
Haha. I should do a video on when to use PyPy. It's mainly best when you have code that is called multiple times, such as in a loop. Otherwise the JIT compilation is wasteful.
@@dreamsofcode It looped through 1.5k iterations.
Interesting, what was the code doing in those iterations?
@@dreamsofcode It was a parser. It would take a look at a list of tokens, compare them to some expected tokens, and if the checks passed it would consume the tokens and replace them with a new one. That process would repeat until it got to the end.
The input file was 200 lines, but I couldn't give you a number of tokens. It was probably a few hundred.
Thank you for sharing. I've got a video planned with PyPy in the future, so I'll add parsing + lexing as a benchmarking case!
Powerful
Python is just slow if you don't know how to use it.
Pure Python is slow, but you were never meant to use "pure Python" — that's why the standard library exists.
The speed isn't the issue. Python is so hard to code in; the syntax is so ugly.
Mixing up concurrency with parallelism... sorry, I'm out of here.