Doug Mercer
How Fast can Python Parse 1 Billion Rows of Data?
To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/DougMercer.
You’ll also get 20% off an annual premium subscription.
-------------------------------
Sign up for 1-on-1 coaching at dougmercer.dev
-------------------------------
The 1 billion row challenge is a fun challenge exploring how quickly we can parse a large text file and compute some summary statistics. The coding community created some amazingly clever solutions.
In this video, I walk through some of the top strategies for writing highly performant code in Python. I start with the simplest possible approach, and work my way through JIT compilation, multiprocessing, and memory mapping. By the end, I have a pure Python implementation that is only one order of magnitude slower than the highly optimized Java challenge winner.
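The multiprocessing strategy boils down to splitting the file into byte ranges (one per core), aggregating each range in a worker process, and merging the partial results. Here is a minimal sketch of that idea, assuming the 1brc input format (`station;temperature` per line); the file name and helper names are illustrative, not the code from the video:

```python
import multiprocessing as mp
import os

FILENAME = "measurements.txt"  # assumed 1brc-style input: b"station;temp\n" per line

def process_chunk(byte_range):
    """Aggregate [min, max, total, count] per station over one byte range."""
    start, end = byte_range
    stats = {}
    with open(FILENAME, "rb") as f:
        f.seek(start)
        for line in f.read(end - start).splitlines():
            station, _, value = line.partition(b";")
            temp = float(value)
            if station in stats:
                s = stats[station]
                if temp < s[0]:
                    s[0] = temp
                if temp > s[1]:
                    s[1] = temp
                s[2] += temp
                s[3] += 1
            else:
                stats[station] = [temp, temp, temp, 1]
    return stats

def chunk_boundaries(path, n_chunks):
    """Split the file into n byte ranges, snapping each split point to a newline."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # skip forward to the end of the current line
            offsets.append(f.tell())
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))

if __name__ == "__main__" and os.path.exists(FILENAME):
    with mp.Pool() as pool:
        partials = pool.map(process_chunk, chunk_boundaries(FILENAME, mp.cpu_count()))
    # Merge the per-process partial results
    merged = {}
    for partial in partials:
        for station, (lo, hi, tot, cnt) in partial.items():
            if station in merged:
                m = merged[station]
                m[0] = min(m[0], lo)
                m[1] = max(m[1], hi)
                m[2] += tot
                m[3] += cnt
            else:
                merged[station] = [lo, hi, tot, cnt]
    for station in sorted(merged):
        lo, hi, tot, cnt = merged[station]
        print(f"{station.decode()}={lo:.1f}/{tot / cnt:.1f}/{hi:.1f}")
```

The memory-mapping variant replaces the `open`/`read` calls with an `mmap` over the same byte ranges; the chunking and merging logic stays the same.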
On top of that, I show two much simpler but just as performant solutions that use the Polars dataframe library and DuckDB (an in-memory SQL database). In practice, you should use these, because they are incredibly fast and easy to use.
If you want to take a stab at speeding things up further, you can find the code here github.com/dougmercer-yt/1brc.
References
------------------
Main challenge - github.com/gunnarmorling/1brc
Ifnesi - github.com/ifnesi/1brc/tree/main
Booty - github.com/booty/ruby-1-billion/
Danny van Kooten C solution blog post - www.dannyvankooten.com/blog/2024/1brc/
Awesome duckdb blog post - rmoff.net/2024/01/03/1%EF%B8%8F%E2%83%A3%EF%B8%8F-1brc-in-sql-with-duckdb/
pypy vs Cpython duel blog post - jszafran.dev/posts/how-pypy-impacts-the-performance-1br-challenge/
Chapters
----------------
0:00 Intro
1:09 Let's start simple
2:55 Let's make it fast
10:48 Third party libraries
13:17 But what about Java or C?
14:17 Sponsor
16:04 Outro
Music
----------
"4" by HOME, released under CC BY 3.0 DEED, home96.bandcamp.com/album/resting-state
Go buy their music!
Disclosure
-----------------
This video was sponsored by Brilliant.
#python #datascience #pypy #polars #duckdb #1brc
Views: 145,536

Videos

STOP Using Plain Python Scripts! Do this instead (5 reasons)
13K views · 5 months ago
Sign up for the totally free tier of Prefect Cloud here: prefec.tv/doug-mercer Sign up for 1-on-1 coaching at dougmercer.dev One of the most frustrating parts of the workday is doing something that you know could be automated, but just… isn’t yet. In this video, we use Prefect to schedule a Python script to run every week. After, we find that scheduling was really only the first of five problem...
Write Python code people actually want to use
12K views · 9 months ago
Sign up for 1-on-1 coaching at dougmercer.dev Writing code that "works" shouldn't be your end goal. It's really just the start... In this video, we adapt a clumsy, non-Pythonic API into an easy to use, easy to understand Pythonic one. We use magic methods such as __getitem__, __len__, __enter__, and __exit__ to make our objects a context manager and support the len() function and square bracket indexing. An...
Compiled Python is FAST
86K views · 11 months ago
Sign up for 1-on-1 coaching at dougmercer.dev Python has a bit of a reputation: fast to write, but slow to run. In this video, we focus on a simple-to-understand dynamic programming problem that would be terribly slow in native Python or numpy. We show that Python can achieve (and actually exceed) C-level performance with the help of just-in-time and ahead-of-time compilers such as mypyc, Cython...
ChatGPT made an iPhone app for me
21K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev ChatGPT and I create an iOS SwiftUI app, but I'm just the idea guy. Since I have no idea how to write Swift, ChatGPT helps me write a basic fitness app with pretty interesting animations without my even knowing the language. #chatgpt #ios #swift Chapters 0:00 Intro 0:17 Hello World 1:04 Buttons and Counters 2:25 Title and Success Indicator 3:26 Fixing...
Never use PowerPoint again
285K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, we use Marp to create a presentation from our text editor using Markdown syntax. Marp is an excellent way of quickly creating beautiful presentations with syntax highlighted code blocks, TeX typeset math, and stylized images without having to worry about a finicky WYSIWYG editor like PowerPoint. #markdown #marp #marpit I use Marp a lo...
What's new in Python 3.11?
1.4K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Python 3.11 is out! In this video, I show you some of the cool features and improvements made in this release. Release notes - docs.python.org/3.11/whatsnew/3.11.html Other notes: - At the time of this video's release, MyPy doesn't support most of the new typing features. If you're eager to use them right away, consider using pyright. - pyperformanc...
Scary good voice cloning | Python Text to Speech
6K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, Doug is lazy and lets TorToiSe text-to-speech do his voice-over for him. Specifically, we create 11 custom voices and illustrate how they sound while teaching you how to do the same. We also discuss some of the ethical considerations of such a powerful model, and highlight the precautions taken by the TorToiSe author. Thanks (and sorr...
OpenAI Whisper: Incredible Automatic Speech to Text!
10K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Whisper, OpenAI's new automatic speech recognition model, is *awesome*. In this video, I show you how to use it and present a few interesting examples of transcribing English and translating spoken German and Japanese to English. github.com/openai/whisper #machinelearning #ai #openai #nlp #python Chapters 00:00 Introduction 00:22 Installing Whisper ...
The Magic that Makes Python Tick
2.4K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev In this video, we implement Python's range object in pure Python. We use this as a case study to explore Python's data model and learn how to create rich objects that interact naturally with Python's built-in operators and functions. #python #objectorientedprogramming Chapters 0:00 Introduction 1:02 What does range() do? 2:17 How will we test our im...
Your Python code is almost entirely untested.
1.8K views · 1 year ago
Sign up for 1-on-1 coaching at dougmercer.dev Property-based testing is an alternative to example-based testing where tests verify that properties of the output are satisfied given randomly generated input. This randomized approach allows us to more thoroughly test against a broad range of input and can detect hard-to-find bugs better than example-based testing. In this video, we discuss the mo...

COMMENTS

  • @gawwad4073 · 21 hours ago

    Nice video. VERY good writing and editing. Smooth as hell, keep it up!

  • @tangerie1284 · 21 hours ago

    I wonder how fast an AOT compiler like nuitka would make it

    • @dougmercer · 16 hours ago

      Yeah, I wonder that too. I'm also curious if anyone has a Cython solution...

  • @affrokilla · 1 day ago

    A 50x speedup between a for loop and a polars dataframe is really significant, great video!

    • @dougmercer · 16 hours ago

      Polars/Duckdb are crazy fast. Thanks for watching!

  • @user-uc6wo1lc7t · 2 days ago

    Well... your Python code is not well "optimized" either =)
    1. a and b are np.arrays, so you use numpy in your pure Python code (that is not fair). Moreover, switching to lists gives you a speedup (I'm shocked too). But maybe you did so... you did not show us "all" the code.
    2. max of two elements is not faster than a pure if-else. And if you use it, you need to extract dp[prev_i, j] and dp[i, prev_j] into variables to avoid duplicate element accesses.
    3. Your i-1 and j-1 can be evaluated in the corresponding loops and saved to variables. The same argument applies to len(a|b) + 1, which can be evaluated once.
    With all of these I could speed up pure Python by up to 70% on my Ryzen 5 3600X, and it was consistent across N = 3,000 and N = 10,000, so all my numpy tests used N = 3,000 to not waste too much time. Therefore the optimized Python code is faster than the unoptimized numpy code by ~82.2%. If you apply optimizations 2) and 3) to the numpy version, the optimized Python code is faster by ~77.5%. I even tried different logic for the numpy and Python versions where dp is a 1D array, but the Python version became slower, while the numpy version made some modest progress (optimized numpy with the different logic is faster than optimized numpy by ~5%). Well, it was interesting. I've never experienced numpy code being slower than pure Python. Maybe the operations are so fast that switching to numpy types makes it worse...

    • @dougmercer · 2 days ago

      I didn't use numpy in the pure Python implementation... See the code at 1:59. That's why I mentioned numpy was surprisingly so much slower in the numpy section. As for the other stuff... You're right that there might be other small tweaks you could make to the various implementations. Thanks for pointing them out

    • @user-uc6wo1lc7t · 2 days ago

      @@dougmercer From the timestamp you gave, I can't deduce whether a and b are lists or np.arrays, because they were created by numpy functions as shown at 2:10. But I'm pretty sure you DID convert a and b to lists; it could actually be seen when you used mypy and created type hints. I just forgot about it because I was fully invested in pure Python and numpy. Sorry, I didn't mean to say your benchmarks are bad; they are actually pretty good. Your video is a life-changer for me. Previously I used numpy without any hesitation, even in loops that don't leverage vectorization. And, strangely, numpy gave me speedups. So I coded up your example in disbelief. But, undoubtedly, your code is a good demonstration that I was wrong, which leaves me wondering how I got speedups in my previous code... maybe I was using vectorization? Can't remember. Seems like I'm going to benchmark my old code and test it against a pure Python implementation. Recently I watched a guy benchmarking nested loops, trying to prove that smaller outer loops give +30% performance. He didn't even realize what exactly he was benchmarking, fooled a lot of people, and ignored me even though I gave him a mathematical proof of the results he got. And the conditions to get a 30+% boost are extremely vague. So I lost my faith in that type of video. But yours... is really good. Thanks a lot!

    • @dougmercer · 2 days ago

      @@user-uc6wo1lc7t Ah I see what you're saying. I believe I coerced them to lists but I'm not 100% sure. I'll have to double check later. And definitely agree- I always reach for numpy, and it's surprising when it doesn't help! In this case, indexing into the arrays to do a lot of scalar operations is slower. We typically get the speed up when we can do vectorization. Thanks for watching and the thoughtful comments =]

  • @UndyingEDM · 2 days ago

    The video editing is top notch too!

  • @Slightlymisshapen · 2 days ago

    Beautiful presentation. I love your descriptions of libraries. The illustrative code breakdowns put context and concrete examples on what would otherwise be yet more abstract documentation. It helps to grasp the utility of the libraries you describe, not just in this video but in all of them. Fantastic work.

    • @dougmercer · 2 days ago

      Aw, thanks! glad you enjoyed it, and thanks for your super nice comment =]

  • @silloo2072 · 2 days ago

    Why no C++ optimization?

    • @dougmercer · 2 days ago

      I did compile the C++ with -O3, but this channel is mostly focused on Python devs, so the idea is: what would someone unfamiliar with C++ write? That said, structuring the data as a 1D array does achieve some speedup, but it's still only slightly faster than Numba/Taichi, so it doesn't really change the story

  • @ogrp5777 · 3 days ago

    What about pypy?

    • @dougmercer · 3 days ago

      I covered PyPy in my latest video ("How fast can Python parse 1 billion rows of data")

    • @ogrp5777 · 3 days ago

      @@dougmercer I just saw the video, that was brilliant! Thanks

  • @ali.moumneh · 5 days ago

    Your video quality is top notch. I'm sure it will soon translate into video views if you keep this up. Good luck.

    • @dougmercer · 4 days ago

      Thanks! I hope so too 🤞

  • @DepressedMusicEnjoyer · 6 days ago

    As much as I like the video, "6 times slower" not being a problem kinda drives me crazy as a person doing low-level coding 😭 Like, yeah, you're right that for a lot of stuff it doesn't matter, but then imagine 5 vs 30 fps, and if it's nested within another thing you may get exponentially slower. I have been trying to make a 480 MHz MCU's CPU do rendering, and it's difficult to get ~30 fps, and if using Python meant being 6 times slower, I wouldn't be happy no matter how much easier my code would be to understand

    • @dougmercer · 6 days ago

      That's fair. Some things are worth squeezing all the performance out of. It will be nice when PyPy supports sub-interpreters or no-GIL... it will eventually be possible to close the gap more

  • @0MVR_0 · 6 days ago

    I should not have watched this video, considering I have been waiting two days for my implementation of Kendall's tau to finish.

    • @dougmercer · 6 days ago

      Oh no 💀 Numba + numpy or PyPy would probably help

  • @rverm1000 · 10 days ago

    That's nice of you to point these libraries out.

  • @bbajr · 10 days ago

    is pandas slow?

    • @dougmercer · 10 days ago

      It would be for something like this! Using this pandas implementation github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, but polars takes 11-12s

  • @gencurrent · 11 days ago

    It's hard to watch this with all the bells and whistles and other hangers, glitter, and so on. Kindly, stop using those! The interesting video turns into a torture session!!

    • @dougmercer · 11 days ago

      Still trying to find my style/voice ¯\_(ツ)_/¯ Check out my two more recent videos. They have less intrusive editing

    • @gencurrent · 10 days ago

      @@dougmercer Thank you Doug : )

  • @joaoguerreiro9403 · 11 days ago

    Damn, this was an instant follow! Hope to see more computer science content 🙏🏼 Great video :)

  • @Jp-ue8xz · 14 days ago

    C++ -O3 flag: am I a joke to you?

    • @dougmercer · 14 days ago

      It was -O3 optimized gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3

  • @RiadAhmed-ce6qo · 15 days ago

    Python uses third-party compilers as well to execute fast.

  • @MaxiveLegend · 15 days ago

    1. "Some of these solutions are gonna be as fast or even faster than my C++ implementation" just means your C++ is bad lol. The Python interpreter was built in C, so by definition a Python program can never be faster than, or even as fast as, C. C++ is SLIGHTLY slower than C, but Python would never fall in between them. 2. Making Python "faster" by transpiling it to C isn't making PYTHON faster; it's just converting it to C. It's basically like saying "I'm going to upgrade my Renault Twingo" and then getting a Bugatti Veyron

  • @smithdoesstuff · 16 days ago

    Kinda bummed I wasn’t sub 10k

    • @dougmercer · 16 days ago

      Hah! I'm at 9,991 subs, so good news! Hopefully crossing the 10k mark soon =] 🤞

  • @CottidaeSEA · 17 days ago

    I shit on Python a lot for being slow, but honestly, 8-10 seconds to read 1 billion rows is sufficient in most scenarios.

  • @joseduarte9823 · 19 days ago

    Depending on how large the total sum actually gets, using an incremental mean may yield better performance, since Python won't need to upgrade the number to a big int

    • @dougmercer · 19 days ago

      Neat idea... It's worth a shot! Feel free to fork the repo and give it a try
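For reference, the incremental (running) mean being suggested can be sketched like this; it keeps a single bounded-size float instead of a growing sum (an illustrative sketch, not code from the video):

```python
def incremental_mean(values):
    """Running mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k, so no large sum is stored."""
    mean = 0.0
    for count, x in enumerate(values, start=1):
        mean += (x - mean) / count
    return mean

# Agrees with the naive sum-then-divide mean up to floating-point error
data = [12, 7, 3, 9, 14]
assert abs(incremental_mean(data) - sum(data) / len(data)) < 1e-9
```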

  • @nikilragav · 20 days ago

    are you allowed to use numpy or gpu (torch, cupy, etc)

    • @dougmercer · 20 days ago

      You could use numpy, but I don't think it would help (the bottleneck is reading the data in). I did see some use Dask + cuDF (CUDA) and that was very fast. However, it wasn't allowed in the challenge because the evaluation system didn't have a GPU

    • @nikilragav · 20 days ago

      @@dougmercer ah. Reminds me of another challenge I saw where IO is the bottleneck. Even there I'm wondering if writing the content to the GPU memory and back is too slow

  • @cottawalla · 20 days ago

    I couldn't get your opening "performance critical python" out of my head and so missed the entire rest of the video.

  • @jmidski5753 · 21 days ago

    I think it's 60x faster, not 100x. We don't have 100 seconds in a minute. Great video though!

    • @dougmercer · 21 days ago

      The clock visualization is a little confusing, but... 256 seconds vs. 2.56 seconds is a 100x difference

  • @BarafuAlbino · 21 days ago

    PyPy? Nuitka?

    • @dougmercer · 21 days ago

      I tackled PyPy in my latest video if you're interested ("How fast can Python parse 1 billion rows of data")

  • @user-pg9nf2vq8s · 22 days ago

    I would never use Python, but I like watching people optimize the hell out of something.

    • @dougmercer · 22 days ago

      There's something Zen about it 🧘

  • @ThisRussellBrand · 22 days ago

    Beautifully done!

  • @ThisRussellBrand · 22 days ago

    This is a beautiful explanation. Thank you for sharing it.

  • @alexnolfi3730 · 23 days ago

    did you test out pandas to see how much slower it was than polars?

    • @dougmercer · 23 days ago

      It's way slower. Using the pandas implementation here github.com/Butch78/1BillionRowChallenge/blob/main/python_1brc%2Fmain.py takes about 150s, whereas the polars implementation takes 11-12s

  • @GRHmedia · 23 days ago

    You can't really say you are just relying on Python at that point. Well, I guess you can't say that to start with: CPython runs on an interpreter that is programmed in C to start with. If Python is so great, why don't they create a compiler with it, a JIT made from Python that runs Python?

    • @dougmercer · 23 days ago

      My focus in this video was libraries/approaches that let you either *write* Python or interoperate with Python very seamlessly. I don't really care that Taichi or Numba dips down to LLVM. As for compilers written in Python... PyPy's JIT is written in RPython and compiles to C ¯\_(ツ)_/¯

  • @mohak9102 · 24 days ago

    How is this better than airflow?

    • @dougmercer · 23 days ago

      "better" is probably a matter of taste. I prefer it because it feels more like writing Python and less like writing a config file. There are several comparisons out there that dive into the differences. Here's a third party link comparing them + another approach (Argo) neptune.ai/blog/argo-vs-airflow-vs-prefect-differences

  • @mahdirostami7034 · 24 days ago

    8:53 I can't believe writing a custom parser would gain any performance, considering the default one is probably implemented in C. I assume this is a case of PyPy optimizing while running, and I wonder whether running this same script with CPython would result in worse performance.

    • @dougmercer · 23 days ago

      I expect that the custom parser would have worse performance in plain CPython, but I didn't test it

  • @rushcoc9605 · 24 days ago

    😮😅 Can you tell me how you measured which parts of the code take the most time? Just by looking, or how?

    • @dougmercer · 23 days ago

      1. A bit of intuition.
      2. A bit of being totally wrong (removing the constants that indicated which column was min, max, count, and sum didn't speed up performance... I was trying too many things at once and accidentally bundled that change in with something else).
      3. I used a lot of time.perf_counter() calls to measure the time certain operations took, and A/B tested them.
      Normally, in CPython, I would profile using something like PyInstrument or another line-level profiler.
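A minimal version of that perf_counter A/B pattern (the two candidate parsers here are illustrative stand-ins, not the video's code):

```python
import time

def time_it(fn, repeats=5):
    """Best-of-N wall-clock timing; taking the minimum filters out scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

line = b"Hamburg;12.3"

def parse_split():  # candidate A: bytes.split
    station, value = line.split(b";")
    return station, float(value)

def parse_partition():  # candidate B: bytes.partition
    station, _, value = line.partition(b";")
    return station, float(value)

print(f"split:     {time_it(parse_split):.3e} s")
print(f"partition: {time_it(parse_partition):.3e} s")
```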

  • @maurolimaok · 24 days ago

    Nice channel. Hope it grows.

    • @dougmercer · 24 days ago

      Thanks Mauro! I hope so too🤞

  • @nangld · 24 days ago

    Instead of putting so much effort into optimizing Python, why not just use C/C++? Also, multiple CPUs won't help unless you have RAID 0

    • @dougmercer · 24 days ago

      Multiple cores help even for C/C++

  • @this-one · 25 days ago

    Would it count as Python if we write it as a module in C?

    • @dougmercer · 25 days ago

      I'm no philosopher, but this gives me Ship of Theseus vibes. So... maybe technically, but I don't feel good about it

  • @Pritam252 · 25 days ago

    Are this video's measurements affected by disk speed?

    • @Pritam252 · 25 days ago

      (I guess every experiment will always have some slight noise)

    • @dougmercer · 25 days ago

      Oh, definitely. The fact that I have a lot of RAM and a solid-state drive makes this much faster than if I had very little RAM and only an HDD. I also had some background processes running, which add a bit of noise to the measurements. That said, I think my system is roughly on par, specs-wise, with the challenge's evaluation system

  • @vlc-cosplayer · 25 days ago

    Me, approaching this as an engineer:
    - Read a random subset of the data
    - Do the computation on that
    - Yeah, that's close enough lmao, interpolation will take care of missing values

    • @dougmercer · 25 days ago

      Hah! Working smarter not harder 🚀

    • @vlc-cosplayer · 25 days ago

      @@dougmercer I remembered a video I watched about HyperLogLog. When working with extremely large datasets, a fast approximation may be more desirable than getting the correct answer, but only after a long time. 👀 It'd be interesting to measure how good an approximation you can get using only a fraction of the data. E.g., would using 10% of the data get 90% of the way to the correct answer? You probably don't need 100% accuracy all the time. In fact, your data may not even be 100% accurate to begin with! To put the cost of precision in perspective, getting 99% uptime is relatively easy (that's 80 hours of downtime/year), but every additional 9 after that becomes exponentially more expensive. 99.9% is 8 hours. 99.99% is only 1 hour, 99.999% is only 5 minutes, go to the bathroom and you'll miss that. 💀
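That sample-then-estimate idea is easy to play with; a hedged sketch on synthetic data (names are illustrative), where the error of the sample mean shrinks roughly as 1/sqrt(sample size):

```python
import random

def approximate_mean(values, fraction=0.1, seed=0):
    """Estimate the mean from a random sample instead of the full dataset."""
    rng = random.Random(seed)
    k = max(1, int(len(values) * fraction))
    return sum(rng.sample(values, k)) / k

# Synthetic "dataset": 100k draws from a normal distribution
rng = random.Random(42)
population = [rng.gauss(10.0, 5.0) for _ in range(100_000)]

true_mean = sum(population) / len(population)
estimate = approximate_mean(population, fraction=0.1)
print(f"true={true_mean:.3f} estimate={estimate:.3f}")
```

With a 10% sample, the estimate typically lands within a few hundredths of the true mean here, at a tenth of the work.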

  • @M-Wulff · 26 days ago

    "Don't write tests. Write test writers."

  • @dinoscheidt · 26 days ago

    Looks like a python only alternative to Kestra - and still no sight of the software engineering practice to maintain build and ETL pipelines next to code in a CICD fashion… 😮‍💨

  • @irnehhenri · 26 days ago

    I've been using the pandas library, but it's given me a lot of trouble with a ~10GB dataset I have, so thanks for mentioning Polars and DuckDB; it looks like both are ~10x faster with less memory usage. Though I'm not sure I appreciate SQL enough to go for DuckDB. I've been working with ClickHouse recently, and even with the slowness of pandas, I end up spending way more time debugging my SQL queries. That makes me feel like SQL is only worth the effort/time if it's actually going to run often on the DB server, not for single-use processing scripts

    • @irnehhenri · 26 days ago

      PS: I found a nice benchmark, but I don't know how to keep my comment from being deleted by YouTube when posting the link, so you're gonna have to google "duckdblabs Database-like ops benchmark"

  • @markconnolly7907 · 27 days ago

    Could you go into how you wrote the tests more?

    • @dougmercer · 27 days ago

      I have a video on the testing library I used, Hypothesis: "Your Python code is almost entirely untested."

  • @anneallison6402 · 27 days ago

    How do you do the code animations?

    • @dougmercer · 27 days ago

      I've used two different approaches for animating code.
      1. In my early videos I used the `manim` library. The community edition has a Code object.
      2. In recent videos, I created a custom Pygments formatter that outputs the syntax-highlighted code as a DaVinci Resolve Fusion composition.
      Both approaches have a lot of problems. I'm currently writing my own animation library. I may make a video about it soon (but I would probably not open source the code). Another option you may find useful is reveal.js. That lets you write code animations in JavaScript, and it even has an "autoanimate" feature that works OK. However, since it's more for live presentations, you would need to screen record if you wanted to make a video.

  • @Rgmenkera · 28 days ago

    What's wrong with Linux cron jobs? If cron malfunctions, the script is a non-issue, since your OS will have failed. As for the script, why do you call external functions raw like that in the main file? Python has error handling and logging out of the box! Scheduling another cron job to mail reports can also be done... but I don't like this; these "reports" can easily get out of hand if you don't have a centralized, organized reporting system, preferably one that reports nothing unless something is wrong. The problems I have with this app: (1) If the startup goes down or changes their TOS, your company goes down with it; if a security bug is found after it goes down, your non-cloud stuff is also at risk. (2) Onboarding a new employee will be harder; one more obscure framework for them to learn on top of all the company stuff. (3) If stuff goes wrong, don't just dump it into a file or a "Prefect log"; get it fixed. If the logs are getting out of hand, that's a symptom of a bigger systems problem you shouldn't just patch over. Fix the code; don't create extra procedure so you can avoid fixing it. Errors are your friends; silent failures are the worst. Better to read them; they will tell you what's wrong and even where :) (4) If the Prefect library needs to be updated, you need to update the code. If you never update it, you risk (1), and things like SSL also change, so it may work and then suddenly not one day... it's all extra work. (5) Any time you add complexity, you're adding more points of potential failure. It would seem like a nightmare to me if Prefect had an error... what if it can't parse some non-Unicode characters or something obscure you didn't think about? Often keeping things simple is better. Nice video though; just some thoughts on why I'm not convinced this is a good idea... it's just kicking the can down the road.

  • @user-zi2zv1jo7g · 28 days ago

    I don't know too much about C++, but I suspect that if some library is automatically making your Python code faster than the C++ equivalent, you very likely have a C++ skill issue

  • @danielot · 28 days ago

    Talented, but you definitely need a senior engineer or a tech lead. There are a lot of ways to solve all of these problems (ahem, Airflow, MLflow 😂). Was this a real situation, or made up to help illustrate why this is in fact a great library?

    • @dougmercer · 28 days ago

      Yeah, there are a lot of ways to solve these sorts of problems. We chose this and it worked well for us ¯\_(ツ)_/¯

  • @gevilin153 · 29 days ago

    You can't put "performance critical code" and "python" in one sentence

    • @dougmercer · 29 days ago

      But I did, tho... 🙃

  • @HansBezemer · 29 days ago

    gcc -O3 -o subseq subseq.c

    • @dougmercer · 29 days ago

      I did compile with -O3 gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3