don't lru_cache methods! (intermediate) anthony explains

  • Published Sep 8, 2024
  • today I show a common pitfall with `lru_cache` and how it will almost always be a memory leak if used on a method!
    - what is lru_cache: • python: functools.lru_...
    - what is a decorator: • python @decorators - (...
    - pytest lru_cache performance regression: • pathlib is slow! false...
    playlist: • anthony explains
    ==========
    twitch: / anthonywritescode
    discord: / discord
    twitter: / codewithanthony
    github: github.com/aso...
    stream github: github.com/ant...
    I won't ask for subscriptions / likes / comments in videos but it really helps the channel. If you have any suggestions or things you'd like to see please comment below!
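
A minimal sketch of the pitfall the description refers to. The class `C`, its attribute `y`, and the method `compute` are illustrative names assumed here, not code taken from the video. The cache lives on the class-level function and uses `self` as part of every key, so it keeps each instance alive until the cache is cleared:

```python
import gc
import weakref
from functools import lru_cache


class C:
    def __init__(self, y):
        self.y = y

    @lru_cache(maxsize=None)  # one cache shared by all instances, keyed on (self, x)
    def compute(self, x):
        return self.y * x


obj = C(2)
obj.compute(3)

ref = weakref.ref(obj)  # lets us observe whether the instance is still alive
del obj
gc.collect()
leaked = ref() is not None  # the cache key still holds a strong reference

C.compute.cache_clear()  # dropping the cache entries releases the instance
gc.collect()
freed = ref() is None
```

With a bounded `maxsize` the instance is only released once its entries are evicted, so the lifetime is still tied to the cache rather than to the code that created the object.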

COMMENTS • 67

  • @ponysmallhorse 2 years ago +18

    THANK YOU!!!! Found a memory leak in very old script.

  • @magnuscarlsson6785 2 years ago +11

    Another great video!
    And a special thanks for showing why the unexpected things happen, like how the _ keeps the garbage collector away.
    I had forgotten about this when viewing so I was right there with you ;-)

    • @anthonywritescode 2 years ago +3

      yep -- I have another video about that as well: ua-cam.com/video/VKz1aQbNnyI/v-deo.html

  • @cmyuii 2 years ago +5

    wow that's a sneaky one - simple but something i hadn't considered - cheers for fixing my code yet again!

  • @kvetter 1 year ago +12

    Your example has another problem with @cache on a method: if you change the value of self.y, then the cached value will be incorrect.
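
A small sketch of the staleness problem raised in this comment, assuming a hypothetical class `C` with an attribute `y`. The default hash of an object is identity-based, so changing `self.y` does not change the cache key and the old result keeps being returned:

```python
from functools import lru_cache


class C:
    def __init__(self, y):
        self.y = y

    @lru_cache(maxsize=None)
    def compute(self, x):
        return self.y * x


c = C(2)
first = c.compute(3)  # computed with y == 2

c.y = 10
stale = c.compute(3)  # same key (c, 3), so the cached result is returned
fresh = c.y * 3       # what an uncached call would produce now
```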

  • @rdean150 1 year ago +4

    Clever solution to assign the function as an instance variable in the __init__. I usually just take the other approach you describe - define the decorated function at module level and have the instance method call it with the relevant attribute values. Your other approach is clever but given the shared nature of the module-level function solution, it seems the simplest solution is probably still the way to go. Not to mention that it helps ensure my primarily-dotnet-coding teammates can still understand what the code is doing. *sigh*
    Anyway, thanks for the good tip. I appreciate that you cover advanced Python topics. Most coding channels seem to cater primarily to beginners, which makes sense but is disappointing for the folks with some years under their belt already.
    Cheers!

    • @anthonywritescode 1 year ago +2

      the problem with your approach is the cached value outlives the lifetime of the object instance -- there's a very specific reason to make it instance-cached

    • @rdean150 1 year ago

      @anthonywritescode But if the cache keys depend only on simple values, the fact that the cached value outlives whatever object may have originally requested it is not a problem at all. Just because object_a requested a value originally, it does not mean that cached value is unique to object_a or that object_b cannot request the same value even after object_a has been garbage collected. As long as you are requesting the values via
      memoizedfunc(object_a.val1, val2)
      rather than
      memoizedfunc(object_a, val2)
      Then the lifespan of object_a is not particularly relevant.
      If the values ARE unique to the individual object that requested it, then yeah it doesn't make sense to use a shared module-level cache, as there will never actually be any sharing of the values that cache contains, and it may end up evicting values even while they are still needed if you set a maxsize on the cache. Which it sounds like may have been the situation in your use-case and definitely a consideration when applying these principles. So, fair point!

    • @anthonywritescode 1 year ago +1

      your situation sounds more like it shouldn't have been a method to begin with

    • @rdean150 1 year ago

      @anthonywritescode Lol yeah that's true, and I didn't write it as such. But given the simplicity of the caching decorators, it is certainly easy to imagine people just slapping the decorator on a method anyway without thinking for very long about it. After all, if they had thought about it much, they would have recognized the reference count implications of self being used in a cache key also. Assuming that they understood how the decorators work, of course. Which may or may not be more of a stretch than the developer thinking about whether the cached values could/should be used by other instances. TBH I'm not sure which is more likely.

  • @jbrnds 2 years ago +2

    So factoring out the compute function as a separate function `compute(y, x)` and decorating that with lru_cache will work correctly and speed things up even across instances. The `C.compute(x)` method will just return `compute(self.y, x)`.

    • @anthonywritescode 2 years ago +2

      yes, that is precisely what I said in the video

    • @jbrnds 2 years ago +2

      @anthonywritescode perfect! Just wanted to be sure if i understood correctly. Thanks for the great videos you are always making and your humble energy. You are a great explainer and i use your maintained projects daily. A deep bow.
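
The module-level alternative discussed in this thread could look roughly like the sketch below. It reuses the `compute(y, x)` and `C.compute(x)` names from the comment; the multiplication body is only a stand-in for the real computation. The cache is keyed on plain values, so it is shared across instances and never references `self`:

```python
import gc
import weakref
from functools import lru_cache


@lru_cache(maxsize=None)
def compute(y, x):
    # keyed on (y, x) only; no instance is stored in the cache
    return y * x


class C:
    def __init__(self, y):
        self.y = y

    def compute(self, x):
        # inside the method body, `compute` resolves to the module-level function
        return compute(self.y, x)


a = C(2)
b = C(2)
a.compute(3)  # miss
b.compute(3)  # hit, because a and b share the same y
info = compute.cache_info()

ref = weakref.ref(a)
del a
gc.collect()
collected = ref() is None  # the instance is free to go; the cached value stays
```

The trade-off mentioned in the replies still applies: the cached values outlive the instances that requested them.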

  • @sparkyb6 2 years ago +6

    At the end when you mentioned creating an object pool and doing some magic in __new__, I wondered whether I could also just stick a lru_cache on __new__ to do that. It worked, but I had to move the initialization also inside __new__, because if I just call the superclass __new__ and leave initialization to __init__, even though __new__ will return the same object each time (for the same y), it will re-call __init__ on it and replace that inner lru_cache (self.compute). Just thought that was interesting.

    • @sparkyb6 2 years ago +2

      apologies for what YouTube did to my underscores

    • @anthonywritescode 2 years ago +1

      lol yeah youtube really hates underscores. the `__new__` / `__init__` thing is kind of annoying, I haven't found a nice way to work around it yet :(

    • @yehoshualevine 1 year ago

      @anthonywritescode ___init___ with triple underscore to cause youtube to print two (and understand the 3rd as markdown)
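
A sketch of the `__new__` / `__init__` interaction described in this thread. To keep it simple it swaps in a plain dict as the object pool instead of putting `lru_cache` on `__new__`; the `_pool` and `init_calls` names are hypothetical. Python calls `__init__` whenever `__new__` returns an instance of the class, even if that instance came out of a pool:

```python
class C:
    _pool = {}      # hypothetical object pool keyed on y
    init_calls = 0  # counts how often __init__ runs

    def __new__(cls, y):
        try:
            return cls._pool[y]
        except KeyError:
            obj = cls._pool[y] = super().__new__(cls)
            return obj

    def __init__(self, y):
        # runs on every C(y) call, including when the pooled object is reused
        type(self).init_calls += 1
        self.y = y


a = C(1)
b = C(1)
same_object = a is b
```

This is why any per-instance state set up in `__init__` (such as an inner cache) gets replaced on each construction unless the initialization is moved into `__new__` as the comment describes.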

  • @tobb10001 2 years ago +5

    Another solution would be to create a static function to do the computation and put @lru_cache on that. So the actual method would only pass the needed values to the cached function instead of the whole object.
    Would allow sharing cache between multiple instances, but would remove the ability to modify the object.
    Or am I missing something completely here? 😅

    • @anthonywritescode 2 years ago

      yep that would work -- and is one of the alternatives I outlined in the video

    • @tobb10001 2 years ago

      @anthonywritescode then I must have missed it, my bad. 😄

    • @anthonywritescode 2 years ago

      heh yeah the subtlety being that a static class function and a module function aren't really any different :)

  • @alexandreboisselet8336 2 years ago +4

    Thank you for the great explanation 🙏
    How well does caching work when the compute has **kwargs?

    • @anthonywritescode 2 years ago +3

      it caches using the name and value of each named argument: github.com/python/cpython/blob/8c49d057bf8618208d4ed67c9caecbfa71f7a2d0/Lib/functools.py#L462-L470
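
A short sketch of how keyword arguments take part in the cache key, using a hypothetical `compute(x, **kwargs)` function. The same names and values produce a hit; the same names with a different value produce a new entry. The keyword values have to be hashable, just like positional arguments:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def compute(x, **kwargs):
    return x + sum(kwargs.values())


compute(1, a=2, b=3)  # miss: first time this combination is seen
compute(1, a=2, b=3)  # hit: same names and values
compute(1, a=2, b=4)  # miss: same names, different value for b
info = compute.cache_info()
```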

  • @kramstyles 4 months ago +1

    How on earth does someone know so much? How can I attain this level of expertise?

  • @siddsp02 2 years ago +4

    Why not use weakref in this case, and construct a caching function using weakdicts?

    • @anthonywritescode 2 years ago

      you certainly could -- but many objects are not weak referenceable so the caching mechanism wouldn't be that useful

    • @siddsp02 2 years ago

      @anthonywritescode Fair enough. I think standard library specifically includes weak methods (though I haven't read into it), so maybe something could be possible. This was an interesting video!

    • @anthonywritescode 2 years ago

      you would have to weakly reference all the called _parameters_ -- not the method itself (after all, the tuple of parameters is what's used to make a cache key)
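
To illustrate the point about weak referenceability, here is a quick check of a few common argument types. The helper `is_weak_referenceable` and the class `Plain` are made up for this sketch. Instances of ordinary classes can be weakly referenced, while built-ins such as `str`, `int`, and `tuple` cannot, which limits a cache that would need weak references to every parameter:

```python
import weakref


class Plain:
    pass


def is_weak_referenceable(obj):
    # weakref.ref raises TypeError for objects that do not support weak references
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


results = {
    type(obj).__name__: is_weak_referenceable(obj)
    for obj in (Plain(), "text", 42, (1, 2))
}
```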

  • @sopidf 2 years ago +1

    Great video, thank you! Why do you show your keyboard and hands?

    • @anthonywritescode 2 years ago

      I also stream on twitch and it's fun -- I used to toggle the scene when I'd record for youtube but I'd always forget so I just keep it now

  • @unvergebeneid 2 years ago +1

    Oof, what a gotcha! I never would've guessed this behavior!

  • @Tyokok 2 years ago

    Thanks a lot for great video!

  • @OrCarmi 2 years ago +2

    Great video! This is a pretty big gotcha, I'd expect a warning about this in python docs

    • @petertillemans2231 1 year ago

      There is: this is from the docs:
      > In general, the LRU cache should only be used when you want to reuse previously computed values. Accordingly, it
      > doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each
      > call, or impure functions such as time() or random().

  • @sadhlife 2 years ago

    Instead of __new__ I'd probably use a class decorator with another lru cache:

        from functools import cache

        def classcache(cls):
            @cache
            def wrapper(*a):
                return cls(*a)
            return wrapper

        @classcache
        class C:
            def __init__(self, x):
                self.x = x
                print("made new")

        print(C(1))
        print(C(1))

  • @ZephyrNX9 2 years ago +1

    So these classes would get garbage collected at the end of the program? Or would it memory leak even after Python exits?

    • @anthonywritescode 2 years ago +1

      you can't really leak memory outside of your program -- when the program ends the memory space is torn down

  • @arnoldwolfstein 2 years ago +1

    Thanks for the video Anthony. I'm just wondering; whether you're using Ubuntu in VM or on a host (main or dual boot)

    • @arnoldwolfstein 2 years ago

      Probably in a VM, I just saw your VM video.

    • @anthonywritescode 2 years ago +1

      on this machine (and most of the things I actively develop on) I'm in a VM -- though I did have a dual booted macbook at my last job

    • @arnoldwolfstein 2 years ago

      @anthonywritescode Great; I'm in a similar situation: running in a VM or dual-booting on a macOS host. In your experience (which I can totally count on :), which option would you prefer?

    • @arnoldwolfstein 2 years ago

      I mean for most cases you are using VM as you said, but do you see any performance difference?

    • @anthonywritescode 2 years ago

      getting linux to run on a modern mac is a ton of work -- a VM is much much easier. as for performance, most of the difference is in io as that has to be virtualized -- but the cpu usage usually has direct hardware support and doesn't really suffer from being in a VM -- these are the steps I used last time I dual booted, but that was back in 2015: github.com/asottile/scratch/wiki/Ubuntu-on-MBP

  • @xan1716 2 years ago

    I was thinking an exception to this rule might be cached classmethods. Unlikely to blow up the memory since classes typically are created at parse time, right?

    • @xan1716 2 years ago

      and they'll probably not be garbage collected till the end of the program, anyways (unless a class is defined in a closure, or something)

    • @anthonywritescode 2 years ago

      they are still descriptors so you're going to get the instance passed through them (if it's called on the instance) and that's what'll get cached

    • @xan1716 2 years ago

      @anthonywritescode woah -- that had not occurred to me! tricky, tricky stuff.. :)

  • @wexwexexort 1 year ago

    Fantastic!

  • @lord_toad 11 months ago

    No one ever uses indefinite caching.. It's like saying don't use loops "while 1" because they'll run forever.. duuuh

  • @abdelghafourfid8216 1 year ago

    Why did the underscore variable not get reassigned to None?

  • @RoyAAD 29 days ago

    Was this fixed with @cache in 3.12? Cause it seems not to have a maxsize argument.

    • @anthonywritescode 29 days ago +1

      cache is just a shorthand for maxsize=None

    • @RoyAAD 29 days ago

      Is this a problem also for functions?
      And did you do the sequel to solve the each instance cache? If yes can you put the link in the description please?

    • @anthonywritescode 29 days ago

      I would hope the first part is pretty clear from the video explaining _why_ this is a problem (and why or why not that applies to plain functions). I didn't follow up with that but it basically involves calling lru_cache in `__init__`
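
One way the "call lru_cache in `__init__`" idea could be sketched. The class `C`, the private method `_compute`, and the multiplication body are assumptions for illustration, not the author's code. Wrapping the bound method gives each instance its own cache, and the resulting reference cycle is something the cyclic garbage collector can reclaim:

```python
import gc
import weakref
from functools import lru_cache


class C:
    def __init__(self, y):
        self.y = y
        # Wrap the bound method so each instance owns its own cache.
        # This forms a cycle (instance -> cache wrapper -> bound method -> instance).
        self.compute = lru_cache(maxsize=None)(self._compute)

    def _compute(self, x):
        return self.y * x


c = C(2)
c.compute(3)
c.compute(3)
hits = c.compute.cache_info().hits  # the second call is served from the cache

ref = weakref.ref(c)
del c
gc.collect()               # the cycle is collected together with its cache
collected = ref() is None
```

The cache no longer outlives the instance, at the cost of not sharing results between instances.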

    • @RoyAAD 29 days ago

      @anthonywritescode Yes. Your videos are one of the best on python. I always learn something new.

  • @StephenBuergler 1 year ago

    Does python have weak references? If it did would it help here?

    • @anthonywritescode 1 year ago +1

      only a small number of things are weak referenceable (and with significant overhead). strings for example aren't

  • @average_random_ant985 2 years ago +1

    You forgot to say about cached_property decorator. It solves all the provided issues. And it is the simplest solution.

    • @anthonywritescode 2 years ago +4

      cached_property does not help because the function takes a parameter
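
For context, a sketch of what `functools.cached_property` does cover, with a hypothetical `doubled` property and a `calls` counter added for illustration. It stores a single value per instance and is read as an attribute, so there is no place to pass an argument like `x`:

```python
from functools import cached_property


class C:
    def __init__(self, y):
        self.y = y
        self.calls = 0  # counts how often the property body runs

    @cached_property
    def doubled(self):
        self.calls += 1
        return self.y * 2


c = C(2)
values = (c.doubled, c.doubled)  # attribute access, computed only once
```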

  • @NoProblem76 1 year ago

    oh no memory leak

  • @akshaymestry971 2 years ago

    USEFUL GEM... 💠