How Sets Can Truly OPTIMIZE Your Python Code

Поділитися
Вставка
  • Опубліковано 29 кві 2023
  • Today we will be looking at how sets can be used to significantly optimize your Python code.
    ▶ Become job-ready with Python:
    www.indently.io
    ▶ Follow me on Instagram:
    / indentlyreels

КОМЕНТАРІ • 86

  • @Indently
    @Indently  Рік тому +77

    As one of you were kind enough to point out, the speed_difference calculation was completely off. It should have been obvious to me immediately, but sometimes when you're writing tutorials these details pass you. In the example I provided, sets are MUCH faster than 99%. Sorry for that bad snippet of silly math :)

    • @Dent42
      @Dent42 Рік тому +31

      If it makes you feel any better, GitHub has been using the same incorrect calculation for their Copilot efficiency gains for months now, saying their product increases efficiency by 55% (presumably coming from the fact that tasks took 55% less time), when in reality it increases efficiency by 127%

    • @Indently
      @Indently  Рік тому +11

      That's a hilarious little piece of trivia, thanks for sharing 😂

    • @callbettersaul
      @callbettersaul Рік тому +2

      I've thought about it and I'm still quite certain, that the math is correct in the video. My reason for believing this is that the 100% represents the time the list took to complete. That means 100% faster would mean that set took 100% less time to complete (which would be 0s). However, if you compared it the other way around (list took x% longer to complete), then it would be quite a large number, for example, at 3:00 it would be correct to say, that the list was 2,848,073% slower, than the set or that the list took 2,848,073% more time to complete. But the other way around, it can't go over 100%, because it would go into negatives that way.

    • @callbettersaul
      @callbettersaul Рік тому +1

      @@Dent42 Isn't it true tho? When they're comparing to the old value and get the new value, that is 45% of the original value. That means the old value was reduced/improved by 55%. Their efficiency is based on the time it took and the time can't be 127% less than the original time, because nothing takes below 0s to complete.
      Edit: Nevermind, when talking about efficiency/performance increase, then it would be 122% I think (tried 3 different formulas, all gave 122%, not 127%).

    • @Indently
      @Indently  Рік тому +5

      I like the idea of something taking 100% less time to complete

  • @callbettersaul
    @callbettersaul Рік тому +43

    To add to this, when lists check if an element is there, it uses '==' operation on each member, which is why it takes so long. However, set takes the hashcode of the element (hash(element)) and checks if that hashcode is in the table. In your own classes you can modify the behaviour of the operations, by writing your own implementation of the magic methods __eq__() and __hash__().

  • @BrianStDenis-pj1tq
    @BrianStDenis-pj1tq Рік тому +12

    I'd suggest that for many applications, using a dict() is what you want. Not only do you get the speed of a set, you can store a value as well. Dict or list are the go-tos, while set can be used if you really need the speed of a dict and you really don't want to store anything.

  • @GabiruM
    @GabiruM Рік тому +1

    I love your videos, man. They are simple, effective and very useful. Hope you get the recognition you deserve.

  • @alidev425
    @alidev425 Рік тому +1

    I'm, as always, learning something new from your short videos. great content keep it up 👌👌

  • @vuvu7005
    @vuvu7005 Рік тому +3

    if you whant to handle more infos (like counts, index in an array, order, etc...) you can use dicts, the keys of the dict will behave practicaly the same as the set and you will be able to store aditional infos

  • @CatalinSoare
    @CatalinSoare Рік тому +2

    While I did know that sets were a lot faster than lists, I didn't know by how much. This was very interesting and informative.
    However, what I didn't know and just saw in your video was pep-515; being able to separate digits with underscores for visual aid seems awesome!

  • @pistolangpaltik
    @pistolangpaltik Рік тому +8

    I'm new to programming and I'm, as always, learning something new from your short videos. However, there is something wrong with the calculation on speed increase. As the result of 'set' approaches zero, the calculation approaches 100% and never beyond. I think you need to have the result of set as base. Thanks for the videos!

    • @Indently
      @Indently  Рік тому +8

      That's absolutely my fault for not double checking that bit, I used ChatGPT for inspiration and didn't notice that until you brought it up. I need to be more strict on the code it generates.
      Thanks for bringing it up!

    • @billvvoods
      @billvvoods Рік тому

      @@Indently Not sure if it was gpt3 or 4 that you used because 3 is notorious for math mistakes. In any case your video was very good. Straight forward and to the point.

    • @callbettersaul
      @callbettersaul Рік тому

      It approached zero, because the list was the base and going above 100% would mean, that the set took negative seconds to complete. If he used set as base, he would have to write that the list was slower by 2,848,073% for example. In video it stated that set was faster by

  • @the_infinity_snake
    @the_infinity_snake Місяць тому

    The thumbnail showed the word "why",
    so i thought this video would explain the reason why a set is faster to search for a specific value than a list.

  • @abdultheseekerofknowledge4453

    Wow bro, everytime i see one of your videos i immediately know i'll learn something new, thank you

  • @dcollett
    @dcollett Рік тому +4

    Thanks so much! Your videos are the best on the internet.

    • @WinfriedKastner
      @WinfriedKastner Рік тому

      And he speaks the best and understanding English of the whole world!!

    • @Indently
      @Indently  Рік тому

      That's too kind, Winfried :)

    • @WinfriedKastner
      @WinfriedKastner Рік тому

      @@Indently That's absolutely true what I said. I'm assuming you're not British born or something because of your first name. Ialso bought your Udemy course last weekend, even though I know Python well by now. But it is a pleasure to work through and follow. And thank you for your great videos and hopefully many more to come on UA-cam!!!

  • @NickDanger3
    @NickDanger3 Рік тому +1

    The fact that sets are implemented as hash tables, and that the performance is O(1) are not two separate “reasons”: they are the exact same reason expressed in two different ways.

  • @octonwoomy1150
    @octonwoomy1150 Рік тому +1

    W video as always, thanks for the great content

  • @sirynka
    @sirynka Рік тому

    My first thought was that you're going to talk about list vs tupple comparation and I was confused about how immutability could make iteration faster.

  • @LogicVertix
    @LogicVertix Рік тому

    Thanks For this valuable practical , I have only studied this in books only.
    Do we have any other way to find out the time of execution of a program so that I can find the efficient programs

    • @Indently
      @Indently  Рік тому

      Maybe this will help: ua-cam.com/video/qXLh5sZLpHE/v-deo.html&ab_channel=Indently

  • @apsifiable
    @apsifiable Рік тому +3

    You should search zero instead of 999_999. Or in other words, not that your conclusion isn't correct, but it is almost like some marketing department came up with your example: the most unoptimal case from the point of view of lists.

    • @Efebur
      @Efebur Рік тому +2

      That is the point though. You should compare worst cases (or at least average), not best cases. Obviously if he searched for 0, it would come up immediately for his list. Instead, he should shuffle the list and search for multiple random numbers and make an average.

    • @apsifiable
      @apsifiable Рік тому

      @@Efebur Indeed. What I learned is that sarchasm is a difficult sport.

    • @Indently
      @Indently  Рік тому

      The amount of friends I've lost through using sarcasm in text form 😂

  • @saiakhileshande8486
    @saiakhileshande8486 Рік тому +1

    Hey @Indently, what if I am given a very long list and asked to check if an element exist in the list or not. In this case, which one is optimal - checking in the list or converting the list to set and checking in the set? The question is that is it better to convert list into set and check?

    • @dmytrk
      @dmytrk Рік тому

      Converting a list to set takes N hashing calls, so you don't win in efficiency. Either store your data in a set, if order doesn't matter, or just use a list

  • @k.chriscaldwell4141
    @k.chriscaldwell4141 Рік тому

    Good info. Thanks.

  • @Mulakulu
    @Mulakulu Рік тому +5

    I believe you made a mistake with the percentage code and result. Your result doesn't show how many times faster the fast code was. That would be determined by 9.16/(3.2•10^-5)=286250, so a quarter of a million times faster. Basically the fast code could run a quarter of a million times in the same time the slow code could do it once.
    I recognize your way of calculating it to be the way to find percentage error, where 100% seems like a reasonable result, but it's not what you're looking for I presume.

    • @James_08_07
      @James_08_07 Рік тому +1

      Just about to say exactly this, the limit in his approach is 100% faster which is accurate but not the best way IMO to show how much faster something is.

    • @Indently
      @Indently  Рік тому

      The math is silly, I was fooled by the shiny ChatGPT-4 model. I need to be much more harsh on it.

  • @vvv3876
    @vvv3876 Рік тому

    Python 3.10.4
    Windows 8.1
    I dont have the library timeit and when try to install it through pip i get the error:
    No matching distribution found for timeit

  • @devocracy1089
    @devocracy1089 Рік тому +1

    Very cool!

  • @KeiraraVT
    @KeiraraVT Рік тому

    Looking at this makes me wonder if I can optimize my function by making a set instead of a dict for speed. Or if a dict is faster than a. List but slower than a set.

    • @appelnonsurtaxe
      @appelnonsurtaxe Рік тому

      From an implementation perspective, a set is often implemented as a simple hashmap (dict) where the keys are the set element and the values are all void (empty). So the performance of indexing into a dictionary and checking if an element is in a set is pretty much the same under the hood

  • @artistpw
    @artistpw 2 місяці тому

    Sets seem far more efficient than lists. Nice you can convert them back and forth to get items sorted.

  • @mysticplayz5649
    @mysticplayz5649 Рік тому

    Doesn't that mean if sets are unordered the number we are finding will be at 2nd index or in the middle where in list we know that what we are finding is always at the last index so set find it quicker than list ?😮

    • @Efebur
      @Efebur Рік тому +2

      No. If it were in the middle (on average) it would be half as slow as the list, instead it is really fast, always. It's because sets are using a completely different algorithm than lists (hashing).

  • @laz272727
    @laz272727 Рік тому +3

    One thing sets do as opposed to lists is eat twice the ram. It's not twice of a lot of ram, since lists tend to be storing pointers and those are int sized, but it's still more.

    • @Indently
      @Indently  Рік тому

      Definitely interesting :)

    • @infiniteplanes5775
      @infiniteplanes5775 Рік тому

      There are always trade offs, and I am interested in hearing some more stuff about lists vs sets

    • @laz272727
      @laz272727 Рік тому

      @Infinite Planes
      There are generally two types of lists used commonly.
      Lists you see by default in OOP languages are generally Array-backed - literally an array of a size similar to the list, but rounded to some pre-set number. They're more expensive to edit, since you need to make a new array every time, but can be accessed by index O(1) (it's a sum operation of array's beginning pointer and the index). They are generally the best if you need an array you would want to edit later.
      However, in Functional languages, the default is a Linked List - where every list member has a pointer to the next one. This means editing it is easy, since the next member can be anywhere in memory and you just have to change the pointer; however, finding a specific member is very difficult since you literally have to iterate over the entire piecemeal list. Also, since they store one extra pointer, they're slightly bigger. They are better for FP-like usages of lists, though.

    • @laz272727
      @laz272727 Рік тому

      @Infinite Planes There are two types of construction for tables (the video talks about sets, but values don't have to be unique and the difference in speed is very minor - in fact, many languages implement sets as tables without a value, or with the value being the actual value and key being hashed).
      Note that while calculating space taken by tables is relatively easy (they, like array lists, are slightly bigger than their elements put together), explaining them isn't quite as easy. They have a lot of optimizations related to the fact that they're meant for really fast lookups by keys. Hell, in a perfect world, hashmaps are O(1) (if you use an array indexed by the hash - obviously that is nowhere near space efficient).

    • @IlariVallivaara
      @IlariVallivaara Рік тому +1

      ​@@laz272727 Array-backed lists are not necessarily or even usually more expensive to edit: it all depends on how you edit it. Further, linked lists offer asymptotic removal/addition benefits only if you can obtain the corresponding pointer for free or very cheaply, which is not usually the case. It is a common misconception, though, because people don't realize the cost of traversing the linked list to obtain the pointer.
      You also seem to mix random (index-based) access to finding a specific number in the collection. These are completely different operations, and the latter is O(n) for both (unsorted) arrays and linked lists.

  • @chx75
    @chx75 Рік тому

    How about dicts?

  • @sushantpawar3703
    @sushantpawar3703 Рік тому +3

    What about tuples compared to lists?

    • @k.kaiserahmed8013
      @k.kaiserahmed8013 Рік тому +1

      Somewhat slow... Just a lil bit....

    • @MilChamp1
      @MilChamp1 Рік тому +1

      tuples are faster to create and index, and very similar to search through. Lists are useful for collections that need to be edited.

  • @Smbrine
    @Smbrine Рік тому +1

    Just use ctypes. Don't be lazy, c/cpp is about 100-200 times faster (it takes about 0.2sec to load a cpp function in python but even considering this it will be 10 times faster at least)

    • @Indently
      @Indently  Рік тому

      We use Python because we are lazy.

    • @Smbrine
      @Smbrine Рік тому

      @@Indently python is also lazy so that sums into a very poor performance :)

    • @Indently
      @Indently  Рік тому +2

      @@Smbrine Poor performance, big salary 😎

  • @tanmaypatel4152
    @tanmaypatel4152 Рік тому +3

    Which IDE are you using in this video?

    • @mailonbido
      @mailonbido Рік тому

      im curious too

    • @tanmaypatel4152
      @tanmaypatel4152 Рік тому +1

      @@mailonbido I got it. It's pycharm. It looks different because of the latest UI update in the 2023 version I guess.

    • @mailonbido
      @mailonbido Рік тому

      @@tanmaypatel4152 oh nice, thanks!

    • @tanmaypatel4152
      @tanmaypatel4152 Рік тому +1

      @@mailonbido You're welcome

  • @thegothaur
    @thegothaur 4 місяці тому

    funny. it is O(1) because of sets being hash tables :)

  • @computerfun6724
    @computerfun6724 Рік тому

    Love from India bro 🇮🇳

  • @ThankYouESM
    @ThankYouESM Рік тому

    Is there a way to generate pixel-by-pixel smooth, colorful images incredibly fast with set()?

    • @waylonbarrett3456
      @waylonbarrett3456 Рік тому +1

      You could have a set for every integer value of a color channel. So, one for 0 red, one for 1 red, one for 2 red, etc. And the same for green and blue channels. Then, in each set, store the pixel indices for that color value.

  • @link_safe
    @link_safe Рік тому

    is a frozenset() faster than a set()?

    • @laz272727
      @laz272727 Рік тому +1

      They take slightly less ram, since they don't need to be mutable.

    • @link_safe
      @link_safe Рік тому

      @@laz272727 Thanks 🧑‍💻

  • @kbcat
    @kbcat Місяць тому

    what ide is this

  • @rishiraj2548
    @rishiraj2548 Рік тому

    👍