Which Python @dataclass is best? Feat. Pydantic, NamedTuple, attrs...

  • Published 15 Jun 2024
  • Get rid of boilerplate in writing classes.
    Which dataclass alternative should you use though? In this video we test dataclasses, attrs, tuple, namedtuple, NamedTuple, dict, SimpleNamespace, and Pydantic BaseModel for speed, memory efficiency, and features.
    ― mCoding with James Murphy (mcoding.io)
    Source code: github.com/mCodingLLC/VideosS...
    Previous dataclasses video: • Python dataclasses wil...
    dataclasses: docs.python.org/3/library/dat...
    attrs: www.attrs.org/en/stable/examp...
    namedtuple: docs.python.org/3/library/col...
    NamedTuple: docs.python.org/3/library/typ...
    SimpleNamespace: docs.python.org/3/library/typ...
    Pydantic: pydantic-docs.helpmanual.io/u...
    SUPPORT ME ⭐
    ---------------------------------------------------
    Patreon: / mcoding
    Paypal: www.paypal.com/donate/?hosted...
    Other donations: mcoding.io/donate
    Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, John Martin, Casey G, Pieter G, Krisztian M, Sigmanificient
    BE ACTIVE IN MY COMMUNITY 😄
    ---------------------------------------------------
    Discord: / discord
    Github: github.com/mCodingLLC/
    Reddit: / mcoding
    Facebook: / james.mcoding
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro
    1:04 dataclass
    1:24 attrs
    2:13 tuple, namedtuple, NamedTuple
    4:05 dict
    4:39 SimpleNamespace
    4:58 Pydantic
    6:39 Speed comparison
    8:41 Memory comparison
    9:15 Feature matrix and winners
  • Science & Technology

COMMENTS • 215

  • @LiamInviteMelonTeee
    @LiamInviteMelonTeee 1 year ago +35

    I'm a simple engineering student and a modest python user but those dynamic histograms sent chills down my spine

    • @mCoding
      @mCoding  1 year ago +15

      Check out plotly and the source code in the description!

  • @aarondewindt
    @aarondewindt 2 years ago +126

    The built-in dataclass also has default_factory for defining mutable default values.
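
A minimal sketch of the `default_factory` feature mentioned in this comment. The `Inventory` class and its field names are hypothetical, chosen only for illustration; `field(default_factory=...)` is part of the standard `dataclasses` module.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    # Hypothetical example class, not taken from the video.
    owner: str
    # default_factory is called once per instance, so lists are not shared.
    items: list = field(default_factory=list)


a = Inventory("alice")
b = Inventory("bob")
a.items.append("sword")  # only touches a's own list
print(a.items, b.items)
```

Writing `items: list = []` instead would be rejected by `dataclass` with a `ValueError`, which is why the factory form exists.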

    • @mCoding
      @mCoding  2 years ago +74

      😬 oops, thanks for pointing this out! I should have been more careful when I made the feature matrix.

    • @PeterZaitcev
      @PeterZaitcev 11 months ago +1

      Furthermore, unlike slots support, this has been available since the initial release.

  • @mpilosov
    @mpilosov 2 years ago +42

    This is a great breakdown. I’ve had to explain this so many times to team members, now I’ll refer people to this video!

  • @franchiniitalo
    @franchiniitalo 2 years ago +21

    Hey James, I just wanted to sincerely congratulate you for both the quality content and humor in your videos, amazing work!!

    • @mCoding
      @mCoding  2 years ago +7

      Thank you very much for your kind words and support!

  • @markasiala6355
    @markasiala6355 2 years ago +36

    I actually have a large ongoing project where I used namedtuples early on, with typing stored in a second tuple, then refactored to NamedTuple using the built in typing (which simplified storing the typing separately), and finally to dataclasses after seeing your video on that. It fit my application perfectly as I needed the flexibility of being able to modify the dataclass. If only I had known about dataclasses to start with. :) attr class also sounds interesting for my needs, I need to check that out.

    • @subjekt5577
      @subjekt5577 1 year ago +1

      I wish he covered classes extending from named tuple, one of my favorite pre attr methods....

  • @Jakub1989YTb
    @Jakub1989YTb 2 years ago +12

    2:06 got me .. "real life". Those air quotes are heavy.

  • @jemand771
    @jemand771 2 years ago +9

    I really enjoyed the text comment/annotation overlays in this video. they both add useful background info and give the video a more relaxed vibe without distracting from the main points! :D

  • @laurinneff4304
    @laurinneff4304 2 years ago +98

    So when _are_ you going to explain slots? I have no idea what those are

    • @mCoding
      @mCoding  2 years ago +61

      Gulp, I feel the pressure.

    • @AzureCz
      @AzureCz 2 years ago +8

      @@mCoding yeah, I don't know what you're talking about either D:

    • @cameronball3998
      @cameronball3998 2 years ago

      That was the Google search I made right after this video 😂 I am intrigued

    • @Elijah_Lopez
      @Elijah_Lopez 2 years ago +7

      Classes usually use a dictionary to store instance attributes. If you define __slots__ = 'var1', 'var2', instances of your class can only have the attributes named in __slots__.
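
A small sketch of the behaviour described in this comment, assuming a hypothetical `Point` class: with `__slots__` defined, instances have no per-instance `__dict__`, and assigning an unlisted attribute fails.

```python
class Point:
    # Hypothetical class: only these attribute names can exist on instances.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10  # fine: "x" is listed in __slots__

slot_error = None
try:
    p.z = 3  # "z" is not listed, so there is nowhere to store it
except AttributeError as exc:
    slot_error = exc

has_dict = hasattr(p, "__dict__")  # slotted instances carry no __dict__
```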

    • @AzureCz
      @AzureCz 2 years ago +1

      @@Elijah_Lopez sir you're a legend

  • @kosmonautofficial296
    @kosmonautofficial296 1 year ago +2

    Great thanks so much for this video! I am starting to study pydantic and I haven't been made aware of these differences. This is a huge help and I wish more people would explain these important differences when telling others they should use this or that.

  • @sphereron
    @sphereron 1 year ago +1

    I've often struggled with ways people define hyperparameters and inputs to neural networks in open source code. This video definitely helped me in my choice going forward.

  • @Aang139
    @Aang139 1 year ago +10

    Also would have loved thoughts on TypedDict which mirrors NamedTuple for dictionaries, giving type hinting and string key checking

  • @tamles937
    @tamles937 2 years ago

    Great video! As always, the topic is well explained and I learnt something new
    The on-screen comments are really fun, I hope you'll put more of this in the future videos!

  • @hackergr325
    @hackergr325 2 years ago +3

    At first you got my interest, after the "Presenting with meaningless example" you got my attention. Awesome video once again!

  • @Scranny
    @Scranny 2 years ago

    I have used almost all of these, so I can say this is a fantastic summary of the various options.

  • @r2_rho
    @r2_rho 2 years ago +1

    this is really the best Python channel on YouTube. I've learned more on this channel than all others combined

    • @mCoding
      @mCoding  2 years ago

      Wow thank you!

  • @PanduPoluan
    @PanduPoluan 1 year ago +5

    Basically, one very strong rule of thumb is: If you need immutability and you can validate the data on your own, NamedTuple will _always_ be the best, hands down.

  • @fartzy
    @fartzy 2 years ago +1

    Wow this is amazing man thanks for putting this together

  • @doc0core
    @doc0core 2 years ago +2

    This is serious pro stuff. I started using dataclass thanks to your vid, then YT pushed another vid for pydantic and I was like bleh. Luckily this vid set me straight. Now I understand each one's use case. THANKS

    • @mCoding
      @mCoding  2 years ago

      Glad it helped!

  • @FranciscoCorreaDias
    @FranciscoCorreaDias 2 years ago +1

    1:24 "Will I ever explain slots?" One week later...
    Thank you so much for your explanations, James!

  • @jochengietzen
    @jochengietzen 2 years ago +13

    Please keep the onscreen comments coming! Adds the perfect amount of fun to an informative topic "cries in mypy" 😁
    Great video, thanks 😊

  • @mystisification
    @mystisification 2 years ago

    Very informative video, thanks James!

  • @zacky7862
    @zacky7862 2 years ago +6

    Yeah, Pydantic is so great for parsing/serializing JSON data.
    I've been using it. But for simple data, I use the built-in dataclass.

  • @MrBoubource
    @MrBoubource 2 years ago

    Wonderful last seconds, but wonderful video too!

  • @Yotanido
    @Yotanido 2 years ago +5

    I've used most of these, it turns out.
    Started with the class that repeats everything. I then used the dict to try and make things slightly more convenient, but that was only feasible in very limited circumstances.
    Then I discovered named tuples, but... they are tuples. Wasn't a huge fan.
    Then, finally, I came across attr. That was a huge revelation and I absolutely loved it. Finally something decent.
    And then dataclasses were introduced to the standard library and I basically switched to using those. attr can do more, sure - but the dataclasses are easier to use and don't need the dependency. Unless I actually need the power of attr, I'll just use these.

    • @PanduPoluan
      @PanduPoluan 1 year ago +1

      Depends on the data, tuples can be very suitable.
      For instance, I have to consume a YAML file containing a HUGE sequence of geo-coordinates (lat/long). For this kind of data, the kind that you read, keep in memory, and must not change, tuples are perfectly suitable, use less memory, and are very fast.
      And NamedTuple, just like other classes, can have methods defined within. So for instance I can write a distance_to() method which will calculate the great-circle distance between one geo-coordinate and another.
      If you need mutability, though, of course tuple just won't cut it.
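
A sketch of the idea in this reply, under assumptions: the `GeoPoint` name, the haversine formula, and the 6371 km mean Earth radius are illustrative choices, not the commenter's actual code.

```python
import math
from typing import NamedTuple


class GeoPoint(NamedTuple):
    # Hypothetical record type: an immutable (lat, lon) pair in degrees.
    lat: float
    lon: float

    def distance_to(self, other: "GeoPoint", radius_km: float = 6371.0) -> float:
        """Great-circle distance in kilometres, using the haversine formula."""
        phi1, phi2 = math.radians(self.lat), math.radians(other.lat)
        dphi = phi2 - phi1
        dlmb = math.radians(other.lon - self.lon)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        return 2 * radius_km * math.asin(math.sqrt(a))


origin = GeoPoint(0.0, 0.0)
quarter = GeoPoint(0.0, 90.0)
print(origin.distance_to(quarter))  # a quarter of the equator, in km

frozen_error = None
try:
    origin.lat = 1.0  # NamedTuple fields cannot be reassigned
except AttributeError as exc:
    frozen_error = exc
```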

  • @mikegazes
    @mikegazes 1 year ago

    Thanks! This is exactly what I needed.

  • @aaronm6675
    @aaronm6675 2 years ago

    Already know this is gonna be helpful and instructive!

  • @Azzonith
    @Azzonith 2 years ago +2

    Great stuff!
    It would be even better if you made a follow-up video about serialization of those objects and libs that can help.
    Often it's required to send tuple/dataclass/etc. data over Kafka, to a DB, or save it as JSON, etc.
    Include the 'marshmallow' lib in the vid as well!

  • @danielrhouck
    @danielrhouck 2 years ago +2

    Iʼm starting a new Python project and Iʼm using `attrs` because of this video. Otherwise I would have used `namedtuple`, because I think without your videos I somehow would have missed even `dataclass`.

  • @PythonisLove
    @PythonisLove 2 years ago +1

    your videos are always the best

  • @hansdietrich1496
    @hansdietrich1496 1 year ago

    Good comparison, thanks!

  • @tamerelsayed6368
    @tamerelsayed6368 2 years ago

    thank you for the thorough explanation

  • @user-kc6wz7xr8e
    @user-kc6wz7xr8e 2 years ago

    thank you man! You helped me a lot!

  • @nrbnlullu9327
    @nrbnlullu9327 2 years ago

    Great video, Thanks a lot!

  • @behnamsalehi9765
    @behnamsalehi9765 2 years ago

    Thank you. This information is really useful

  • @eniocc
    @eniocc 2 years ago

    Perfect video. Congrats

  • @oxey_
    @oxey_ 2 years ago

    I feel like I should go to casinos more often because I have no idea what slots are :)
    Great video! Typehint gang

  • @falxie_
    @falxie_ 2 years ago +1

    Really glad to see slots supported in dataclasses now. When you have a lot of instances of one class slots can save a ton of memory

    • @mishikookropiridze
      @mishikookropiridze 2 years ago

      This was added in 3.10?

    • @falxie_
      @falxie_ 2 years ago

      @@mishikookropiridze That's more of a statement than a question isn't it

    • @mishikookropiridze
      @mishikookropiridze 2 years ago

      ​@@falxie_ It is statement and hence you can assign boolean value.

  • @Mutual_Information
    @Mutual_Information 2 years ago

    I do not use data classes nearly enough. This is good motivation to change that.

  • @GRAYgauss
    @GRAYgauss 2 years ago +17

    Type hint gang. I came from a C background, so ducktyping felt like a Godsend. Then I got into rust and realized how much time I was spending debugging code because it was ducktypable. (lets not forget rust's awesome toolchain compared to python's...well yeah.) Hell, I was using i_var, etc just because it made it easier to reason about and not have to backtrack, which is when I first started wondering about it...Didn't fully click until I made the switch though.

  • @maimee1
    @maimee1 1 year ago +5

    4:36 There's TypedDict to consider too tho. (As in the type safety thing. You could type both dict and tuple and use a static type checker. If you use PyCharm and single quotes, accessing data by key is also not typo prone.)

    • @PanduPoluan
      @PanduPoluan 1 year ago

      Why single quotes? There's no difference between single quotes and double quotes.

    • @maimee1
      @maimee1 1 year ago +1

      @@PanduPoluan Idk, ask PyCharm (and also VS Code, I just found out).
      To clarify: not typo prone => there's IntelliSense / auto completion.

    • @PanduPoluan
      @PanduPoluan 1 year ago

      @@maimee1 well personally I don't find any difference between using single quotes and double quotes. But then again I always use double quotes because Black enforces that.

  • @MrLiuHai
    @MrLiuHai 2 years ago +2

    Thx for the explanation! But it seems at this point Python is contradicting its own Zen: "There should be one-- and preferably only one --obvious way to do it."
    IMHO one should always prefer immutability. The diff between creating a new instance and using a setter could be ignored. If the performance is that critical, maybe one shouldn't choose Python in the first place.

  • @adirmazhir9159
    @adirmazhir9159 2 years ago +11

    its also possible to use namedtuple like this:
    T = namedtuple('T', 'n f s')
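
For reference, a short sketch of the string form shown in this comment next to the list form. The field values are made up, and `U` is a hypothetical second type added only for comparison.

```python
from collections import namedtuple

# Field names given as one whitespace-separated string...
T = namedtuple("T", "n f s")
# ...or as a sequence of strings; both define the same three fields.
U = namedtuple("U", ["n", "f", "s"])

t = T(n=1, f=2.0, s="three")
print(t.s, t[0], T._fields)
```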

  • @TechSY730
    @TechSY730 1 year ago

    In current versions of attrs, you only need to assign fields to an `attr.ib` if you need any per-field option beyond a default.
    Otherwise you can use regular variable declarations like dataclasses does.
    (You might need to use the "next-gen" API, I can't remember at the moment)

  • @red13emerald
    @red13emerald 2 years ago

    Awesome comparison! What did you create the interactive graph at the end with? Looks like a nicer version of matplotlib.

    • @aflous
      @aflous 11 months ago

      Plotly

  • @SvetlinTotev
    @SvetlinTotev 2 years ago +16

    A few arguments for dict gang:
    Everybody knows how it works and what the syntax is. Many libraries use it as inputs or outputs. If used as an interface it is easy to change it without breaking things. It is trivial to load and store them in json or send them over the network. I generally don't like speed comparisons of python code since that should never be the bottleneck of your program (you are using the wrong language if it is) but it is nice to know that dicts are fast af. I also haven't had any problems with reliability. But I guess that's partially due to my vscode extensions checking what I'm typing and giving suggestions.
    But I have to agree with you that the syntax is quite ugly compared to accessing elements with a dot. Though I don't think it would be too bad if the language added similar syntax for that (basically any kind of shorthand for ["string"]. But I guess . would be ambiguous. and most other characters already have a meaning. so maybe a double dot? some_dict..some_element)

    • @Alex-uh6qh
      @Alex-uh6qh 2 years ago +5

      The problem with dicts is that you cannot check types at compile time. You can store different data types in one field throughout the runtime. Also, static analyzers cannot predict the type of a dict's elements, so other IDEs (like PyCharm) cannot help you with suggestions, especially with the available methods for each field. In your IDE, you use an AI-based extension that predicts data types, but it is not a static analyzer.

    • @masheroz
      @masheroz 2 years ago +1

      This is timely. I've got a program I'm writing now, and am using dictionaries. I still think that formatting my data as nested dictionaries is the best representation of that data. Also, the original data format is actually defined as a dictionary.

    • @SvetlinTotev
      @SvetlinTotev 2 years ago +2

      @@Alex-uh6qh This is true, but I think by picking python as your programming language you've already given up on being able to easily track the types of objects. With all the type hinting and other type-related stuff you are still quite far from the type information you have in languages like C++.

  • @pedrokalil4410
    @pedrokalil4410 11 months ago

    I am the owner of a backend project at my company and i use only pydantic, as we perform multiple api calls the validations are essential, and it integrates really well with fastapi

  • @jlp2011
    @jlp2011 10 months ago +2

    Pydantic 2.0’s just out, built around a Rust core. They claim up to 50x perf improvement so some of this might be changed. Still, kudos for covering v1’s overhead.

    • @mCoding
      @mCoding  10 months ago +1

      Great point! Maybe I'll have to do an update video!

    • @arkadiuszszydeko7264
      @arkadiuszszydeko7264 10 months ago +1

      @@mCoding Looking forward to seeing how it compares to what you presented here :)

  • @elnico5623
    @elnico5623 1 year ago

    I wish there was a channel like this for lua

  • @relsunkaev
    @relsunkaev 2 years ago +1

    The apischema package is a good middle ground between Pydantic and dataclasses. It allows you to do the same runtime validation on dataclasses if you need to and has the same features as well as a GraphQL schema generator. It also performs validation faster than Pydantic.

    • @mCoding
      @mCoding  2 years ago

      Never used that one, thanks for sharing!

  • @ripp_
    @ripp_ 1 year ago

    I think in the past, because I've been lazy, I've used tuples but, because I don't hate myself, I had constants for which index was which. I don't recommend this but that would give you the speed power of tuples with some of the naming power of namedtuple
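
What this comment describes might look like the sketch below. The constant names and the record contents are hypothetical; as the commenter says, this is not a recommended pattern.

```python
# Hypothetical index constants naming each position of a plain tuple record.
NAME, AGE, EMAIL = range(3)

record = ("Ada", 36, "ada@example.com")
age = record[AGE]  # reads like a field access, but is plain tuple indexing
print(record[NAME], age)
```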

  • @yky49
    @yky49 2 years ago

    It is possible to use @dataclass(init=False) and custom __init__() for a parsing purpose. With slots for sure ;)
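
A sketch of this suggestion, assuming a hypothetical `Version` class and a made-up "MAJOR.MINOR" input format. Slots are declared by hand here so the example does not depend on `slots=True`; this works because the fields have no default values.

```python
from dataclasses import dataclass


@dataclass(init=False)
class Version:
    # Hypothetical class: dataclass still generates __repr__ and __eq__.
    __slots__ = ("major", "minor")
    major: int
    minor: int

    def __init__(self, text: str) -> None:
        # Custom parsing step in place of the generated __init__.
        major, minor = text.split(".")
        self.major = int(major)
        self.minor = int(minor)


v = Version("3.11")
print(v)
```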

  • @soberhippie
    @soberhippie 2 years ago +5

    Creating a new tuple still looks just as fast as modifying a value in a dict, interesting

    • @mCoding
      @mCoding  2 years ago +6

      Yeah that was the biggest surprise for me, but I guess it kinda makes sense since a tuple can be implemented as a thin wrapper around raw memory, but a dict has to do hashing and such.

    • @NateROCKS112
      @NateROCKS112 2 years ago

      However, you'll likely end up needing to get the tuple's values in order to instantiate a new one. So performing a function similar to dict setattr would be at a significant cost.

  • @luisraguzzoni5409
    @luisraguzzoni5409 2 years ago

    Your videos are so good that I believe you could create a good intermediate-advanced python course. Just saying

  • @etienneboutet7193
    @etienneboutet7193 2 years ago

    Great video ! But I feel like the onscreen comments were a bit distracting

  • @12nites
    @12nites 2 years ago

    man, you really hammered down on this issue. No need to watch anything else.

  • @florianfuchs325
    @florianfuchs325 2 years ago

    Hi
    Excellent Video! I was wondering what would be the right choice if I wanted to use the created class in a jit compiled numba function? As far as I have seen, namedtuples seem to be most suitable?

    • @PanduPoluan
      @PanduPoluan 1 year ago +1

      I think you need a class that is serializable. namedtuple and NamedTuple are serializable by default.

  • @LerikPav
    @LerikPav 2 years ago +1

    There's also TypedDict (since 3.8) with typesafety

    • @mCoding
      @mCoding  2 years ago +6

      TypedDict is actually just a dict at runtime; its value is only for static typing.
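
A small sketch of that point, assuming a hypothetical `Movie` type: the object created is a plain `dict`, and a wrongly typed value is accepted at runtime and would only be flagged by a static type checker.

```python
from typing import TypedDict


class Movie(TypedDict):
    # Hypothetical schema used only for illustration.
    title: str
    year: int


m = Movie(title="Alien", year=1979)
runtime_type = type(m)  # a plain dict, not a special class

# No runtime validation: a checker such as mypy would complain about this line.
bad = Movie(title="Alien", year="1979")  # type: ignore
```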

  • @Timmie_Tudor
    @Timmie_Tudor 1 year ago

    Hello, if you didn't know, I decided to use the dataclass Python decorator as my handle

    • @mCoding
      @mCoding  1 year ago

      Haha you are gonna get a lot of accidental mentions with a handle like that!

  • @jmcantrell
    @jmcantrell 2 years ago

    What are you using for the visualizations at the end?

  • @mrtnsnp
    @mrtnsnp 2 years ago +2

    OK, on type hints. I probably need a kick in the you-know-where, but I can't get it to play nicely with a few packages and features I need. I frequently use numpy, and a lot of functions don't really care about receiving a single number or a full array of numbers. They may not even care if the number is a float or an int, but let's focus on floats here. The return value typically has the same shape as the main input, but may be a single number.
    How do I set up type hinting for numpy arrays? How do I set type hinting up for polymorphism?

    • @wsrgs4
      @wsrgs4 2 years ago

      I haven't looked into it extensively, but I'm aware there is a numpy.typing module which includes an ArrayLike type for anything that can be converted into an array, including scalars. you might want to look into the module documentation.
      specifying the dimensions of an array in python's type hinting system is generally difficult however, so I'm not sure there's a way to incorporate that information in your annotations.

    • @PanduPoluan
      @PanduPoluan 1 year ago

      Use TypeVar.
      For instance, here's a made-up function:
      from typing import TypeVar
      T = TypeVar("T")
      def makelist(n: int, item: T) -> list[T]:
          return [item for _ in range(n)]

  • @rdean150
    @rdean150 2 years ago +1

    I started using pydantic bc it allows specifying a conversion function to try to cast input values to the desired type. I didn't realize how much of a performance hit that library incurs, or that attrs can do this also but much more cheaply. I guess I should switch to attrs.

    • @mCoding
      @mCoding  2 years ago +1

      I didn't specifically compare times for when you are doing conversions. Make sure to time your use case yourself since Pydantic may still be faster if you are doing conversions.

    • @rdean150
      @rdean150 2 years ago

      @@mCoding Ah, thanks for the heads up. That probably accounts for a decent chunk of the time difference, as I think pydantic is always going to try to do basic type casting on all values when instantiating new instances, which surely comes with some overhead, particularly when you supply a custom function for it.

  • @khoda81
    @khoda81 2 years ago +2

    How did u measure memory usage?

  • @t2udu
    @t2udu 2 years ago

    Really liked the visualization. Is that plotly?

    • @mCoding
      @mCoding  2 years ago +1

      Yep! See the code to produce it on GitHub!

    • @t2udu
      @t2udu 2 years ago

      @@mCoding will check that out

  • @ananzero8751
    @ananzero8751 2 years ago +6

    What library was used to generate the graph? It looks nice.

    • @the_crypter
      @the_crypter 2 years ago +7

      Plotly, It's easily the most interactive Visualization Library. It's as simple as matplotlib.

    • @mCoding
      @mCoding  2 years ago +4

      Yep, plotly express specifically. Check out the code on github! Link in desc.

    • @saadisave
      @saadisave 2 years ago +1

      @@the_crypter That's a bad measure of simplicity

  • @MrShoorf
    @MrShoorf 2 years ago +1

    "As well as BaseModel, pydantic provides a dataclass decorator which creates (almost) vanilla python dataclasses with input data parsing and validation."

  • @jakubjakubec9693
    @jakubjakubec9693 2 years ago

    I have my own class decorator that returns dataclass(cls), but I get no type hints this way. Is there a way to fix it ?

  • @vekyll
    @vekyll 2 years ago

    I'm a bit confused... do you have any idea why SimpleNamespace's get is so horribly slow? I mean, it's a hash lookup anyway.

  • @Destrolll
    @Destrolll 2 years ago

    Please care to explain why shouldn't I assign attributes to an instance of an empty class? 4:50

  • @vxsery
    @vxsery 2 years ago

    🎉🎉🎉🎉

  • @VegetableJuiceFTW
    @VegetableJuiceFTW 1 year ago

    would have been cool to compare pydantic with the validation turned off for fairness sake :D

  • @grzegorzryznar5101
    @grzegorzryznar5101 1 year ago

    @mCoding How do you measure execution speed in a repeatable way? I was trying to measure performance, but for the same setup I got scores differing a lot (by more than a few percent). The code was purely in Python, no external sources, no IO, but still the differences were very noticeable.

    • @mCoding
      @mCoding  1 year ago

      For this video I believe I used timeit since they are tiny snippets, and the timing code is available in the github repository in the description. Timing measurements may vary drastically depending on things such as your CPU and version of Python, which is why it is always best to verify the timings for your own setup!
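
A minimal sketch of timing a tiny snippet with `timeit`, not the author's actual benchmark (that code is in the linked repository). The class `P`, the run counts, and the choice to report the minimum are assumptions for illustration.

```python
import timeit
from dataclasses import dataclass


@dataclass
class P:
    # Hypothetical class used only as something to time.
    x: int
    y: int


NUMBER = 100_000  # calls per run
# repeat() returns one total time per run; the minimum is usually the least noisy.
runs = timeit.repeat("P(1, 2)", globals={"P": P}, number=NUMBER, repeat=5)
best_ns_per_call = min(runs) / NUMBER * 1e9
print(f"create: {best_ns_per_call:.0f} ns per call (best of 5)")
```

Taking the best of several repeats reduces noise from other processes, though results will still differ between machines and Python versions.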

    • @PanduPoluan
      @PanduPoluan 1 year ago

      @@mCoding Also with Intel's franken-CPU having "P" cores and "E" cores, it will be a gamble.

  • @deekshantwadhwa
    @deekshantwadhwa 1 year ago

    Which software/package/language are you using for the graphs UI in the end?

    • @mCoding
      @mCoding  1 year ago

      Plotly! See the source code in the description if you would like to see the exact code i use to generate the plots.

  • @ManuelBTC21
    @ManuelBTC21 2 years ago +1

    If you care about correctness, I would argue for NamedTuple. The fact that it's immutable is a feature, not a bug.

    • @mCoding
      @mCoding  2 years ago +5

      Immutability is definitely a feature, but mutability is also a feature. As always you should choose based on what is most appropriate for your problem.

  • @chriskeo392
    @chriskeo392 2 years ago

    What is the use case for slots?

  • @viktornerlander1409
    @viktornerlander1409 2 years ago

    if i have a very large set of data, with different types of data like multiple timeseries, single character/digit variables etc, should i use dataclasses to store them? and if so how? do i pickle classes? right now i'm using pandas for everything. thanks for the video

    • @zachwhite2716
      @zachwhite2716 3 months ago

      I may be in the extreme minority here, but IMO dataclasses are not a good fit in most situations, but particularly here where you have large sets of nested data. Just stick with dict or pandas.

  • @ilyam.1872
    @ilyam.1872 2 years ago +2

    Yeah that's cool and whatnot, but have you ever tried this? class D(dict): __getattr__=dict.__getitem__; __setattr__=dict.__setitem__; __delattr__=dict.__delitem__

    • @mCoding
      @mCoding  2 years ago +2

      Lol no i never considered that :)

    • @ilyam.1872
      @ilyam.1872 2 years ago +1

      @@mCoding absolutely should, it's so easy and error-prone, practically a cheeseburger of python.
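
To make the "error-prone" part concrete, here is the one-liner from this thread written out, followed by two pitfalls. The keys used below are made up for illustration.

```python
class D(dict):
    # Attribute access is forwarded straight to the dict's item methods.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


d = D(a=1)
d.b = 2  # goes through dict.__setitem__, so it becomes a key

lookup_error = None
try:
    d.missing  # a missing name raises KeyError, not AttributeError
except KeyError as exc:
    lookup_error = exc

# Real dict attributes are found first, so a key named "keys" is shadowed.
shadowed = D(keys=5).keys
```

Because `hasattr` and `getattr` with a default only catch `AttributeError`, they also raise `KeyError` on such objects, which can surprise code that expects normal attribute semantics.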

  • @joshbennett5908
    @joshbennett5908 2 years ago +1

    What tool are you using for your bar chart?

    • @mCoding
      @mCoding  2 years ago +1

      Plotly express! It can export to html you can share in your browser without python even installed.

  • @lex_darlog_fun
    @lex_darlog_fun 2 years ago +1

    @mCoding are you REALLY sure you've measured memory footprint correctly? What was your test methodology? The difference between NamedTuple/dataclass/class is supposed to be quite different from what you've shown (they do differ but not THAT much).
    According to this video (it's in Russian, but code is clearly visible): youtube /tsEG0WM3m_M?t=60 :
    1. The author uses pympler.asizeof() function instead of built-in ones since it's the only right way to measure *FULL* memory consumption of a given object. I personally re-tested it (generated HUGE collections, taking literally gigabytes of RAM) - and yes, the built-in ones were returning some ridiculous results, not even close to the actual RAM taken by the Python interpreter.
    2. According to his tests, the difference is actually like this (on 1k instances):
    2:05 - dict = ~ 1.2MB
    3:44 - dataclass = ~ 1Mb
    5:04 - namedtuple = ~ 720 Kb
    5:54 - typed NamedTuple = also ~ 720 Kb

    • @mCoding
      @mCoding  2 years ago

      It's hard to say whether the way I counted things is the "correct" way because it depends on what you wanted to count, but the numbers are approximately the same with pympler vs the getsize method I used. The order of which classes use the most memory is exactly the same with either method. The main difference between what pympler does and what I did is that pympler tries to account for object alignment. pympler assumes that all Python objects are 8-byte aligned and no packing is done (hence why the pympler answers are all multiples of 8), counting padding bytes in the total size count. On the opposite end, my getsize assumes all objects are optimally packed together, not including padding bytes in the total size. The truth is probably somewhere in the middle and also an implementation detail that could change at any moment. But, in any case, I wouldn't call either method the "correct" one; they are both good estimates and their difference is pretty small.
      Also note that depending on the way you do your tests, the data can make a big difference in how much space is actually used. For example (1,1) uses less memory than (1,2) because the 1 objects in the first tuple are the same.
      pympler
      0: dataclass (slots) - 168 bytes
      1: plain class (slots) - 168 bytes
      2: tuple - 176 bytes
      3: NamedTuple - 176 bytes
      4: namedtuple - 176 bytes
      5: attr class (slots) - 176 bytes
      6: dataclass - 432 bytes
      7: plain class - 432 bytes
      8: attr class - 432 bytes
      9: dict - 512 bytes
      10: SimpleNamespace - 552 bytes
      11: pydantic - 560 bytes
      method i used in video
      0: dataclass (slots) - 162 bytes
      1: plain class (slots) - 162 bytes
      2: tuple - 170 bytes
      3: NamedTuple - 170 bytes
      4: namedtuple - 170 bytes
      5: attr class (slots) - 186 bytes
      6: dataclass - 408 bytes
      7: plain class - 408 bytes
      8: attr class - 408 bytes
      9: dict - 488 bytes
      10: SimpleNamespace - 528 bytes
      11: pydantic - 536 bytes
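
The remark about (1,1) versus (1,2) can be illustrated with a rough recursive size function. This `deep_size` helper is hypothetical: it is neither pympler's `asizeof` nor the getsize function used for the video, and it only follows a few built-in container types.

```python
import sys


def deep_size(obj, seen=None):
    """Shallow getsizeof plus the sizes of not-yet-counted contained objects."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # shared objects are counted only once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (tuple, list, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


same = deep_size((1, 1))       # both slots refer to the same int object
different = deep_size((1, 2))  # two distinct int objects are counted
print(same, different)
```

This is why benchmark data with repeated values can make a container look smaller than it would be with unique values.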

    • @lex_darlog_fun
      @lex_darlog_fun 2 years ago

      @@mCoding thanks for such a detailed response.
      > For example (1,1) uses less memory than (1,2)
      Obviously, when you do performance tests, you need to intentionally break those under-the-hood optimisations. Back then, when I was checking the examples from the aforementioned video myself, I used the simplest values for items I could think of. IIRC, each class (simple class, dataclass, dict, set, list, tuple and various types of named tuples) had just 3 values:
      1. an int, unique for each item (and I know that int is internally optimised up to 256 or smth, but that's negligible relative to the total number of items I had for the test - IIRC, it was about millions, tens of millions or smth of that matter).
      2. the same int, converted to a string, padded with random ASCII characters to make all the strings of equal length (used random characters instead of zeroes - just to be sure).
      3. a float in [0, 100.0] range - also unique for each item.
      And to be the most precise, as I said, I kept increasing the number of items until the total collection size reached above 1 GB. Each measurement attempt was done in a separate Python session. And that's the thing I'm most interested in when I ask about your methodology. With your method, did you just create a single instance and measure it, or did you generate a big enough number of them, measure the total consumption and divide it by the number of items? I mean, a single item difference might be 168 bytes vs 162. But if you have a tuple with a million dataclass instances vs the same tuple type storing the same million items with the same underlying data, but the items themselves are NamedTuples now, my results were very different from what you've shown. At the end of the day, it doesn't matter that each individual instance is reported as about the same. What matters is that when you have a ton of them, and the only varying factor is the type of an item, you should count the total difference as overhead. You won't use just a single instance of that dataclass/namedtuple in your program. So I don't know the theory behind it, but in practice my own tests gave the same results that the Russian guy reports in the video. And dataclass vs NamedTuple were nowhere near the 162 vs 170 numbers you provide.
      Speaking of which, I have no idea how it's even possible for dataclass to take less memory than a named tuple or even a simplest tuple.
      So, could you disclose your methodology?
      To be clear: I'm not attacking, I really want to know the actual difference in various types of data containers. I'm just concerned that the numbers you provide conflict with basically everything I ever heard on the subject and with my own synthetic tests.
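
The "build many items, measure the total, divide by the count" approach described in this comment can be sketched with the standard library's tracemalloc. This is only an illustration under stated assumptions, not the benchmark used in the video or by the commenter: `ItemDC`, `ItemNT` and `bytes_per_item` are hypothetical names, the strings are zero-padded rather than randomly padded, and the average includes the int, str and float payloads plus the container slot, so only the difference between item types is meaningful.

```python
import tracemalloc
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class ItemDC:  # hypothetical item type: plain dataclass
    n: int
    s: str
    x: float


class ItemNT(NamedTuple):  # hypothetical item type: typed named tuple
    n: int
    s: str
    x: float


def bytes_per_item(factory, count=100_000):
    """Build `count` items in one tuple and return the average traced bytes per item."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    items = tuple(
        # unique int, fixed-width string, float in [0, 100)
        factory(i, f"{i:012d}", 100.0 * i / count)
        for i in range(count)
    )
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / len(items)


if __name__ == "__main__":
    for factory in (ItemDC, ItemNT):
        print(factory.__name__, round(bytes_per_item(factory)), "bytes/item")
```

The numbers depend on the Python version and on whether the dataclass uses slots, so each variant is best run in a fresh interpreter, as the comment suggests.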

  • @Moody0101
    @Moody0101 2 years ago

    Well, firstly, your hair was so great tho

  • @BosonCollider
    @BosonCollider 1 year ago

    I like msgspec

  • @alansnyder8448
    @alansnyder8448 8 months ago

    @mCoding. Could you redo this video with Pydantic 2.0? I get what you are saying about @dataclass being used in internal applications but sometimes you don't know for sure if it won't eventually be serialized into JSON, so pydantic is something I choose if I'm not sure. I want to know if the new 2.0 with Rust implementation has gotten the speed into the same ballpark as the other options.

    • @mCoding
      @mCoding  8 months ago +1

      Hmm, perhaps. While a Rust implementation under the hood may improve performance, I suspect that it will not change the qualitative picture very much. Pydantic is slower primarily because it is fundamentally doing more work, namely validation and conversion, whereas the other options do neither validation nor conversion.

    • @alansnyder8448
      @alansnyder8448 8 months ago

      @@mCoding Maybe a good video might be how to use Dataclass and Pydantic together.
      I think in my case half of my projects are with FastAPI which I love and it depends on Pydantic. I've seen too many videos that compare Pydantic with Dataclasses (yours included) and have come to think of them in the same category. Since I'm already working with Pydantic in half my projects I've just gotten very comfortable with them.
      Knowing the performance hit puts a slightly different spin on the situation so maybe Dataclasses should be used for all internal-only data that won't be parsed. So then maybe just wrap a Dataclass in a field of a Pydantic class when you need to parse it.
      I'll keep this in mind myself in the future.
      Pydantic + Dataclasses would be an interesting video for me if you solicit ideas.

  • @scottbrewer474
    @scottbrewer474 2 years ago +2

    And here I was thinking I was fancy by bundling data into a dictionary vs lots of variables! (Stupid Dunning-Kruger effect)

    • @zachwhite2716
      @zachwhite2716 3 months ago

      Give yourself enough time and you’ll come back to the wisdom of simply using dictionaries instead of complex nested objects.

  • @0730pleomax
    @0730pleomax 2 years ago

    Pydantic, attr, dataclasses, NamedTuple

  • @Rebeljah
    @Rebeljah 2 years ago +1

    Yeehaw baby type barren code ftw!!1

  • @trag1czny
    @trag1czny 2 years ago +2

    discord gang 🤙

  • @korbiniankoch
    @korbiniankoch 2 years ago

    Which tool are you using to create the interactive bar charts?

    • @mCoding
      @mCoding  2 years ago +1

      Plotly express

  • @rikschaaf
    @rikschaaf 2 years ago

    Can't you throw your python code through some optimizer to convert everything to a tuple wherever possible? Your source code would still be your own readable code, but the optimized code that comes from that will be more optimized for speed and memory usage. Best of both worlds!

  • @dylan-dylan-dylan
    @dylan-dylan-dylan 8 months ago

    Accessing a dictionary's values by key is its primary purpose... it's only error-prone if you are ignorant of the pass-by rules of the value's type.
    #teamdict

  • @sevdalink6676
    @sevdalink6676 1 year ago

    For me Pydantic is great for prototyping, and the losses are acceptable for the sake of always being informed in detail about data errors. It even enables you to skip writing early tests because of that.
    Still the charts are extremely useful to show that Pydantic can be an important target in optimization.

    • @mCoding
      @mCoding  1 year ago +1

      An excellent point. This is Python after all, raw speed is not usually what we optimize for and paying some extra runtime cost for data validation when it "shouldn't" be needed may be worth it depending on the situation.

    • @heroe1486
      @heroe1486 10 months ago

      Is it tho? 5 microseconds for creation, 9 and 400 ns for getting and setting, and that was before Pydantic v2, enhanced by Rust.
      Unless we're doing several thousands of those, are we really concerned about those numbers in Python? Especially when writing an API where the network latency and DB queries could easily reach the 100 ms mark in good conditions.
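
To put the figures in this comment side by side, here is a rough sketch using the standard library's timeit. The names `Point`, `seconds_per_creation` and `creations_within` are made up for illustration, the timing includes the overhead of the lambda call, and the 5 µs and 100 ms values are simply the ones quoted in the comment, not new measurements.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical stand-in for a small record type
    x: int
    y: int


def seconds_per_creation(number=100_000):
    """Average wall-clock seconds to construct one Point (includes lambda overhead)."""
    total = timeit.timeit(lambda: Point(1, 2), number=number)
    return total / number


def creations_within(budget_seconds, cost_seconds):
    """How many constructions of the given cost fit in the time budget."""
    return round(budget_seconds / cost_seconds)


if __name__ == "__main__":
    print(f"dataclass creation: {seconds_per_creation() * 1e9:.0f} ns")
    # Quoted figures: 5 microseconds per creation against a 100 ms request.
    print(creations_within(0.100, 5e-6), "creations fit in 100 ms")
```

With the quoted figures, about 20,000 creations fit into a single 100 ms request, which is the scale at which the per-object cost starts to matter.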

    • @sevdalink6676
      @sevdalink6676 10 months ago

      @@heroe1486 I agree that it would be great to see this video with Pydantic V2 performance included. They made amazing progress.
      I agree with the rest of what you said as well. You asked and answered your own question. Like I said, it can be an important part, not everywhere, but it's good to have it on your checklist.

  • @hieu8276
    @hieu8276 2 years ago +1

    Still prefer dataclass since there is no need to install additional packages :)

  • @user-iv3tb8pp3x
    @user-iv3tb8pp3x 1 year ago

    Also dataclasses can be "frozen" so they are not modified, which to me is better than pydantic's BaseModel
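
A minimal sketch of the "frozen" behaviour mentioned here, using only the standard library. `Point` is a made-up example class.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # assigning to a field of a frozen instance raises
except FrozenInstanceError as exc:
    error_name = type(exc).__name__
else:
    error_name = None

# With the default eq=True, frozen instances are also hashable,
# so they can be used as dict keys or set members.
lookup = {p: "origin-ish"}
```

Freezing only blocks attribute assignment on the instance itself; a mutable field value such as a list can still be changed in place.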

  • @guzziiw
    @guzziiw 1 year ago

    Do you mind explaining why using dict is error-prone? Doesn't seem trivial to me.

    • @PanduPoluan
      @PanduPoluan 1 year ago

      Unless you define a TypedDict, you might accidentally mistype a key, resulting in a KeyError.
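
The point about mistyped keys can be illustrated as follows. `Movie` and its fields are made up for the example. Note that a TypedDict changes nothing at runtime: the typo still raises KeyError when executed, and the benefit is that a static checker such as mypy can report the unknown key before the code runs.

```python
from typing import TypedDict


class Movie(TypedDict):  # hypothetical schema for illustration
    title: str
    year: int


movie: Movie = {"title": "Alien", "year": 1979}

try:
    movie["yaer"]  # typo: fails at runtime, and a static checker flags the unknown key
except KeyError as exc:
    missing_key = exc.args[0]
```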

    • @zachwhite2716
      @zachwhite2716 3 months ago

      Personally I find that the “potential typo” issue is overstated. I have 20 years of python experience and it’s never been a serious source of errors. Code that isn’t easily understood, like when you use a mess of nested classes instead of a simple data structure with a dictionary at its root, however, has caused me a ton of problems and really hard to debug situations.

  • @xBZZZZyt
    @xBZZZZyt 2 years ago

    What about list?

  • @liesdamnlies3372
    @liesdamnlies3372 2 years ago

    ALL the dataclasses

    • @mCoding
      @mCoding  2 years ago +1

      I'm sure to get comments about others I forgot :)

  • @chaseduckett135
    @chaseduckett135 2 years ago

    Are you using R ggplot for the plot?

    • @mCoding
      @mCoding  2 years ago

      I'm using Plotly!

  • @Michallote
    @Michallote 1 year ago

    Okay, at 5:55: one of my recent headaches is reading a god-damn XML. I hate the guts out of it. I have to parse everything, as it is always in string format. xml.etree is great, but I still have to manually input every string and rename classes.
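
One way to reduce the manual string conversion described here is to let a dataclass's annotations drive it. This is a sketch under assumptions: the XML layout, the `Sensor` class and the `from_element` helper are invented for the example, every field is assumed to be a child element whose annotation is a plain callable such as `int`, `float` or `str`, and a missing element is not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Sensor:  # hypothetical record matching the sample XML below
    name: str
    channel: int
    gain: float


def from_element(cls, element):
    """Build `cls` from child elements, converting each text with the field's annotated type."""
    kwargs = {}
    for f in fields(cls):
        text = element.findtext(f.name)
        kwargs[f.name] = f.type(text)  # assumes the annotation is callable, e.g. int
    return cls(**kwargs)


root = ET.fromstring(
    "<sensor><name>probe</name><channel>3</channel><gain>1.5</gain></sensor>"
)
sensor = from_element(Sensor, root)
```

This relies on annotations being real classes at runtime, so it would need adjusting in a module that uses `from __future__ import annotations`, where `f.type` is a string.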

  • @irmdev595
    @irmdev595 1 year ago

    dataclasses video with slots and inheritance(super_init)

  • @cicik57
    @cicik57 2 years ago

    Okay, so first, dataclass has no type checking, and with attrs you must supply a validator with validator=, so the notation n: int alone does not work.
    These foreign library classes are horrible.
    Here is how I do it. It is no problem to write a class that is defined as in pydantic, reads kwargs and sets args with type checking on init and in methods, including checking the types of collection items, like List, which I am almost sure these libraries do not do, but with the nice ability to turn it off with one command once debugging is done.

    • @mCoding
      @mCoding  2 years ago

      Hi, it seems like you are new to Python. The notation x: int is not supposed to be something checked at runtime; these hints are completely ignored at runtime, as checking them would be a huge (think 10x) performance penalty, which is shown in the graphs in the video. Most type errors can be found by static analyzers, which is who the x: int is for. The only case when you need to do runtime checking is when you don't know the types ahead of time. The most common situation where this happens is parsing, since you don't know what data you are going to read in next, and this is why pydantic purposefully pays the cost of runtime type checking.
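
A small sketch of the behaviour described in this reply: the dataclass decorator does not enforce annotations at runtime, and a check only happens if you write one yourself. `Unchecked` and `Checked` are made-up names, and the `__post_init__` check is just one possible opt-in approach, not something the video prescribes.

```python
from dataclasses import dataclass


@dataclass
class Unchecked:  # hypothetical class: annotation only, no runtime check
    n: int


@dataclass
class Checked:  # hypothetical class: explicit opt-in runtime check
    n: int

    def __post_init__(self):
        # The dataclass decorator itself never does this; we add it by hand.
        if not isinstance(self.n, int):
            raise TypeError(f"n must be int, got {type(self.n).__name__}")


u = Unchecked(n=1.5)  # accepted: the annotation is not enforced at runtime

try:
    Checked(n=1.5)
except TypeError as exc:
    message = str(exc)
```

A static checker would report both constructor calls as type errors without running the code, which is the use of the annotation the reply refers to.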

    • @cicik57
      @cicik57 2 years ago

      @@mCoding hey, I am not new. It is function annotation types that are ignored; here it is a declaration of a static class field (n) equal to a class (int): n = int
      I thought that in THESE tools, for example @dataclass, the notation SHOULD typecheck, because otherwise why do we write the dataclass definition like that? And I just checked by passing a float instead of an int, and it works smoothly.
      So my solution would be to retain the @dataclass syntax, which I find kind of convenient because it retains order and there is no need to specify all arguments as named, create default type checkers and turn them on, and if you want a custom checker, you can write something there like a = lambda x: 0

  • @tamilvanan342
    @tamilvanan342 2 years ago

    I see you import modules inside function. Any particular reason?

    • @mCoding
      @mCoding  2 years ago

      This was just to make it easier to see which imports were needed for which examples.

  • @ADBraimah93
    @ADBraimah93 2 years ago +1

    3:26 - Type hint gang!!