Stop wasting memory in your Pandas DataFrame!

  • Published Sep 29, 2024

COMMENTS • 22

  • @BurkeHolland
    @BurkeHolland 2 years ago +3

    The DataType! I had no idea this was even possible.
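A minimal sketch of the trick the comment is reacting to: passing explicit dtypes to `read_csv`. The column names and CSV content here are made up for illustration, not the video's actual data:

```python
import io

import pandas as pd

# Made-up CSV with a low-cardinality text column and small integers.
csv = io.StringIO("region,units_sold\n" + "North,4611\nSouth,1200\n" * 500)

# Default inference: object for strings, int64 for numbers.
df_default = pd.read_csv(csv)

csv.seek(0)
# Declaring dtypes up front swaps in much smaller representations.
df_typed = pd.read_csv(csv, dtype={"region": "category", "units_sold": "int16"})

print(df_default.memory_usage(deep=True).sum())
print(df_typed.memory_usage(deep=True).sum())  # far smaller
```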

  • @nczioox1116
    @nczioox1116 2 years ago +1

    So many great tricks once you read the docs

  • @sarangkharpate5780
    @sarangkharpate5780 2 years ago

    Awesome tips, the data type trick is bonkers 😀

  • @ashrafibrahim3601
    @ashrafibrahim3601 2 years ago +3

    I've honestly never had enough data to get a memory error with pandas. I really like this vid though, because I do use pandas a bit, and knowing this will help me if I ever work with huge datasets.

    • @jrwkc
      @jrwkc 2 years ago

      start working with DNA

  • @ElinLiu0823
    @ElinLiu0823 2 years ago

    Useful, but...
    if I really just want to take a quick look at a dataset and I don't know its structure,
    how should I do it?

  • @seventyfive7597
    @seventyfive7597 6 months ago

    I know it's an old vid, but how can I limit the string length? Say I know the max length: how do I avoid over-allocation? More importantly, how do I specify a "non-growing" string (dynamic allocation is performance hell), if anyone knows?
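One answer, hedged: pandas itself keeps each string as a separate Python object (or Arrow-backed in newer setups), so there is no fixed-width string column inside a DataFrame. NumPy, however, does offer non-growing fixed-width string dtypes, which is one way to cap the allocation. The example values below are invented:

```python
import numpy as np

names = ["ada", "grace", "alan"]

# 'S8' is a fixed-width byte string: exactly 8 bytes per element,
# allocated once, with no per-element Python object and no growth.
arr = np.array(names, dtype="S8")
print(arr.itemsize)  # 8
print(arr.nbytes)    # 3 * 8 = 24

# Anything longer than the declared width is silently truncated,
# so the width must come from the known max length.
clipped = np.array(["a-very-long-name"], dtype="S8")
print(clipped[0])  # b'a-very-l'
```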

  • @abhisek330
    @abhisek330 2 years ago

    How do I reduce the font size of the cell output (only)?

  • @irodrigoarias
    @irodrigoarias 2 years ago +1

    Just by adding the correct type for each column I could drop the memory usage by almost 50%. Thanks!!!

  • @LcTheSecond
    @LcTheSecond 1 year ago

    I have a problem with my data.
    I have lots of dataframes (Excel files),
    each from a different vendor,
    all with product descriptions (code, name, size, color, price, etc.).
    Problem is, it's not a fixed pattern.
    All vendors send me their own Excel files daily,
    but they don't all have the same parameters.
    For example, some have a color column, others don't.
    For context, I'm using Django.
    My goal is to have a Product model with all attributes, but only create or update the information given by the vendor.
    The first time, while creating (bulk create),
    I add all fields and set a default for the missing ones.
    But when updating, I should only update the fields with new, different values, like price, since descriptions should never change; otherwise it would be a new Product.
    I started with simple code, looping,
    and for a 2,000-row Excel file
    it takes 15 min to check all the info and handle each field based on preset conditions.
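Not something the video covers, but the row-by-row loop described above is usually the bottleneck. One common pandas pattern is to align the vendor file with the stored data on the product code and keep only rows whose tracked fields actually changed; the frames and column names below are invented for illustration. The resulting rows could then feed something like Django's `bulk_update` instead of per-row saves.

```python
import pandas as pd

# Invented stand-ins: what is already stored vs. today's vendor file.
current = pd.DataFrame({"code": ["A1", "B2"], "price": [10.0, 20.0]})
incoming = pd.DataFrame({"code": ["A1", "B2"], "price": [10.0, 25.0]})

# Align the two frames on the product code, then compare vectorised
# instead of looping row by row.
merged = incoming.merge(current, on="code", suffixes=("_new", "_old"))
changed = merged[merged["price_new"] != merged["price_old"]]
print(changed["code"].tolist())  # ['B2']
```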

  • @stargazer8465
    @stargazer8465 2 years ago +3

    Pretty awesome, I really liked specifying the data types. Reducing by an order of magnitude is fantastic

  • @milanwillaert1780
    @milanwillaert1780 2 years ago +3

    I agree with most of your talk, but the choice of int16 seems a little risky. With a maximum positive value of 32767, that is less than a factor of ten away from the maximum of the sample presented (4611). I would not feel safe when the maximum of the current data and the maximum of the type representing it are within the same order of magnitude, certainly when also running models on future data, which may be quite different from the current dataset. Therefore an int32 type seems better, although the uint variant is also applicable here and roughly doubles the usable range, since "units sold" should not be negative, I presume.
    Kind regards

    • @sabiazinho
      @sabiazinho 1 year ago

      I don't see it as a risk; the best way to figure out the size is to get the column's max value, and based on that we can decide which size to use.
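That check can be automated: `np.iinfo` reports each integer type's range, and `pd.to_numeric` with `downcast` picks the smallest type that fits the observed values. The sample numbers below are taken from the thread:

```python
import numpy as np
import pandas as pd

s = pd.Series([4611, 1200, 312], dtype="int64")

# The representable range of int16, for the safety-margin argument above.
print(np.iinfo(np.int16).max)  # 32767

# downcast="unsigned" keeps the smallest unsigned type that holds s.max().
small = pd.to_numeric(s, downcast="unsigned")
print(small.dtype)  # uint16
```

Note this only guards against the current data; the point about future data exceeding the chosen range still stands.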

  • @redmastern576
    @redmastern576 2 years ago

    So you are telling me I haven't used Pandas to its full potential yet?

  • @mirimjam
    @mirimjam 2 years ago

    This is very interesting, though if I'm being honest, every time I actually ran into a MemoryError with pandas, it was because I had made a stupid mistake, and these tips wouldn't have helped much. Still, thanks for the tips.

  • @manoeljose3321
    @manoeljose3321 2 years ago +1

    Great advice, assuming you can read the CSV file all at once; the file I need to read is so big that I can't even read it with Pandas directly.

    • @babsNumber2
      @babsNumber2 2 years ago

      How do you read it then? I'm having a similar issue with some sales data I'm using.

    • @manoeljose3321
      @manoeljose3321 2 years ago

      @@babsNumber2 I went low level: I used the io library to read row by row and created a limited-size dataframe, so I can't read all of the rows or my program will crash. From what I know of it, the dask library might help you.

    • @babsNumber2
      @babsNumber2 2 years ago

      @@manoeljose3321 Thank you very much. I'll check that out.
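For files too big to load at once, `read_csv` also accepts a `chunksize`, which turns it into an iterator of DataFrames; this is a standard alternative to a hand-rolled io loop. A tiny in-memory CSV stands in for a huge file here:

```python
import io

import pandas as pd

# Stand-in for a file too large to read in one go.
csv = io.StringIO("units_sold\n" + "\n".join(str(i) for i in range(10)))

total = 0
# chunksize makes read_csv yield DataFrames of at most 4 rows each,
# so only one chunk is in memory at a time.
for chunk in pd.read_csv(csv, chunksize=4):
    total += int(chunk["units_sold"].sum())

print(total)  # 45
```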

  • @rollingarchives
    @rollingarchives 2 years ago

    i have the same record player 💙

  • @gkags2848
    @gkags2848 2 years ago

    awesome, thanks a lot!

  • @AnthonyShaw
    @AnthonyShaw 2 years ago

    Great tips