Polars or Pandas -- Which is Faster?

  • Published Oct 14, 2024

COMMENTS • 12

  • @PengZhang-ls5im
    6 months ago +5

    A trick to make Plotly graphs faster when using Polars is to do the calculation in Polars before passing the data to plotly.express. For example, instead of using px.histogram(df, ...), you can do:

        import plotly.express as px

        val_counts = df[target].value_counts()
        fig = px.bar(
            x=val_counts[target],
            y=val_counts["count"],
            text_auto=True,
        )

    This significantly decreases plot generation time for large datasets as well.

  • @Radioguy00
    6 months ago +2

    What versions of pandas (and Polars) were used?

    • @CharmingData
      6 months ago +3

      According to the requirements.txt file on GitHub, the versions were pandas==2.2.1 and polars==0.20.13.

  • @karthikb.s.k.4486
    6 months ago

    Thank you for the nice comparison tutorial. Where can we see the code for it?

    • @CharmingData
      6 months ago +1

      Good question, Karthik. I just added it to the video description. It's also here: github.com/CBell045/dash-polars-pandas

  • @gtizzle101
    6 months ago

    Now include RAPIDS cuDF.

  • @ordinarygg
    6 months ago +1

    Everyone will still use pandas because of third-party libraries; the problem is not performance, it's the ecosystem.

    • @marco_gorelli
      6 months ago

      Which library do you wish supported Polars?

    • @Tntpker
      6 months ago +1

      Most data scientists/analysts at small-to-mid-size companies don't really need Polars' speed, because their datasets are not yet large enough (GBs+) for it to make a noticeable difference.

    • @marco_gorelli
      6 months ago +1

      @Tntpker This may be true, but anecdotally I've heard from clients that they switched to Polars because of its expressive syntax rather than its performance.

    • @ordinarygg
      6 months ago

      @Tntpker Agreed. Most business RDBMS databases are under 10 GB when properly configured, yet you need a custom Amazon instance to spin up managed PostgreSQL. It's not just pandas; those giga-optimizations are everywhere.