Another trick to make Plotly charts faster when using Polars is to do the calculation in Polars before passing the data to plotly.express. For example, instead of calling px.histogram(df, ...), you can compute the counts first:
val_counts = df[target].value_counts()
fig = px.bar(
    x=val_counts[target],
    y=val_counts["count"],
    text_auto=True,
)
This also significantly decreases plot generation time for large datasets.
Thank you. Good advice 💪
Which versions of pandas (and Polars) were used?
According to the GitHub requirements.txt file it was: pandas==2.2.1 and polars==0.20.13
Thank you for the nice comparison tutorial. May I know where we can see the code for the above?
Good question, Karthik. I just added it to the video description section. Also here: github.com/CBell045/dash-polars-pandas
Now include RAPIDS cuDF.
Everyone will still use pandas because of third-party libraries. The problem is not performance, it's the ecosystem.
Which library do you wish supported Polars?
Most data scientists/analysts working at small-to-mid-size companies really don't need Polars' speed, because their datasets are not yet large enough (GBs+) for it to make any noticeable difference.
@Tntpker This may be true, but anecdotally I've heard from clients that they've switched to Polars because of the expressive syntax rather than because of performance.
@Tntpker Agreed. Most business RDBMS databases are under 10 GB when properly configured, yet you need a custom Amazon instance just to spawn managed PostgreSQL. It's not just pandas; those giga-optimisations are everywhere.