Polars vs. Pandas vs. Tidyverse vs. data.table for Left Join of Data Frames

  • Published Nov 25, 2024

COMMENTS • 7

  • @ahmedal-attar3478
    @ahmedal-attar3478 2 months ago +2

    Probably worth noting: Polars is quicker because it's multi-threaded and uses all the cores on the machine, whereas Pandas is single-threaded.

    • @ekbphd3200
      @ekbphd3200  2 months ago

      Thank you for pointing that out! I appreciate it.

    • @paulselormey7862
      @paulselormey7862 2 months ago +1

      Nice take. Benchmarks must go beyond speed: how many resources (CPU, memory) are used to achieve the apparently faster speed?

    • @ekbphd3200
      @ekbphd3200  10 days ago

      I’m not sure. I’ll have to analyze that next.
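One way to start on the resource question raised above is to record CPU time and peak memory alongside wall-clock time. The following is only a minimal sketch using the Python standard library; the helper name `measure` and the stand-in workload `build_squares` are hypothetical and not taken from the video's script. `time.process_time()` adds up CPU time across all threads of the process, so a CPU time well above the wall time suggests several cores were busy. `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated by native code (for example, inside Polars) may not be counted, and tracing itself slows the run.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return its result plus wall time, CPU time, and peak traced memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    stats = {"wall_s": wall_seconds, "cpu_s": cpu_seconds, "peak_bytes": peak_bytes}
    return result, stats


def build_squares(n):
    """Hypothetical stand-in workload; swap in the join being benchmarked."""
    return [i * i for i in range(n)]


result, stats = measure(build_squares, 100_000)
print(stats)
```

For a fairer speed comparison, timing is probably best done in a separate run without `tracemalloc` enabled.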

  • @gardnmi
    @gardnmi 2 months ago

    pandas also has a join() method, which is supposedly faster. You just have to set the join columns as the index before calling it.

    • @ekbphd3200
      @ekbphd3200  2 months ago

      Thanks for the comment. However, I can't get join() to be faster than merge(). In fact, join() is 4x slower than merge() in my code. In the pandas section of my code here:
      github.com/ekbrown/scripting_for_linguists/blob/main/Script_polars_pandas_left_join.py
      when I comment out my merge() line and uncomment the two set_index() lines and the join() line, it is 4x slower. If you get set_index() + join() to be quicker than merge(), please leave a reply with how. Thanks!
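For readers who want to try the comparison discussed in this thread, here is a minimal sketch of the two pandas approaches side by side. It is not the code from the linked script: the toy data, the column names `key`, `left_val`, and `right_val`, and the helper `time_call` are all assumptions made for illustration. On frames this small the timings say little, so a realistic test would use much larger inputs.

```python
import time

import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
left = pd.DataFrame({"key": [1, 2, 3, 4], "left_val": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 3, 5], "right_val": [20.0, 30.0, 50.0]})


def left_join_merge(left, right):
    """Left join on the shared "key" column using merge()."""
    return left.merge(right, on="key", how="left")


def left_join_index(left, right):
    """Left join by moving "key" into the index of both frames, then calling join()."""
    joined = left.set_index("key").join(right.set_index("key"), how="left")
    return joined.reset_index()


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


merged = left_join_merge(left, right)
joined = left_join_index(left, right)

print(merged)
print(joined)
print("merge():             ", time_call(left_join_merge, left, right))
print("set_index() + join():", time_call(left_join_index, left, right))
```

Note that this version counts the cost of `set_index()` as part of the join. Whether that is the right comparison depends on whether the frames would already be indexed in real use.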