Making PySpark code faster with DuckDB

  • Published Jan 25, 2025

COMMENTS • 4

  • @ryguyrg
    @ryguyrg 1 year ago +1

    Another awesome video Mehdi! Love the animations! ❤

  • @Gretschi
    @Gretschi 1 year ago +1

    Very interesting topic! I'm currently writing my master's thesis on PySpark performance optimization on Kubernetes with respect to Spark configuration parameters 👍 I'll also take a look at DuckDB

  • @tosinadekunle646
    @tosinadekunle646 8 months ago

    Do we still need to install Java and Hadoop and modify the environment variables on the local machine to do this, or can we just install DuckDB, pip install pyspark, and start using SparkSession and SparkContext?
    Thank you.

    • @motherduckdb
      @motherduckdb  8 months ago

      It's an API translation, meaning you can write Spark code, but the execution is done by DuckDB if you want. In that case, no PySpark/Java/Hadoop is needed. Hope that clarifies!