Very interesting topic! I'm currently writing my master's thesis on PySpark performance optimization on Kubernetes, focusing on Spark configuration parameters 👍 Will also take a look at DuckDB.
Do we still need to install Java and Hadoop and modify the environment variables on the local machine to do this, or do we just install DuckDB, pip install pyspark, and start using SparkSession and SparkContext? Thank you.
It's an API translation, meaning you can write Spark code but have the execution done by DuckDB if you want. In that case, no PySpark/Java/Hadoop is needed. Hope that clarifies!
Another awesome video Mehdi! Love the animations! ❤