1) 0:54 - not correct. Datasets and DataFrames have to be serialized and de-serialized as well, but since these APIs impose structure on the data, those processes can be faster. Overall, RDDs provide more low-level control over data manipulations;
2) not all DataFrames can be cached;
3) UDFs can be converted into native JVM bytecode with the help of the Catalyst optimizer. You can use df.explain() to see something like "Generated code: Yes" or "Generated code: No" in the output (see the sketch below)
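A minimal sketch of one way to check this, assuming a throwaway DataFrame built with spark.range: in recent Spark versions df.explain() marks operators that take part in whole-stage code generation with a leading "*", and explain(mode="codegen") (Spark 3.0+) prints the generated Java code, while a plain Python UDF typically shows up as a separate BatchEvalPython step.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("codegen-check").getOrCreate()

df = spark.range(1_000_000).withColumn("doubled", F.col("id") * 2)

# Built-in expressions are optimized by Catalyst/Tungsten; operators that
# participate in whole-stage code generation appear with a leading "*"
# (e.g. "*(1) Project ...") in the physical plan.
df.explain()

# A plain Python UDF is a black box to Catalyst and appears as a separate
# BatchEvalPython step instead of being fused into the generated code.
@udf(IntegerType())
def double_py(x):
    return x * 2

df.withColumn("doubled_udf", double_py(F.col("id"))).explain()

# mode="codegen" (Spark 3.0+) prints the generated Java code for each
# WholeStageCodegen subtree - the closest thing to a "Generated code: Yes/No"
# answer in practice.
df.explain(mode="codegen")
```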
Bucketing and salting are also good optimization techniques (a salting sketch is below).
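A minimal salting sketch, assuming a hypothetical facts DataFrame that is heavily skewed on customer_id and a small dims lookup table it joins to; the salt splits the hot key across several partitions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("salting-sketch").getOrCreate()

SALT_BUCKETS = 8  # tune to the degree of skew

# Hypothetical inputs: 'facts' is heavily skewed on customer_id = 1,
# 'dims' is the lookup table it joins to.
facts = spark.createDataFrame(
    [(1, 100.0)] * 50 + [(2, 10.0), (3, 20.0)], ["customer_id", "amount"]
)
dims = spark.createDataFrame(
    [(1, "gold"), (2, "silver"), (3, "bronze")], ["customer_id", "tier"]
)

# 1) Add a random salt to the skewed side so the hot key is spread
#    across several partitions.
salted_facts = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# 2) Explode the small side so every (key, salt) combination exists.
salted_dims = dims.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
)

# 3) Join on the composite (key, salt) and drop the salt afterwards.
joined = salted_facts.join(salted_dims, ["customer_id", "salt"]).drop("salt")
joined.show()
```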
Thanks Bhawna, can you please make a video on monitoring and troubleshooting Spark jobs via the UI?
Hi Bhawna,
I learned somewhere that we cannot uncache data but we can unpersist it, so persist is used instead of cache. But here you mentioned we can uncache. I'm a bit confused, which is correct?
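A small sketch of how the two usually line up in the DataFrame API: cache() is just persist() with the default storage level, both are released with unpersist() (there is no df.uncache() method), while "uncache" does exist for tables/views registered in the catalog.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark import StorageLevel

spark = SparkSession.builder.appName("cache-vs-persist").getOrCreate()
df = spark.range(1_000_000).withColumn("squared", F.col("id") * F.col("id"))

# cache() is shorthand for persist() with the default storage level
# (MEMORY_AND_DISK for DataFrames).
df.cache()
df.count()          # materialize the cache

# There is no df.uncache(); both cached and persisted DataFrames are
# released with unpersist().
df.unpersist()

# persist() lets you choose the storage level explicitly.
df.persist(StorageLevel.MEMORY_ONLY)
df.count()
df.unpersist()

# "Uncache" does exist for tables/views registered in the catalog:
df.createOrReplaceTempView("numbers")
spark.catalog.cacheTable("numbers")
spark.catalog.uncacheTable("numbers")   # or SQL: UNCACHE TABLE numbers
```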
So nice, it helps a lot.
Please share this PPT, it will help us.
Hi Bhawna. Your videos have helped me immensely in my databricks journey and I've nothing but appreciation for your work.
Just a humble request, could you also please make a video on Databricks Unity Catalog??
Yes, already done, there's a playlist on UC 😀
How can we optimize a Spark DataFrame write to CSV? It takes a lot of time when it's a big file. Thanks in advance.
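A minimal sketch of the usual levers, assuming a hypothetical big_df and output path: control the number of output files with repartition/coalesce and enable compression; switching to Parquet instead of CSV typically helps even more when the format is negotiable.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-write-sketch").getOrCreate()

# Hypothetical large DataFrame read from somewhere upstream.
big_df = spark.range(10_000_000).withColumnRenamed("id", "value")

(
    big_df
    # Control the number of output files: thousands of tiny files or a
    # single coalesce(1) file are both slow; pick a partition count that
    # matches the cluster's parallelism.
    .repartition(32)
    .write
    .mode("overwrite")
    # Compressed CSV reduces I/O; other codecs (bzip2, lz4) also work.
    .option("compression", "gzip")
    .option("header", True)
    .csv("/tmp/big_output_csv")   # hypothetical output path
)
```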
Ma'am, your voice is like #WakeUpTheSleeping
Hahahaha... yeah, agreed 😂