Performance Tuning in Spark

  • Published on 11 Dec 2024

COMMENTS • 11

  • @oldoctopus393, 1 year ago, +2

    1) 0:54 - not correct. Datasets and DataFrames have to be serialized and deserialized as well, but because these APIs impose a structure on the data, those steps can be faster. Overall, RDDs provide more control in terms of data manipulation;
    2) not all DataFrames can be cached;
    3) UDFs can be converted into native JVM bytecode with the help of the Catalyst optimizer. You can use df.explain() to see something like "Generated code: Yes" or "Generated code: No" in the output.

  • @CoolGuy, 1 year ago

    Bucketing and salting are also good optimization techniques.
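
For readers unfamiliar with salting, here is a minimal sketch of the idea in plain Python rather than Spark. A skewed ("hot") key gets a random suffix so that its rows split into several smaller groups, which are aggregated separately and then merged back under the original key. The names `NUM_SALTS`, `add_salt`, `strip_salt`, and `salted_count`, and the sample data, are hypothetical and chosen only for illustration. In a real Spark job the first stage would be a grouped aggregation on the salted column.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical: how many sub-keys each key is split into


def add_salt(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random suffix so one hot key becomes several smaller keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def strip_salt(salted_key: str) -> str:
    """Recover the original key by dropping the salt suffix."""
    return salted_key.rsplit("#", 1)[0]


def salted_count(keys, num_salts: int = NUM_SALTS, seed: int = 0):
    """Two-stage count: aggregate on salted keys, then merge per original key."""
    rng = random.Random(seed)
    # Stage 1: partial counts per salted key (the part that would run in parallel).
    partial = Counter(add_salt(k, rng, num_salts) for k in keys)
    # Stage 2: drop the salt and combine the partial counts.
    final = Counter()
    for salted_key, n in partial.items():
        final[strip_salt(salted_key)] += n
    return partial, final


if __name__ == "__main__":
    keys = ["hot"] * 1000 + ["a", "b", "c"]  # made-up skewed data
    partial, final = salted_count(keys)
    print("largest group without salting:", max(Counter(keys).values()))
    print("largest group with salting:   ", max(partial.values()))
    print("final counts:", dict(final))
```

The final counts match a direct count. The only difference is that no single group in the first stage has to hold every row of the hot key.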

  • @EDWDB, 1 year ago

    Thanks, Bhawna. Can you please make a video on monitoring and troubleshooting Spark jobs via the UI?

  • @krishnasai7550, 4 months ago

    Hi Bhawna,
    I learned somewhere that we cannot uncache data but we can unpersist it, so persist is used more in place of cache. But here you mentioned that we can uncache. I'm a bit confused: which is correct?

  • @tanushreenagar3116, 11 months ago

    So nice, it helps a lot.

  • @AbhinavDairyFarm, 6 months ago

    Please share this PPT; it will help us.

  • @AyushSrivastava-gh7tb, 1 year ago

    Hi Bhawna. Your videos have helped me immensely in my Databricks journey, and I have nothing but appreciation for your work.
    Just a humble request: could you also please make a video on Databricks Unity Catalog?

    • @cloudfitness, 1 year ago, +1

      Yes, already done, with a playlist on UC 😀

  • @stevedz5591, 1 year ago

    How can we optimize a Spark DataFrame write to CSV? It takes a lot of time when it's a big file. Thanks in advance.

  • @RohitSharma-ny1oq, 1 year ago

    Ma'am, your voice could wake up someone who is sleeping.