Only a few people have the ability to teach in a way that even a novice can understand. Hats off to you.
Keep going !!!
Thank you for your encouraging words
can not agree more
You are the real Raja, bro. Super!
Thank you bro
Thank you for sharing your knowledge with us!
My pleasure! Thank you
You have very good way of explaining the concepts. Thank you!
Thank you Chetan
your videos are the best
Good 👍
Thank you! Cheers!
Your videos are making wonders!!
Thank you
Nice content sir
Thanks!
I found many videos on YouTube regarding cache and persist, but nobody explains it the way you did.
Thank you Rahul
This is the explanation! Thank you for sharing the knowledge, sir 👏
Thanks and welcome
Best teacher!!! Thank you sir 🙏🏻
Thank you Turan
Raja, I really appreciate your explanation :)
Glad to hear that! Thanks for your comment
Great explanation 🎉
Glad it was helpful! Keep watching
You explained it so simply...
I hope I will be able to explain it to the interviewer the same way you did 😅
Thank you! All the best!
Knowledge session
Thanks Kamal
This is too good. Please keep doing it. Can you post a video on handling the small file problem with Spark?
Thanks 👍🏻
Sure, will post a video on the small file problem.
But where and how do we define these? Can you please add a short demo?
Can you add the examples for creating persist in the description?
I guess you have at least M.Tech. and M.Ed. degrees.
Expert in Spark and Amazing Teacher.
Sir, you are great!
Thank you Pankaj! Hope you like the tutorial
@rajasdataengineering7585, so far I have watched 9 out of the 22 videos in the "Databricks Performance Optimization" playlist. It is very detailed. Like it.
Glad you like it!
Hi Raja, you said that persist will use both memory and disk. Does memory here mean both on-heap and off-heap memory?
By default, it is cached in on-heap memory. But if off-heap memory is enabled and the JVM (on-heap) memory is full, off-heap memory would be used for caching the remaining partitions.
Please make a video on salting in performance optimization.
Sure, will create a video on the salting technique.
Hi, I was asked to prepare for Spark for my next role in the same company where I work. Is this learning series enough?
Hi, yes, this is more than enough if you complete all these videos.
A very good playlist that I have come across. Could you please provide a practical example? I was watching some videos on this and noticed that when we call df.cache(), the default storage level shows as MEMORY_AND_DISK_SER. It was never plain MEMORY_AND_DISK; it was always serialized. I would like to know the reason for this.
Hi Raja. I have one doubt.
Cache: it stores the data in memory. Does that mean on-heap memory?
Persist: does it store the data in both on-heap and off-heap memory?
Is that correct?
Yes, that's correct. Cache always stores in memory, but persist has the flexibility of memory or disk.
@rajasdataengineering7585 So memory here means on-heap, right, and disk means off-heap?
No, on-heap and off-heap are both memory; disk is different. I have already posted a video on on-heap vs off-heap. Please watch that video.
@rajasdataengineering7585 thank you 😊
Hi Sir, we want a video on the performance issues that come up while developing a notebook, and their solutions.
Great video, sir! One question: is disk the same as off-heap memory?
No, off-heap and disk are different. Off-heap memory is part of RAM. On-heap is controlled by the JVM, while off-heap is controlled by the OS itself.
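For readers wondering how off-heap memory gets enabled in the first place: open-source Spark exposes it through two configuration keys, `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size`. The sketch below is plain Python with no Spark dependency; it only renders those settings as `spark-submit --conf` arguments. The `2g` size and the helper name `to_submit_args` are illustrative assumptions, not recommendations.

```python
# Off-heap memory is disabled by default; both keys are needed to turn it on.
# The size value is an illustrative assumption, not a tuning recommendation.
OFF_HEAP_CONF = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


print(" ".join(to_submit_args(OFF_HEAP_CONF)))
```

The same key/value pairs can also be passed to `SparkSession.builder.config(key, value)` when a session is built in a notebook.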
Best explanation. But I have one question: is cache() a transformation or an action?
Cache is an action
@rajasdataengineering7585 No, cache is not an action. It is a transformation, please do try it out.
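One way to see the point being debated here: `cache()` is lazy. Calling it only marks the DataFrame for caching, and nothing is computed or stored until an action such as `count()` runs. The sketch below is a toy model in plain Python, not Spark's implementation; the class `LazyDataset` and its attributes are hypothetical names chosen for illustration.

```python
class LazyDataset:
    """Toy stand-in for a DataFrame: work happens only when an action runs."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the rows
        self._cache_requested = False
        self._cached_rows = None
        self.compute_calls = 0         # how many times the rows were computed

    def cache(self):
        # Lazy: only records the request and returns the same dataset.
        self._cache_requested = True
        return self

    def _rows(self):
        if self._cached_rows is not None:
            return self._cached_rows
        self.compute_calls += 1
        rows = self._compute()
        if self._cache_requested:
            self._cached_rows = rows
        return rows

    def count(self):
        # Action: forces evaluation (and fills the cache if one was requested).
        return len(self._rows())


ds = LazyDataset(lambda: [1, 2, 3]).cache()
calls_after_cache = ds.compute_calls   # still 0: cache() did no work
first = ds.count()                     # first action computes and caches
second = ds.count()                    # second action reads the cached rows
print(calls_after_cache, first, second, ds.compute_calls)
```

In a real notebook the equivalent check is to call `df.cache()` and confirm that nothing appears under the Storage tab of the Spark UI until an action runs on `df`.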
Try to make videos under 10 mins sir
Sure, will do
How can we avoid duplicate rows while joining large datasets?
dropDuplicates() or distinct() can be used to remove duplicates.
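To illustrate the reply above: duplicate join keys on one side multiply the matching rows in the join output, so de-duplicating before the join keeps the result small. In PySpark the calls are `dropDuplicates()` (optionally with a list of columns) and `distinct()`; the sketch below mimics the idea in plain Python on small lists so it runs without a cluster. The helper functions and sample rows are made up for illustration.

```python
def drop_duplicates(rows, subset):
    """Keep the first row seen for each combination of the `subset` columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def inner_join(left, right, on):
    """Naive inner join on a single column."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


orders = [{"cust_id": 1, "amount": 10}, {"cust_id": 1, "amount": 25}]
customers = [
    {"cust_id": 1, "name": "A"},
    {"cust_id": 1, "name": "A"},   # accidental duplicate
]

joined_raw = inner_join(orders, customers, on="cust_id")
joined_clean = inner_join(orders, drop_duplicates(customers, ["cust_id"]), on="cust_id")
print(len(joined_raw), len(joined_clean))
```

Unlike this toy, Spark does not guarantee which of the duplicate rows survives `dropDuplicates()` on a subset of columns, so it is safest when the duplicates are identical or the choice does not matter.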