Beautiful and simple explanation of AQE! Loved how clearly you wrote out the commands in the Jupyter notebook.
Thank you!
Keep up the good work. 🙌
Thanks 👍 Please make sure to share with your network.
Love your content :) I have one small question. At 4:10, Spill (Memory) is 137 MB and Spill (Disk) is 77.2 MB. If 137 MB was spilled from memory, why was only 77.2 MB written to disk? Shouldn't it be 137 MB? Can you please clarify this?
Spill (Memory) is the size of the data in its deserialized, in-memory form, while Spill (Disk) is the size of the same data after it has been serialized (and usually compressed) to be written to disk. That is why the amount on disk is smaller. The major tradeoff is that the data has to be deserialized again when it is read back from disk.
Please make sure to share with your network if you love this content ❤️
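To make the serialized vs. deserialized point in the reply above concrete, here is a minimal plain-Python analogy. It is not Spark's own serializer or spill code: `pickle` and `zlib` stand in for Spark's serialization and spill compression, and the sample `rows` are made up for illustration.

```python
import pickle
import sys
import zlib

# Hypothetical sample records standing in for rows of a DataFrame.
rows = [(i, f"user_{i}", i * 0.5) for i in range(10_000)]


def in_memory_bytes(records):
    """Rough size of the deserialized objects: the list, each tuple, each field."""
    total = sys.getsizeof(records)
    for rec in records:
        total += sys.getsizeof(rec)
        total += sum(sys.getsizeof(field) for field in rec)
    return total


# Size of the live Python objects (analogous to "Spill (Memory)").
memory_size = in_memory_bytes(rows)

# Size after serialization, then after compression (analogous to "Spill (Disk)").
serialized = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
on_disk_size = len(zlib.compress(serialized))

print(f"in memory (deserialized): {memory_size:,} bytes")
print(f"serialized:               {len(serialized):,} bytes")
print(f"serialized + compressed:  {on_disk_size:,} bytes")
```

The exact numbers depend on the Python version and the data, but the same records take noticeably less space once serialized and compressed, which is the same effect that makes the disk figure smaller than the memory figure in the Spark UI.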
@easewithdata Thanks for the quick response! Sure, I'll recommend it to my friends.
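Very well explained 👍 I have one doubt. A broadcast join also detects the smaller DataFrame automatically if broadcast join is enabled, right? Do we need to specify which DataFrame is the smaller one in a broadcast join?
Ideally no. Spark automatically broadcasts a side whose estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default), but you can always specify it yourself with a broadcast hint if you want.
If you like my content, please make sure to share it with your network on LinkedIn 👍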
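As a rough sketch of the automatic choice described above, the following plain-Python function models the size check. It is a simplified illustration, not Spark's planner code: the function name `pick_broadcast_side` and its return values are hypothetical, and the real planner also considers the join type and any hints.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB), in bytes.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def pick_broadcast_side(left_bytes, right_bytes, threshold=DEFAULT_THRESHOLD_BYTES):
    """Return 'left', 'right', or None when neither side should be broadcast.

    Hypothetical helper: broadcasts the smaller side if it fits under the threshold.
    """
    if threshold < 0:
        # A threshold of -1 disables automatic broadcasting.
        return None
    candidates = [
        (size, side)
        for size, side in ((left_bytes, "left"), (right_bytes, "right"))
        if size <= threshold
    ]
    if not candidates:
        return None
    # Smallest estimated size wins.
    return min(candidates)[1]


print(pick_broadcast_side(5 * 1024**3, 2 * 1024**2))   # 5 GB vs 2 MB
print(pick_broadcast_side(5 * 1024**3, 50 * 1024**2))  # 5 GB vs 50 MB
```

In PySpark itself, the explicit version is the `broadcast()` hint from `pyspark.sql.functions`, for example `large_df.join(broadcast(small_df), "id")`, where `large_df`, `small_df`, and `"id"` are placeholder names.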
After Spark, can you teach Airflow? 🤗
Yes, but only if you like, subscribe, and share this with your network 😉
The video sounds interesting, but it is really hard to follow or understand the English with this pronunciation 😢 🤷🏻♂️
Please turn on subtitles to follow.