Wonderful explanation. I was studying Data Cloud in Salesforce, and this data format was mentioned multiple times. I was clueless, but your video made it clear. Thank you, sir.
Very nicely explained. But @8:40 you mentioned that column 2 can go in the same block or a different block, and @11:29 you mentioned that Spark knows that column 2 is stored in Block-2. These two statements are a bit confusing: if a column's values can be spread across multiple blocks, how does Spark know which block to search?
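On the question of how a reader such as Spark finds a column: Parquet keeps metadata in the file footer that records, for every row group, where each column chunk starts, so the reader looks the location up instead of searching. Below is a minimal toy sketch of that idea in Python. The in-memory layout and the names `ROWS_PER_GROUP`, `write_file`, and `read_column` are hypothetical, chosen only for illustration; this is not Parquet's real on-disk format or Spark's API.

```python
# Toy model of a columnar file: NOT the real Parquet layout, just the idea.
# Rows are split into row groups; inside each row group every column is
# stored as one contiguous chunk. A footer records where each chunk starts.

rows = [
    ("R1C1", "R1C2", "R1C3"),
    ("R2C1", "R2C2", "R2C3"),
    ("R3C1", "R3C2", "R3C3"),
    ("R4C1", "R4C2", "R4C3"),
]
ROWS_PER_GROUP = 2  # hypothetical row-group size


def write_file(rows, rows_per_group):
    """Return (body, footer), where footer[(group, column)] = (offset, length)."""
    body = []
    footer = {}
    num_columns = len(rows[0])
    for group, start in enumerate(range(0, len(rows), rows_per_group)):
        group_rows = rows[start:start + rows_per_group]
        for col in range(num_columns):
            values = [row[col] for row in group_rows]
            footer[(group, col)] = (len(body), len(values))
            body.extend(values)
    return body, footer


def read_column(body, footer, col):
    """Read one column by jumping straight to its chunks via the footer."""
    out = []
    for (group, c), (offset, length) in sorted(footer.items()):
        if c == col:
            out.extend(body[offset:offset + length])
    return out


body, footer = write_file(rows, ROWS_PER_GROUP)
column_2 = read_column(body, footer, col=1)  # second column, zero-based index 1
print(column_2)
```

So a column can be spread over several chunks, and the reader still never has to scan: it reads the footer once and then jumps to the recorded offsets.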
Great explanation: to the point, no exaggeration. Thanks for the video.
The SQL example should have used aggregates. That is probably where columnar storage shines.
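To make the point about aggregates concrete, here is a small Python sketch comparing a `SUM(amount)` over a row layout and over a column layout. The sample data, the field names, and the idea of counting "values touched" as a stand-in for I/O are all assumptions made for illustration, not measurements from any real engine.

```python
# Row layout: each record is stored together, so reading one field
# means stepping through whole records.
row_store = [
    {"id": 1, "city": "Pune", "amount": 10},
    {"id": 2, "city": "Delhi", "amount": 20},
    {"id": 3, "city": "Mumbai", "amount": 30},
]

# Column layout: all values of one column are stored together.
column_store = {
    "id": [1, 2, 3],
    "city": ["Pune", "Delhi", "Mumbai"],
    "amount": [10, 20, 30],
}


def sum_amount_rows(store):
    """SUM(amount) over the row layout; counts every value in each record read."""
    total = 0
    values_touched = 0
    for record in store:
        values_touched += len(record)  # the whole record is read
        total += record["amount"]
    return total, values_touched


def sum_amount_columns(store):
    """SUM(amount) over the column layout; only the 'amount' column is read."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_touched = sum_amount_rows(row_store)
col_total, col_touched = sum_amount_columns(column_store)
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts give the same sum, but the column layout touches only the one column the aggregate needs, and the gap grows with the number of columns in the table.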
Sir, you are a genius!!
Please do make a video on compression techniques.
So, why is Avro used in some projects?
One scenario I can think of: compared with plain JSON, Avro offers benefits like schema evolution, which helps in a setup with multiple producers and consumers. If you are using the JSON data format with Kafka topics in a data pipeline, Avro can be used instead of JSON.
@sumitnekar8965 could you explain further, please? I'm doing some investigation work on choosing Avro vs. Parquet vs. Delta tables for Azure Event Hubs output, so your explanation would be appreciated 🙏
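On schema evolution: in Avro, a reader resolves each record against its own schema, so a field the writer did not send is filled from the reader's default, and a field the reader does not know is ignored. Here is a rough Python sketch of that rule. It is hand-rolled and does not use the Avro library; the `Order` schemas and the `resolve` function are hypothetical names for illustration only.

```python
# Hypothetical schemas: v2 adds a "currency" field with a default value.
schema_v1 = {
    "name": "Order",
    "fields": [{"name": "id"}, {"name": "amount"}],
}
schema_v2 = {
    "name": "Order",
    "fields": [
        {"name": "id"},
        {"name": "amount"},
        {"name": "currency", "default": "USD"},
    ],
}


def resolve(record, reader_schema):
    """Shape a written record to the reader's schema (simplified resolution)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field missing from the written data: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields the reader's schema does not list are simply dropped.
    return resolved


old_record = {"id": 1, "amount": 9.5}                     # written with v1
new_record = {"id": 2, "amount": 4.0, "currency": "EUR"}  # written with v2

upgraded = resolve(old_record, schema_v2)    # new consumer, old producer
downgraded = resolve(new_record, schema_v1)  # old consumer, new producer
print(upgraded)
print(downgraded)
```

This is why producers and consumers on a Kafka topic can be upgraded at different times: each side keeps reading with its own schema version as long as added fields carry defaults.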
In Block 1, R3C3 is listed by mistake 🚫; it should be R2C3, because R3C3 already appears in Block 2 as expected.
As a mathematician I must inform you that having a row space vs a column space is an isomorphism. There is no difference; it's in your head.