Row Format vs Column Format | Why Parquet is better than Avro | Why Columnar formats are preferred

  • Published 4 Dec 2024

COMMENTS • 11

  • @PANKAJKUMAR-fe8zn
    @PANKAJKUMAR-fe8zn 5 months ago

    Wonderful explanation. I was studying Data Cloud in Salesforce, and they mentioned this data format multiple times. I was clueless, but your video gave me clarity. Thank you, sir.

  • @MrSravan84
    @MrSravan84 1 year ago +2

    Very nicely explained. But at 8:40 you mentioned that column 2 can go in the same block or a different block, and at 11:29 you mentioned that Spark knows that column 2 is stored in Block-2. These two statements are somewhat confusing: if a column's values can be spread across multiple blocks, how does Spark know which block to search?
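
One way to picture an answer to the question above: a Parquet file ends with a footer that records, for every row group, where each column chunk starts and how long it is, so a reader can look up the byte ranges for one column instead of searching. The sketch below only imitates that lookup with a plain dictionary. The names `FOOTER` and `byte_ranges`, and all offsets and lengths, are invented for illustration and are not Parquet's or Spark's actual API.

```python
# Hypothetical footer for a file with two row groups.
# Each entry maps a column name to an (offset, length) pair; the numbers are made up.
FOOTER = {
    "row_groups": [
        {"columns": {"col1": (4, 100), "col2": (104, 60)}},
        {"columns": {"col1": (164, 90), "col2": (254, 70)}},
    ]
}


def byte_ranges(footer, column):
    """Return the (offset, length) of every chunk that holds `column`."""
    return [group["columns"][column] for group in footer["row_groups"]]


# A reader that needs only col2 can seek straight to these two ranges.
col2_ranges = byte_ranges(FOOTER, "col2")
print(col2_ranges)
```

In this toy model the column is spread across two places, yet the reader still knows exactly where to look, because the locations come from metadata rather than from scanning the data.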

  • @SanjayKumar-rw2gj
    @SanjayKumar-rw2gj 5 months ago

    Great explanation: to the point, no exaggeration. Thanks for the video.

  • @evilgoogle6986
    @evilgoogle6986 1 month ago

    The SQL example should have used aggregates. That is probably where columnar storage shines.
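
A minimal sketch of the point about aggregates, assuming a toy three-row table held in memory. It compares how many values a `SUM(amount)` style query has to touch in a row layout versus a columnar layout. The table contents and the helper names `sum_amount_row` and `sum_amount_columnar` are hypothetical, and real formats add compression and encoding that this ignores.

```python
# Hypothetical toy table: three rows, three columns (id, city, amount).
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Delhi", "amount": 80.0},
    {"id": 3, "city": "Pune", "amount": 50.0},
]

# Row layout: all values of one record sit together.
row_layout = [[r["id"], r["city"], r["amount"]] for r in rows]

# Columnar layout: all values of one column sit together.
column_layout = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row(layout):
    """Sum the amount column; every record is read in full."""
    total, touched = 0.0, 0
    for record in layout:
        touched += len(record)  # the whole record is scanned
        total += record[2]      # amount is the third field
    return total, touched


def sum_amount_columnar(layout):
    """Sum the amount column; only that column is read."""
    values = layout["amount"]
    return sum(values), len(values)


print(sum_amount_row(row_layout))          # (250.0, 9)
print(sum_amount_columnar(column_layout))  # (250.0, 3)
```

Both layouts give the same total, but the row layout touches all nine values while the columnar layout touches only the three it needs.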

  • @mustafabohra2070
    @mustafabohra2070 1 month ago

    Sir, you are a genius!!

  • @cheluveshab9525
    @cheluveshab9525 2 years ago +1

    Please do make a video on compression techniques.

  • @sumanthb3280
    @sumanthb3280 2 years ago +1

    So, why is Avro used in some projects?

    • @sumitnekar8965
      @sumitnekar8965 2 years ago +1

      One scenario I can think of: compared with plain JSON, Avro offers benefits such as schema evolution, which helps in a setup with multiple producers and consumers. If a data pipeline uses the JSON format on Kafka topics, Avro can be used instead.

    • @josephjoestar995
      @josephjoestar995 1 year ago

      @sumitnekar8965 Could you explain further, please? I'm investigating Avro vs Parquet vs Delta tables for Azure Event Hubs output, so your explanation would be appreciated 🙏
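
To make the schema evolution point in this thread more concrete, here is a rough sketch in plain Python. It imitates one rule of Avro's schema resolution: a field that exists only in the reader's schema is filled from its default, and a field that exists only in the written record is ignored. It does not use the Avro library; the schemas, the `resolve` helper, and the sample records are all hypothetical.

```python
# Hypothetical reader schema: version 2 adds a "currency" field with a default.
READER_SCHEMA = {
    "name": "Order",
    "fields": [
        {"name": "id"},
        {"name": "amount"},
        {"name": "currency", "default": "USD"},
    ],
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    Simplified imitation of Avro's resolution rule, not the real implementation.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # fill the missing field from its default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved  # fields unknown to the reader are dropped


# A record written by an older producer that knows nothing about "currency".
old_record = {"id": 1, "amount": 9.5}
print(resolve(old_record, READER_SCHEMA))
```

Under these assumptions, producers on the old schema and consumers on the new one can share a topic without a coordinated upgrade, which is the multi-producer, multi-consumer benefit mentioned above.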

  • @nindersingh
    @nindersingh 1 year ago +1

    In Block 1, R3C3 is shown by mistake 🚫; it should be R2C3, because R3C3 already appears in Block 2 as expected.

  • @James-l5s7k
    @James-l5s7k 1 year ago +1

    As a mathematician I must inform you that having a row space vs a column space is an isomorphism. There is no difference; it's in your head.