Learn Schema Evolution in Apache Hudi Transaction Datalake with hands on labs

COMMENTS • 12

  • @balajis4788 1 month ago

    Very useful video. It really helped me solve my issue after watching it. Thank you!

  • @withtheengineer-hamza-3255 1 year ago

    Thank you, extremely useful.

  • @yulinshao8576 1 year ago +1

    Thanks very much for sharing this. Is it possible to drop columns in Hudi tables on AWS?

  • @harjeetsinghgoldy1 10 months ago

    How do I handle deleting an existing column from a table? Hudi throws errors while upserting a batch that does not have the column.

  • @federicomanueldlouky5231 4 months ago

    Link to the notebook is not working! Could you please share the new link?

  • @vinjitsharma1875 1 year ago

    Can we do schema evolution in a MOR-type Hudi table? Also, if we drop a column in our database and dump it to S3 using DMS, will Hudi adjust itself to this change in schema?

    • @SoumilShah 1 year ago +1

      Hi,
      Answer to your first question:
      Yes, you can do schema evolution in MOR.
      Answer to question 2:
      It depends on whether you are using Hive sync or creating tables using DDL.
      But you can define the schema and evolve it as shown in the video.

    • @vinjitsharma1875 1 year ago

      @SoumilShah Okay, we tried schema evolution in our MOR Hudi table, and we are able to add a new column, change the datatype of a column, and rename a column. But when we delete a column, it gives this error: "org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'emp_salary' not found
      at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:221)"
      No, we are not using Athena. We are doing schema changes on the fly.
      df = spark.createDataFrame(data = datalist, schema = df_schema)
      (
      df.write.format("org.apache.hudi")
      .options(**CombinedConfig)
      .mode("append")
      .save(f"s3a://{hudi_table_bucket}/{hudi_table_path}/{schema}/{table}")
      )
      where df_schema is in this format - struct
      when we deleted the column emp_salary, we removed it from the df_schema struct.
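
One workaround that is sometimes used for the 'emp_salary' error described above is to leave the dropped column in df_schema as a nullable field and write None for it, instead of removing it from the struct. The following is only a minimal sketch under that assumption, not the method shown in the video. The helper name pad_rows_to_table_schema and the columns emp_id and emp_name are hypothetical; only emp_salary comes from the comment. It assumes each incoming record is available as a plain dict before spark.createDataFrame is called.

```python
from typing import Any, Dict, List, Sequence


def pad_rows_to_table_schema(
    rows: Sequence[Dict[str, Any]],
    table_columns: Sequence[str],
) -> List[Dict[str, Any]]:
    """Return copies of `rows` that contain every table column.

    Columns missing from an incoming row (for example a column that was
    dropped upstream) are filled with None so the batch still matches
    the schema the table was written with.
    """
    known = set(table_columns)
    padded = []
    for row in rows:
        # Columns the table has never seen are a different case (adding
        # a column), so surface them instead of silently dropping them.
        unknown = set(row) - known
        if unknown:
            raise ValueError(f"columns not in table schema: {sorted(unknown)}")
        padded.append({col: row.get(col) for col in table_columns})
    return padded


# Hypothetical table columns; only emp_salary appears in the thread.
table_columns = ["emp_id", "emp_name", "emp_salary"]

# Incoming batch after emp_salary was dropped at the source.
batch = [{"emp_id": 1, "emp_name": "Asha"}, {"emp_id": 2, "emp_name": "Ravi"}]

padded = pad_rows_to_table_schema(batch, table_columns)
```

The padded rows could then be passed as datalist to spark.createDataFrame together with the original df_schema (the one that still lists emp_salary as a nullable field), and written with the same df.write call. Whether this removes the Parquet/Avro mismatch depends on the Hudi version and write options in use, so it is worth trying on a test table first.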

  • @mennagamea4634 1 year ago

    When we apply this, we sometimes get an Athena error saying that the column doesn't exist in the schema. The error is inconsistent, though: sometimes it appears, and other times the column is added as expected and we can see it. Any idea why?

    • @SoumilShah 1 year ago

      Switch to Athena engine version 3 to resolve the issue 😀😀

    • @mennagamea4634 1 year ago

      @SoumilShah I did, but Athena engine version 3 has a problem with schema changes: it always gives me an error saying that the column is not in the schema.