Very useful video. It really helped me solve my issue. Thank you!
Glad it helped!
Thank you, extremely useful.
Thanks very much for sharing this. Is it possible to drop columns in Hudi tables on AWS?
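A sketch of one route, assuming Hudi 0.11 or later with the Hudi Spark SQL extensions: the Hudi docs describe a schema-on-read mode (`hoodie.schema.on.read.enable`) under which `ALTER TABLE ... DROP COLUMN` is accepted. The helper below only builds those statements; the function name and the `hudi_db.employees` table are hypothetical, and you should check the syntax against the docs for your Hudi version. Whether Glue/Athena reflects the change afterwards is not covered here.

```python
import re
from typing import List

# Optional "db." prefix followed by a table name; a plain identifier for columns.
_TABLE = re.compile(r"(?:[A-Za-z_][A-Za-z0-9_]*\.)?[A-Za-z_][A-Za-z0-9_]*")
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def drop_column_statements(table: str, column: str) -> List[str]:
    """Hypothetical helper: build Spark SQL statements to drop a Hudi column."""
    # Reject anything that is not a simple identifier before formatting SQL.
    if not _TABLE.fullmatch(table) or not _COLUMN.fullmatch(column):
        raise ValueError(f"unexpected identifier: {table!r} / {column!r}")
    return [
        # Per the Hudi docs (0.11+, verify for your version), DROP COLUMN
        # needs schema-on-read evolution enabled in the session.
        "SET hoodie.schema.on.read.enable=true",
        f"ALTER TABLE {table} DROP COLUMN {column}",
    ]


for stmt in drop_column_statements("hudi_db.employees", "emp_salary"):
    print(stmt)  # with a live SparkSession you would run spark.sql(stmt)
```

Validating the identifiers first is a design choice for the sketch: the statements are built with string formatting, so anything other than plain names is rejected.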
How do we handle deleting an existing column from a table? Hudi throws errors when upserting a batch that does not have the column.
Link to the notebook is not working! Could you please share the new link?
Can we do schema evolution in a MOR-type Hudi table? Also, if we drop a column in our database and dump it to S3 using DMS, will Hudi adjust itself to this change in schema?
Hi,
Answer to your first question:
Yes, you can do schema evolution in MOR tables.
Answer to question 2:
It depends on whether you are using Hive sync or creating tables using DDL.
But you can define the schema and evolve it as shown in the video.
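Before writing a batch with an evolved schema, it can help to see exactly what changed. The sketch below compares two schemas given as ordered (column, type) pairs and lists added, dropped and retyped columns. The function name, the sample columns and the set of "safe" type widenings are assumptions for illustration, not Hudi's actual compatibility rules.

```python
from typing import Dict, List, Tuple

# A schema here is an ordered list of (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]

# Widening type changes this sketch treats as safe (an assumption for
# illustration, not Hudi's exact rule set).
SAFE_PROMOTIONS = {
    ("int", "long"),
    ("int", "double"),
    ("long", "double"),
    ("float", "double"),
}


def diff_schemas(old: Schema, new: Schema) -> Dict[str, list]:
    """Hypothetical helper: report added, dropped and retyped columns."""
    old_types = dict(old)
    new_types = dict(new)
    added = [name for name, _ in new if name not in old_types]
    dropped = [name for name, _ in old if name not in new_types]
    changed = [
        (name, old_types[name], new_types[name])
        for name, _ in new
        if name in old_types and old_types[name] != new_types[name]
    ]
    # Type changes outside the assumed widening list are flagged for review.
    unsafe = [c for c in changed if (c[1], c[2]) not in SAFE_PROMOTIONS]
    return {"added": added, "dropped": dropped, "changed": changed, "unsafe_changes": unsafe}


old = [("emp_id", "int"), ("emp_name", "string"), ("emp_salary", "int")]
new = [("emp_id", "int"), ("emp_name", "string"), ("emp_salary", "long"), ("dept", "string")]
print(diff_schemas(old, new))
```

Because columns are matched by name only, a rename shows up as one dropped column plus one added column; a non-empty `dropped` list is the case that tends to fail on write.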
@SoumilShah Okay, we tried schema evolution in our MOR Hudi table, and we were able to add a new column, change a column's data type, and rename a column. But when we delete a column, we get this error:
"org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'emp_salary' not found
at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:221)"
No, we are not using Athena. We are making schema changes on the fly:
df = spark.createDataFrame(data = datalist, schema = df_schema)
(
df.write.format("org.apache.hudi")
.options(**CombinedConfig)
.mode("append")
.save(f"s3a://{hudi_table_bucket}/{hudi_table_path}/{schema}/{table}")
)
where df_schema is in this format: struct
When we deleted the column emp_salary, we removed it from the df_schema struct.
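One workaround worth trying, sketched here under assumptions: instead of removing emp_salary from the batch schema, keep the column and write nulls, so each batch still carries every column the table already has. The plain-Python helper below (hypothetical name, records as dicts) shows the idea; new columns in the source are kept and appended after the existing ones.

```python
from typing import Any, Dict, List


def pad_missing_columns(records: List[Dict[str, Any]], table_columns: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: give every record all table columns, None if missing."""
    padded = []
    for rec in records:
        # Columns dropped at the source become explicit nulls.
        row = {col: rec.get(col) for col in table_columns}
        # Columns that are new at the source are kept, after the table columns.
        for col, val in rec.items():
            if col not in row:
                row[col] = val
        padded.append(row)
    return padded


table_columns = ["emp_id", "emp_name", "emp_salary"]
batch = [
    {"emp_id": 1, "emp_name": "Ann"},
    {"emp_id": 2, "emp_name": "Raj", "dept": "HR"},
]
print(pad_missing_columns(batch, table_columns))
```

In PySpark the same idea is `df.withColumn("emp_salary", lit(None).cast("int"))` with `from pyspark.sql.functions import lit`; the `"int"` type is an assumption, so use whatever type the column has in the table. Newer Hudi releases also document a `hoodie.datasource.write.reconcile.schema` option aimed at batches that lack columns the table has; check whether your version supports it before relying on it.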
When we apply this, we sometimes get an Athena error saying the column doesn't exist in the schema. The error is inconsistent, though: sometimes it appears, and other times the column is added as expected and we can see it. Any idea why?
Switch to Athena engine version 3 to resolve the issue 😀😀
@SoumilShah I did, but Athena engine version 3 has an error with schema changes: it always gives me an error that the column is not in the schema.