I'm so glad you're still making videos. :) I wish you luck in your field of choice and I hope things are going well for you. Thanks for your contributions to society.
Hi Soumil,
I am unable to access the PDF. Can you help me with that? Thanks.
Thanks. It is very clear, and I managed to reproduce this in AWS.
Thanks Soumil. If I open the file, it shows 'page not found'.
Hi Soumil,
Thanks for sharing. It would be really useful. God bless you.
Thanks,
Chetan from Kandivali, Mumbai, India :)
My pleasure
I am getting an error like 'failed to upsert for commit time 202303022121655469' while writing the data. Please help me resolve this issue.
Great video. Can you provide the links for the JAR files used in the above script?
I'm so close to transitioning to Hudi tables, but there's ONE missing feature that is a blocker:
Do you know the best practice for replacing the Glue job bookmark feature?
I'm actually building my own bookmarking capability to add to my new Glue jobs using Hudi (by replicating what the original Glue job bookmark does), but is that the best approach?
My source data is always being pushed to S3, so I don't have the option of using a streaming job connected to a Kinesis stream; I just want to use the S3 bucket as the source.
Thanks
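(For context, here is a minimal sketch of one way such a custom bookmark could work, assuming a marker object in S3 stores the LastModified timestamp of the newest file processed so far. The bucket, prefix, and key names below are placeholders, not anything from the video.)

import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')
BUCKET = 'my-source-bucket'        # placeholder source bucket
PREFIX = 'incoming/'               # placeholder source prefix
MARKER_KEY = 'bookmarks/last_run'  # placeholder bookmark object

def read_marker():
    # Return the last processed timestamp, or the epoch on the first run.
    try:
        body = s3.get_object(Bucket=BUCKET, Key=MARKER_KEY)['Body'].read().decode()
        return datetime.fromisoformat(body)
    except s3.exceptions.NoSuchKey:
        return datetime(1970, 1, 1, tzinfo=timezone.utc)

def list_new_files(since):
    # Collect source objects modified after the bookmark timestamp.
    new_files = []
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get('Contents', []):
            if obj['LastModified'] > since:
                new_files.append((obj['Key'], obj['LastModified']))
    return new_files

def write_marker(ts):
    # Persist the new bookmark only after the Hudi upsert succeeds.
    s3.put_object(Bucket=BUCKET, Key=MARKER_KEY, Body=ts.isoformat().encode())

The listed files would then be read with Spark and upserted into the Hudi table, and write_marker would be called with the newest LastModified value only after the write succeeds, so a failed run reprocesses the same files.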
Do Hudi tables not allow $ or special characters in table column names?
Where can I find the Hudi MOR table Glue job script? Is it uploaded? I have checked your GitHub but couldn't find much.
Simply change the setting to MOR; there is a table type option.
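(As a rough sketch of what that option looks like, using the standard Hudi write options; 'df' and all the names and paths below are placeholders, not the script from the video:)

# Switching a Hudi write from Copy-on-Write to Merge-on-Read via the table type option.
hudi_options = {
    'hoodie.table.name': 'my_table',
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',  # default is COPY_ON_WRITE
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'ts',
}
df.write.format('hudi').options(**hudi_options).mode('append').save('s3://my-bucket/my_table/')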
Great video. I really enjoy your positive energy and passion for the topic.
As a newbie in PySpark and AWS, it would be very nice if you could walk through the code just a touch more. I'm curious about how those parameters in the job setup get injected into the job. I'm also curious about this function:
from pyspark.sql import SparkSession

def create_spark_session():
    spark = SparkSession \
        .builder \
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
        .getOrCreate()
    return spark
The boilerplate script has this line to instantiate a Spark session:
spark = glueContext.spark_session
What is gained by your technique?
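(A minimal sketch covering both questions, under two assumptions not confirmed in the video: (1) parameters defined in the job setup arrive as command-line arguments and are read with getResolvedOptions, and (2) the custom builder exists mainly to set Hudi's recommended spark.serializer before the session is created, which can also be done through the Glue boilerplate:)

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark import SparkConf, SparkContext

# (1) Parameters from the job setup are injected as sys.argv entries and
#     resolved by name here ('JOB_NAME' is the standard built-in example).
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# (2) The same Kryo setting, applied when building the Glue context instead
#     of in a separate SparkSession builder.
conf = SparkConf().set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session  # same session as the boilerplate, now with Kryo enabled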
Hi, thanks for the suggestion. There are Hudi labs; let me share the links for those.
Hey, here is the link for beginners:
ua-cam.com/play/PLxSSOLH2WRMO3Vz6qp_S3KhDqUbro1PqG.html