Simple 5 Steps Guide to get started with Apache Hudi and Glue 4.0 and query the data using Athena

  • Published 15 Oct 2024

COMMENTS • 15

  • @wistyroamlands7495
    @wistyroamlands7495 1 year ago

    I'm so glad you're still making videos. :) I wish you luck in your field of choice and I hope things are going well for you. Thanks for your contributions to society.

  • @tomyanth
    @tomyanth 1 year ago

    Thanks. It is very clear, and I managed to reproduce this in AWS.

  • @KartikGautam
    @KartikGautam 2 months ago

    Hi Soumil,
    I am unable to access the PDF. Can you help me with that? Thanks.

  • @chetancc
    @chetancc 1 year ago

    Hi Soumil,
    Thanks for sharing. It would be really useful. God bless you.
    Thanks,
    Chetan from Kandivali, Mumbai, India :)

  • @MohamedFazanNismy
    @MohamedFazanNismy 5 months ago

    Thanks, Soumil. When I open the file, it shows 'page not found'.

  • @lezaaarman6964
    @lezaaarman6964 1 year ago

    I'm so close to transitioning to Hudi tables, but there's ONE missing feature that is a blocker:
    Do you know the best practice for replacing the Glue job bookmark feature?
    I'm currently building my own bookmarking capability to add to my new Glue jobs that use Hudi (by replicating what the original Glue job bookmark does), but is that the best approach?
    My source data is always pushed to S3, so I don't have the option of using a streaming job connected to a Kinesis stream. I just want to use the S3 bucket as the only source.
    Thanks
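
For illustration only, here is a minimal sketch of one way such a custom bookmark could work: keep the timestamp of the last processed object and select only S3 objects modified after it. It is not the built-in Glue bookmark logic. The function name `select_new_objects`, the sample keys, and the timestamps are hypothetical; the dictionaries only mimic the `Key` and `LastModified` fields that boto3's `list_objects_v2` returns under `Contents`.

```python
from datetime import datetime, timezone


def select_new_objects(objects, last_bookmark):
    """Return keys modified after last_bookmark, and the bookmark to store next.

    `objects` is a list of dicts with "Key" and "LastModified" (datetime),
    shaped like the "Contents" entries of boto3's list_objects_v2 response.
    """
    # Keep only objects strictly newer than the stored bookmark.
    new_objects = [o for o in objects if o["LastModified"] > last_bookmark]
    # Process in modification order so the last element carries the new bookmark.
    new_objects.sort(key=lambda o: o["LastModified"])
    new_bookmark = new_objects[-1]["LastModified"] if new_objects else last_bookmark
    return [o["Key"] for o in new_objects], new_bookmark


# Hypothetical listing; in a real job this would come from S3.
listing = [
    {"Key": "raw/a.json", "LastModified": datetime(2023, 3, 1, 10, 0, tzinfo=timezone.utc)},
    {"Key": "raw/b.json", "LastModified": datetime(2023, 3, 2, 10, 0, tzinfo=timezone.utc)},
    {"Key": "raw/c.json", "LastModified": datetime(2023, 3, 3, 10, 0, tzinfo=timezone.utc)},
]
bookmark = datetime(2023, 3, 1, 12, 0, tzinfo=timezone.utc)

keys, next_bookmark = select_new_objects(listing, bookmark)
print(keys, next_bookmark)
```

The new bookmark would have to be persisted somewhere durable (for example an S3 object or a DynamoDB item) only after the Hudi write succeeds. A strict timestamp comparison can also skip an object that arrives with exactly the bookmark's timestamp, so a real implementation may need to track processed keys as well.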

  • @MunindherReddy
    @MunindherReddy 1 year ago

    Do Hudi tables not allow $ or other special characters in column names?

  • @MaheshWankhede-u2o
    @MaheshWankhede-u2o 9 months ago

    Great video. Can you provide links to the JAR files used in the above script?

  • @AuroraNabi
    @AuroraNabi 1 year ago

    Where can I find the Glue job script for a Hudi MOR table? Has it been uploaded? I checked your GitHub but couldn't find much.

    • @SoumilShah
      @SoumilShah 1 year ago

      Simply change the setting to MOR; there is a table type option.
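
As a rough sketch of what that change might look like, the snippet below builds a Hudi write-options dictionary in which only the table type differs between COPY_ON_WRITE and MERGE_ON_READ. The option keys are standard Hudi datasource configs; the helper `build_hudi_options` and the table and field names (`orders`, `order_id`, `updated_at`) are hypothetical and are not taken from the video's script.

```python
def build_hudi_options(table_name, record_key, precombine_field,
                       table_type="COPY_ON_WRITE"):
    """Build a Hudi write-options dict; table_type selects COW or MOR."""
    if table_type not in ("COPY_ON_WRITE", "MERGE_ON_READ"):
        raise ValueError(f"Unsupported Hudi table type: {table_type}")
    return {
        "hoodie.table.name": table_name,
        # This is the "table type option": switch it to MERGE_ON_READ for MOR.
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


# Hypothetical table and field names, for illustration only.
mor_options = build_hudi_options(
    "orders", "order_id", "updated_at", table_type="MERGE_ON_READ"
)
print(mor_options["hoodie.datasource.write.table.type"])

# In a Glue job the dict would typically be passed to the Spark writer, e.g.:
# df.write.format("hudi").options(**mor_options).mode("append").save(target_path)
```

The rest of the job can stay the same. Note that a table's type is fixed when the table is first created, so switching to MOR generally means writing to a new table path.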

  • @adarshverma5429
    @adarshverma5429 1 year ago +1

    I am getting an error like "failed to upsert for commit time 202303022121655469" while writing the data. Please help me resolve this issue.

  • @joegenshlea6827
    @joegenshlea6827 1 year ago

    Great video. I really enjoy your positive energy and passion for the topic.
    As a newbie to PySpark and AWS, it would be very nice if you could walk through the code in a bit more detail. I'm curious about how the parameters in the job setup get injected into the job. I'm also curious about this function:

    def create_spark_session():
        spark = SparkSession \
            .builder \
            .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
            .getOrCreate()
        return spark

    The boilerplate script has this line to instantiate a Spark session:

    spark = glueContext.spark_session

    What is gained by your technique?

    • @SoumilShah
      @SoumilShah 1 year ago

      Hi, thanks for the suggestion. There are Hudi labs; let me share the links for those.

    • @SoumilShah
      @SoumilShah 1 year ago

      Hey, here is a link for beginners:
      ua-cam.com/play/PLxSSOLH2WRMO3Vz6qp_S3KhDqUbro1PqG.html