Build a Spark pipeline to analyze streaming data using AWS Glue, Apache Hudi, S3 and Athena

  • Published 1 Jan 2025

COMMENTS • 10

  • @sagarsumit
    @sagarsumit 2 years ago +2

    These tutorials are great. Sometimes they even help me reproduce an issue! Keep them coming!

  • @showmethemoney824
    @showmethemoney824 3 months ago

    The GitHub link doesn't exist.

  • @CharlyRoseroC
    @CharlyRoseroC 1 year ago +1

    Hey Soumil, thank you for sharing all this knowledge. What do you think about integrating this (if it's possible) with the Redshift data warehouse? Are they compatible in a possible data solution?

  • @mennagamea4634
    @mennagamea4634 1 year ago

    Isn't there a way for records that arrive with missing columns to have those columns added as null, without manually adding them using the evolveSchema function? And does this also apply to deletion? If we didn't execute evolveSchema and we delete records, so the records arrive with only a few columns, do we then need to execute evolveSchema?

  • @chandnigupta-rn9yx
    @chandnigupta-rn9yx 1 year ago

    How can I solve this error? StreamingQueryException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last):

  • @manishdaga
    @manishdaga 1 year ago

    Do we need to create the Hudi table before running the job?

    • @SoumilShah
      @SoumilShah 1 year ago

      You mean Glue?

    • @manishdaga
      @manishdaga 1 year ago

      @SoumilShah I want to replicate your setup. You created one table to receive the raw Kinesis data and did not create the Hudi table, so I did the same, but I get an error saying the table does not exist.

  • @manishdaga
    @manishdaga 1 year ago

    Soumil, I am getting this error:
    ERROR:py4j.java_gateway:There was an exception while executing the Python Proxy on the Python Side.
    Table or view not found: real_time_streams.nms_hudi; line 1 pos 14;
    'GlobalLimit 0
    +- 'LocalLimit 0
       +- 'Project [*]
          +- 'UnresolvedRelation [real_time_streams, nms_hudi], [], false