AWS Tutorials - Glue Workflow - Sharing States between Glue Jobs

  • Published 6 Sep 2024
  • The exercise URL - aws-dojo.com/e...
    AWS Glue Workflows help you create complex ETL activities involving multiple crawlers, jobs, and triggers. Each workflow manages the execution and monitoring of the components it orchestrates. The workflow records the execution progress and status of its components, providing an overview of the larger task and the details of each step. The AWS Glue console also shows the workflow visually as a graph.
    In this exercise, you learn how to share state between Glue jobs in a Glue workflow.
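A minimal sketch of how two jobs in the same workflow run might share state. It assumes the boto3 Glue client methods `get_workflow_run_properties` and `put_workflow_run_properties`, and that each job already knows its workflow name and run ID (Glue passes these as the `WORKFLOW_NAME` and `WORKFLOW_RUN_ID` arguments to jobs started by a workflow). The property name `processed_path` and the S3 path are hypothetical. The client is passed in rather than created inside the functions, so the logic can be exercised without AWS.

```python
from typing import Any, Dict


def read_run_properties(glue: Any, workflow_name: str, run_id: str) -> Dict[str, str]:
    """Return all run properties of one workflow run as a str -> str dict."""
    response = glue.get_workflow_run_properties(Name=workflow_name, RunId=run_id)
    return response["RunProperties"]


def write_run_properties(glue: Any, workflow_name: str, run_id: str,
                         updates: Dict[str, Any]) -> None:
    """Add or overwrite properties on this workflow run only.

    Run property values are strings, so every value is converted with str().
    """
    glue.put_workflow_run_properties(
        Name=workflow_name,
        RunId=run_id,
        RunProperties={key: str(value) for key, value in updates.items()},
    )


# Hypothetical usage inside two Glue jobs of the same workflow run, where
# glue = boto3.client("glue") and workflow_name / run_id come from the job arguments:
#
#   Job A (producer):
#     write_run_properties(glue, workflow_name, run_id,
#                          {"processed_path": "s3://example-bucket/stage1/"})
#   Job B (consumer):
#     path = read_run_properties(glue, workflow_name, run_id)["processed_path"]
```

Because the properties belong to one workflow run, a value written by job A in one run is not visible to jobs in a different run of the same workflow.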

COMMENTS • 25

  • @veerachegu
    2 years ago

    We use job parameters at the Glue job level, for example to specify which folder to pick up from a bucket. Once a job finishes, it writes its output file into the next bucket, the next job picks the file up from that S3 bucket and writes to another one, and so on. We orchestrate all the Glue jobs in series in a workflow, and we are not using workflow properties.

    • @veerachegu
      2 years ago

      Is that fine?

    • @AWSTutorialsOnline
      2 years ago

      Yes. If you don't need to change those locations dynamically at runtime, this approach is good.

    • @veerachegu
      2 years ago

      @AWSTutorialsOnline We set them in the Glue job parameters themselves, so to change a location we just edit that job's parameters. That is enough, rather than going through workflow parameters, right?

    • @AWSTutorialsOnline
      2 years ago

      @veerachegu Sounds OK.
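If locations are kept as job parameters, as in this thread, but you later want a particular workflow run to be able to override them, one option is to resolve each setting in an explicit order. This is a hypothetical helper, not something shown in the video: it prefers a workflow run property, then a job argument, then a default. The key name in the usage comment is also hypothetical.

```python
from typing import Mapping, Optional


def resolve_setting(key: str,
                    run_properties: Mapping[str, str],
                    job_args: Mapping[str, str],
                    default: Optional[str] = None) -> str:
    """Pick a value for `key`: workflow run property, then job argument, then default."""
    for source in (run_properties, job_args):
        value = source.get(key)
        if value:  # skip missing keys and empty strings
            return value
    if default is None:
        raise KeyError(f"No value for {key!r} in run properties, job arguments, or default")
    return default


# Hypothetical usage inside a Glue job:
#   run_properties = glue.get_workflow_run_properties(Name=wf, RunId=run)["RunProperties"]
#                    (use {} when the job is not running inside a workflow)
#   job_args = getResolvedOptions(sys.argv, ["source_folder"])
#   source = resolve_setting("source_folder", run_properties, job_args)
```

Keeping the order in one function makes it obvious which source won when a job picks up an unexpected location.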

  • @mitanshubaranwal8878
    2 years ago

    I want to bring in data from two workflows that run in parallel. Can a workflow depend on two source workflows?

    • @AWSTutorialsOnline
      2 years ago

      It is not possible to make one Glue workflow depend on another. You might have to consider a custom solution, such as AWS Step Functions orchestrating these workflows. Then it is possible to run workflows in parallel and converge on a third workflow once the two parallel executions finish.
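The reply suggests AWS Step Functions. As a simpler stand-in for illustration (a swapped-in technique, not Step Functions), the same fan-out/fan-in idea can be sketched as a Python script using the boto3 Glue calls `start_workflow_run` and `get_workflow_run`. The workflow names are hypothetical and the poll interval is arbitrary; in a Step Functions design, a Parallel state would take the place of this polling loop.

```python
import time
from typing import Any, Dict, Iterable

# Workflow run statuses after which a run will not change any more.
TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def wait_for_run(glue: Any, name: str, run_id: str, poll_seconds: float) -> Dict[str, Any]:
    """Poll get_workflow_run until the run reaches a terminal status and return the run."""
    while True:
        run = glue.get_workflow_run(Name=name, RunId=run_id)["Run"]
        if run["Status"] in TERMINAL_STATUSES:
            return run
        time.sleep(poll_seconds)


def fan_in(glue: Any, upstream: Iterable[str], downstream: str,
           poll_seconds: float = 30.0) -> str:
    """Start the upstream workflows in parallel, wait for all, then start the downstream one."""
    # Start every upstream workflow before waiting, so they run concurrently.
    run_ids = {name: glue.start_workflow_run(Name=name)["RunId"] for name in upstream}

    failed = []
    for name, run_id in run_ids.items():
        run = wait_for_run(glue, name, run_id, poll_seconds)
        # A COMPLETED run may still contain failed jobs, so check the statistics too.
        failed_actions = run.get("Statistics", {}).get("FailedActions", 0)
        if run["Status"] != "COMPLETED" or failed_actions:
            failed.append(name)

    if failed:
        raise RuntimeError(f"Upstream workflows did not succeed: {failed}")
    return glue.start_workflow_run(Name=downstream)["RunId"]


# Hypothetical usage: fan_in(boto3.client("glue"), ["wf_sales", "wf_customers"], "wf_merge")
```

The third workflow is started only after both upstream runs finish successfully; if either fails, nothing downstream runs and the error names the failing workflows.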

    • @mitanshubaranwal8878
      2 years ago

      @AWSTutorialsOnline Thank you

  • @klzo4785
    3 years ago

    Really nice!

  • @gayathrichakravarthy1056
    3 years ago

    Very informative! Thanks for sharing the knowledge. I do have a suggestion, however: would it be a good idea to add links in your description to the relevant previous videos (Glue workflows in this case), so we can quickly catch up on topics that are new to us?

    • @AWSTutorialsOnline
      3 years ago

      Thanks for the tip!

    • @varra19
      1 year ago

      @AWSTutorialsOnline Why wasn't spark.read.csv or DynamicFrame from_options used to read the CSV files?

  • @atsource3143
    3 years ago

    Thank you for this wonderful video.
    I have one question: how can we set a global variable in a job?
    Thank you

    • @AWSTutorialsOnline
      3 years ago

      Global variables are set at the workflow level, not at the job level. However, jobs participating in a workflow can read and write them. On the workflow configuration screen, you can set up such variables as workflow run properties.
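Besides the console screen mentioned in the reply, the workflow-level defaults can be set from code. A sketch, assuming the boto3 Glue calls `get_workflow` and `update_workflow` with its `DefaultRunProperties` argument. It merges the new defaults into the existing ones rather than relying on how `update_workflow` treats keys it is not given. The workflow and property names are hypothetical. These defaults seed each new run; values a job writes with `put_workflow_run_properties` affect only that run.

```python
from typing import Any, Dict


def set_default_run_properties(glue: Any, workflow_name: str,
                               defaults: Dict[str, Any]) -> Dict[str, str]:
    """Merge `defaults` into the workflow's default run properties and save them."""
    workflow = glue.get_workflow(Name=workflow_name)["Workflow"]
    # The key may be absent when the workflow has no default run properties yet.
    merged = dict(workflow.get("DefaultRunProperties", {}))
    # Run property values are strings, so convert before saving.
    merged.update({key: str(value) for key, value in defaults.items()})
    glue.update_workflow(Name=workflow_name, DefaultRunProperties=merged)
    return merged


# Hypothetical usage:
#   set_default_run_properties(boto3.client("glue"), "my-workflow", {"batch_size": 500})
```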

  • @vivekjacobalex
    3 years ago

    This is a really good video. I have two questions: 1) Is it possible to export the entire account/service settings and import them into another account? 2) Can you do a video on AWS QuickSight?

    • @AWSTutorialsOnline
      3 years ago

      1) If you use the CDK or CloudFormation to create the resources, you can move the configuration from one environment to another.
      2) Which QuickSight topics do you want covered?

  • @veerachegu
    2 years ago

    Is it mandatory to use properties? I am using three Glue jobs in series in an event-based workflow. Will we face any issues if we don't use properties?

    • @AWSTutorialsOnline
      2 years ago

      You need properties only if you want to share state between jobs in a Glue workflow. Otherwise, there is no need.

    • @veerachegu
      2 years ago

      @AWSTutorialsOnline Thanks for confirming. I have one more question: how can I implement an email alert for the success or failure of a Glue workflow, instead of enabling alerts on the individual Glue jobs?

  • @gregdeardurff3774
    2 years ago

    Thank you for the tutorial! I see that you give the IAM role PowerUserAccess. What exact permissions does the Glue role need to use this functionality? I have been having trouble getting this to work and suspect I have not given my Glue IAM role enough permissions...

    • @AWSTutorialsOnline
      2 years ago

      The AWS managed policy "AWSGlueServiceRole" should be enough. Try it, then tweak the permissions based on any access errors you get.
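For narrower permissions than a broad managed policy, the calls that read and write run properties correspond to the IAM actions `glue:GetWorkflowRunProperties` and `glue:PutWorkflowRunProperties`. A sketch that builds such a statement as a Python dict and prints it as JSON. The region, account ID, and workflow name are placeholders, the job role still needs its usual Glue, S3, and CloudWatch Logs permissions, and the workflow ARN format should be checked against the AWS Glue IAM documentation before use.

```python
import json
from typing import Any, Dict


def workflow_properties_statement(region: str, account_id: str,
                                  workflow_name: str) -> Dict[str, Any]:
    """IAM policy statement allowing a job role to read/write one workflow's run properties."""
    return {
        "Effect": "Allow",
        "Action": [
            "glue:GetWorkflowRunProperties",
            "glue:PutWorkflowRunProperties",
        ],
        # Assumed ARN format for a Glue workflow resource.
        "Resource": f"arn:aws:glue:{region}:{account_id}:workflow/{workflow_name}",
    }


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        # Placeholder values; replace with your own region, account ID, and workflow name.
        "Statement": [workflow_properties_statement("us-east-1", "123456789012", "my-workflow")],
    }
    print(json.dumps(policy, indent=2))
```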

  • @durgarasane-kolapkar1842
    1 year ago

    Hi, thanks for explaining this useful concept with an example. I used the exact Python code given in dojo-exercises, but it throws an error saying that the arguments workflow_name and workflow_run_id are required. Please advise.
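Glue adds the `WORKFLOW_NAME` and `WORKFLOW_RUN_ID` arguments only when a job is started by a workflow, so a script that asks `getResolvedOptions` for them will fail when the same job is run on its own, for example with Run job in the console. A hedged sketch of a guard that looks for those arguments before relying on them; the helper names are hypothetical. It accepts both `--NAME value` and `--NAME=value` forms.

```python
import sys
from typing import List, Optional, Tuple


def _arg_value(argv: List[str], name: str) -> Optional[str]:
    """Return the value of --name from argv, accepting '--name value' and '--name=value'."""
    flag = f"--{name}"
    for i, token in enumerate(argv):
        if token == flag and i + 1 < len(argv):
            return argv[i + 1]
        if token.startswith(flag + "="):
            return token[len(flag) + 1:]
    return None


def workflow_context(argv: List[str]) -> Optional[Tuple[str, str]]:
    """Return (workflow_name, run_id) if the job was started by a workflow, else None."""
    name = _arg_value(argv, "WORKFLOW_NAME")
    run_id = _arg_value(argv, "WORKFLOW_RUN_ID")
    if name and run_id:
        return name, run_id
    return None


if __name__ == "__main__":
    context = workflow_context(sys.argv)
    if context is None:
        print("Not running inside a Glue workflow; skipping run-property handling.")
    else:
        print(f"Workflow {context[0]}, run {context[1]}")
```

With this check in place, the job can skip the run-property calls (or fall back to job parameters) instead of failing when it is tested outside the workflow.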

  • @kiddo8714
    1 year ago

    Hello, thanks for the wonderful video. I have a question. I want to pass parameter values at run time to Glue jobs in a workflow. I have also added the same argument at the workflow level via 'Default run properties (optional)', where I gave a default value for it. When I run the workflow, the Glue job doesn't seem to use the run-time parameter I pass; it uses the default value given at the workflow level instead. I checked the 'Input Arguments' in the 'Run Details' of the Glue job, and the run-time parameter I passed is shown as the value there, but the output generated corresponds to the default value. Any idea what I'm doing wrong?

    • @AWSTutorialsOnline
      1 year ago

      This is odd. It should use the workflow properties you passed when running it. Have you tried it with no default value to check how it behaves?
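Job arguments (what 'Input Arguments' shows) and workflow run properties are separate mechanisms, so one thing worth ruling out (not a confirmed diagnosis) is a script that reads the value from workflow run properties while the override was passed as a job argument. A sketch that passes the override as a run property when starting the workflow and reads back what the run actually holds, assuming a boto3 version whose `start_workflow_run` accepts `RunProperties`. The property name `report_date` is hypothetical.

```python
from typing import Any, Dict


def start_with_overrides(glue: Any, workflow_name: str,
                         overrides: Dict[str, Any]) -> Dict[str, str]:
    """Start a workflow run with per-run property overrides and return the properties it sees."""
    run_id = glue.start_workflow_run(
        Name=workflow_name,
        # Run property values are strings.
        RunProperties={key: str(value) for key, value in overrides.items()},
    )["RunId"]
    # Read back what the run actually holds, to compare with what the job later uses.
    return glue.get_workflow_run_properties(Name=workflow_name, RunId=run_id)["RunProperties"]


# Hypothetical usage:
#   start_with_overrides(boto3.client("glue"), "my-workflow", {"report_date": "2024-09-06"})
```

If the returned properties show the override but the job still produces output for the default, the job script is probably reading the value from somewhere else.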