AWS Glue Job Input Parameters

Question:

I am relatively new to AWS and this may be a bit less technical question, but at present AWS Glue notes a maximum of 25 jobs permitted to be created. We are loading in a series of tables that each have their own job that subsequently appends audit columns. Each job is very similar, but simply changes the connection string source and target.

Is there a way to parameterize these jobs to allow for reuse and simply pass the proper connection strings to them? Or even possibly loop through a set connection strings in a master job that would call a child job passing the varying connection strings through?

Any examples or documentation would be most appreciated

Answer:

In the below example I present how to use Glue job input parameters in the code. This code takes the input parameters and it writes them to the flat file.

  1. Setting the input parameters in the job configuration.

enter image description here

  1. The code of Glue job

  1. There is also possible to provide input parameters during using boto3, CloudFormation or StepFunctions. This example shows how to do it by using boto3.

Useful links:

  1. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html
  2. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
  3. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
  4. https://docs.aws.amazon.com/step-functions/latest/dg/connectors-glue.html

Leave a Reply