How to make an AWS Data Pipeline ShellCommandActivity script execute a Python file


I am working with an AWS Data Pipeline that has a ShellCommandActivity whose script URI points to a bash file located in an S3 bucket. The bash file copies a Python script from the same S3 bucket to an EmrCluster and then tries to execute that Python script.


This is my pipeline export:

This is

This is

From the Stdout Log I get:

download: s3://project/bin/scripts/ to ./

From the Stderr Log I get:

python: can't open file '': [Errno 2] No such file or directory

I have also tried replacing python ./ with python, but I get the same result.

How do I get my AWS Data Pipeline to execute my script?


When I set scriptUri to s3://project/bin/scripts/ I get the following errors:

/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 1: author: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 2: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 3: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 4: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 5: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ line 7: print: command not found


Added the following line to

Then I received the following error:

error: line 6, in import boto3 ImportError: No module named boto3

Using @franklinsijo's advice, I created a bootstrap action on the EmrCluster with the following value:


This is

This worked!
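For readers who want to see how the pieces relate, here is a rough sketch of the two pipeline objects involved, built as a Python dict and printed as JSON. It is only a fragment under assumptions: the S3 paths, the object ids, and the `build_pipeline_objects` helper are hypothetical placeholders and not the values from my pipeline, and a real definition also needs things like a schedule and roles.

```python
import json

# Hypothetical placeholders: these paths and ids are NOT from the original pipeline.
SCRIPT_URI = "s3://example-bucket/scripts/job.py"
BOOTSTRAP_URI = "s3://example-bucket/scripts/install_deps.sh"


def build_pipeline_objects(script_uri, bootstrap_uri):
    """Return an EmrCluster object and a ShellCommandActivity that runs on it."""
    cluster = {
        "id": "EmrClusterForScript",
        "name": "EmrClusterForScript",
        "type": "EmrCluster",
        # Bootstrap action: runs when the cluster starts, so it is the place
        # to install extra Python modules before the activity executes.
        "bootstrapAction": bootstrap_uri,
    }
    activity = {
        "id": "RunPythonScript",
        "name": "RunPythonScript",
        "type": "ShellCommandActivity",
        # The script that the activity downloads and runs.
        "scriptUri": script_uri,
        # Reference to the resource the activity runs on.
        "runsOn": {"ref": cluster["id"]},
    }
    return {"objects": [cluster, activity]}


definition = build_pipeline_objects(SCRIPT_URI, BOOTSTRAP_URI)
print(json.dumps(definition, indent=2))
```

The point of the sketch is the relationship between the fields: the activity points at the cluster through `runsOn`, and the cluster carries the `bootstrapAction` that prepares the Python environment.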


Configure ShellCommandActivity with

  • Pass the S3 URI of the Python file as the Script Uri.
  • Add the shebang line #!/usr/bin/env python in the
  • If the script uses any non-default Python libraries, install them on the target resource.
    • If runsOn is chosen, add the installation commands as a bootstrap action for the EMR resource.
    • If workerGroup is chosen, install all the libraries on the worker group before pipeline activation.

Use either pip or easy_install to install the Python modules.
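As an illustration of the script side, a minimal sketch might look like the following. The shebang on the first line is what lets the activity run the file as Python rather than as shell commands. The `missing_modules` and `main` helpers are hypothetical names introduced here; they only check up front that the required modules (boto3 is used as the example, since that was the missing one) can be imported, so a missing install shows up as one readable line in the stderr log.

```python
#!/usr/bin/env python
"""Hypothetical layout for a Python file used as the Script Uri."""
import sys


def missing_modules(names):
    """Return the names from `names` that cannot be imported on this machine."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def main(required=("boto3",)):
    """Return 0 if every required module is importable, otherwise 1."""
    missing = missing_modules(required)
    if missing:
        # Written to stderr so it appears in the activity's stderr log.
        sys.stderr.write("Missing Python modules: %s\n" % ", ".join(missing))
        return 1
    print("All required modules are available")
    return 0

# In the real script, finish with sys.exit(main()) so that a missing
# module makes the activity fail instead of continuing silently.
```

If the check reports a missing module, that is the cue to add the corresponding pip or easy_install command to the bootstrap action or to the worker group setup.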
