How to install Python packages within an Amazon SageMaker Processing Job?

Question:

I am trying to create an SKLearn processing job in Amazon SageMaker to transform my input data before model training.

I wrote a custom Python script, preprocessing.py, which does the transformation. The script imports a Python package. Here is the SageMaker example I followed.

When I try to submit the Processing Job I get an error –

I understand that my processing job is unable to find this package and that I need to install it. My question is: how can I accomplish this using the SageMaker Processing Job API? Ideally there would be a way to specify a requirements.txt in the API call, but I don’t see such functionality in the docs.

I know I can create a custom image with the relevant packages and then use that image in the Processing Job, but this seems like too much work for something that should be built in.

Is there an easier, more elegant way to install the packages needed in a SageMaker Processing Job?

Answer:

One way would be to call pip from Python, at the top of the processing script and before the package is imported.
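
A minimal sketch of that approach, assuming the processing container can reach a package index over the network. The helper names (pip_install_command, install) and the package name "some-package" are placeholders for illustration, not part of any SageMaker API:

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages, run=subprocess.check_call):
    """Run pip; the default runner raises CalledProcessError if pip fails."""
    run(pip_install_command(packages))


# At the top of preprocessing.py, before importing the package:
# install(["some-package"])  # placeholder name; use the package you need
```

Using sys.executable with -m pip, rather than a bare pip command, makes it more likely that the package lands in the same environment that runs the script.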


Another way would be to use an SKLearn Estimator (a training job) to do the same transformation. You can provide a source_dir that contains your script together with a requirements.txt file, and the listed requirements will be installed for you before the script runs.
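
A rough sketch of the estimator approach, assuming the SageMaker Python SDK is installed and that you supply your own role, framework version and instance type. The helper names (build_source_dir, make_estimator) and the directory name "src" are hypothetical choices for illustration:

```python
import shutil
from pathlib import Path


def build_source_dir(script_path, packages, target="src"):
    """Create a directory holding the script plus a requirements.txt."""
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(script_path, target_dir / Path(script_path).name)
    (target_dir / "requirements.txt").write_text("\n".join(packages) + "\n")
    return target_dir


def make_estimator(source_dir, entry_point, role, framework_version, instance_type):
    """Return an SKLearn estimator that runs entry_point from source_dir."""
    # Imported lazily so build_source_dir works without the SageMaker SDK.
    from sagemaker.sklearn.estimator import SKLearn

    return SKLearn(
        entry_point=entry_point,          # e.g. "preprocessing.py"
        source_dir=str(source_dir),       # directory that holds requirements.txt
        role=role,                        # your IAM role ARN (not shown here)
        framework_version=framework_version,
        instance_type=instance_type,
        instance_count=1,
    )
```

Calling fit on the returned estimator starts the job. Note that this runs as a training job rather than a processing job, so the script reads its inputs and writes its outputs through the training job's channels and directories.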
