How to launch and configure an EMR cluster using boto

Question:

I’m trying to launch a cluster and run a job, all using boto.
I find lots of examples of creating job flows, but I can’t for the life of me find an example that shows:

  1. How to define the cluster to be used (by cluster_id)
  2. How to configure and launch a cluster (for example, if I want to use spot instances for some task nodes)

Am I missing something?

Answer:

Boto and the underlying EMR API currently mix the terms cluster and job flow, and the term job flow is being deprecated. I treat them as synonyms.

You create a new cluster by calling the run_jobflow() method of a boto.emr.connection.EmrConnection object. It returns the cluster ID that EMR generates for you.

First, all the mandatory things:
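A minimal sketch of that setup, assuming boto 2 is installed and that us-east-1 is the region you want. The REGION constant and the connect() helper are illustrative names, not part of boto:

```python
# Assumed region; replace it with the region you actually use.
REGION = "us-east-1"


def connect(region=REGION):
    """Return an EMR connection for the given region (boto 2)."""
    # Imported inside the function so this file loads even where boto
    # is not installed.
    import boto.emr

    # Credentials come from the environment or the boto config file.
    return boto.emr.connect_to_region(region)
```

Calling connect() gives you the connection object that the later steps use.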

Then we specify instance groups, including the spot price we want to pay for the TASK nodes:
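As a sketch, the groups could be described like this. The instance types, counts, group names and the bid price are placeholder assumptions, and GROUP_SPECS and build_instance_groups() are illustrative names; only InstanceGroup comes from boto:

```python
# Plain settings for each group; every value below is a placeholder.
GROUP_SPECS = [
    dict(num_instances=1, role="MASTER", type="m1.small",
         market="ON_DEMAND", name="Main node"),
    dict(num_instances=2, role="CORE", type="m1.small",
         market="ON_DEMAND", name="Worker nodes"),
    # Only the SPOT group carries a bid price (USD per instance-hour).
    dict(num_instances=2, role="TASK", type="m1.small",
         market="SPOT", name="Cheap spot nodes", bidprice="0.002"),
]


def build_instance_groups(specs=GROUP_SPECS):
    """Turn the plain settings into boto InstanceGroup objects."""
    # Imported lazily so the settings can be inspected without boto.
    from boto.emr.instance_group import InstanceGroup

    return [InstanceGroup(**spec) for spec in specs]
```

Here the MASTER and CORE groups run on demand, and only the TASK group bids on the spot market.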

Finally we start a new cluster:
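A sketch of that call, assuming you already have an EMR connection and a list of instance groups. The cluster name, bucket, key name, AMI version and role names are placeholders to replace with your own, and CLUSTER_SETTINGS and start_cluster() are illustrative names:

```python
# Placeholder settings; every value here is an assumption to replace.
CLUSTER_SETTINGS = dict(
    name="Name for my cluster",
    log_uri="s3://mybucket/logs/",
    ec2_keyname="my-ec2-key",
    ami_version="2.4.9",
    keep_alive=True,  # keep the cluster running once its steps finish
    enable_debugging=True,
    action_on_failure="TERMINATE_JOB_FLOW",
    steps=[],
    bootstrap_actions=[],
    visible_to_all_users=True,
    job_flow_role="EMR_EC2_DefaultRole",
    service_role="EMR_DefaultRole",
)


def start_cluster(conn, instance_groups, settings=CLUSTER_SETTINGS):
    """Start a cluster and return the ID that EMR assigns to it."""
    # conn is expected to be a boto.emr.connection.EmrConnection.
    return conn.run_jobflow(instance_groups=instance_groups, **settings)
```

With keep_alive=True and an empty steps list, the cluster starts and then waits for work instead of shutting down.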

We can also print the cluster ID if we care about that:
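For example, assuming cluster_id holds the value returned by run_jobflow(). The same ID is how you address the cluster afterwards, for instance with the connection's add_jobflow_steps() method. The announce() and add_steps() helpers are illustrative names:

```python
def announce(cluster_id):
    """Print the ID of the cluster that was just started."""
    message = "Starting cluster {0}".format(cluster_id)
    print(message)
    return message


def add_steps(conn, cluster_id, steps):
    """Queue extra steps on a running cluster, addressed by its ID."""
    # conn is expected to be a boto.emr.connection.EmrConnection and
    # steps a list of boto step objects (e.g. from boto.emr.step).
    return conn.add_jobflow_steps(cluster_id, steps)
```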
