No space left on device exception, amazon EMR medium instances and S3

Question:

I am running a MapReduce job on Amazon EMR which creates 40 output files of about 130MB each. The last 9 reduce tasks fail with a “No space left on device” exception. Is this a matter of cluster misconfiguration? The job runs without a problem with fewer input files, fewer output files, and fewer reducers. Any help would be much appreciated. Thanks!
Full stacktrace below:

EDIT

I made some further attempts but unfortunately I am still getting errors.
I thought I might not have enough disk space on my instances because of the replication factor mentioned in the comment below, so I switched from the medium instances I had been experimenting with to large ones, but I got another exception this time:

The result is that only about 70% of the expected output files are produced; the remaining reduce tasks fail. I then tried uploading a large file to my S3 bucket in case there wasn’t enough space there, but that does not seem to be the problem.

I am using the AWS Elastic MapReduce service. Any ideas?

Answer:

The problem means that there is no space to store the output (or temporary output) of your MapReduce job.

Some things to check are:

  • Have you deleted unnecessary files from HDFS? Run hadoop dfs -ls / to check the files stored on HDFS. (If you use the Trash, make sure you empty it too, e.g. with hadoop dfs -expunge.)
  • Do you compress the output (or intermediate map output) of your jobs? You can do so by using SequenceFileOutputFormat as the output format, or by calling setCompressMapOutput(true) on the job configuration.
  • What is the replication factor? By default it is 3, but if space is the issue you can risk lowering it to 2, or even 1, to get your program to run.
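The compression and replication settings above can also be set cluster-wide in the Hadoop configuration files. A sketch for an old (Hadoop 1.x era) cluster follows; the property names are the 1.x ones and differ on newer Hadoop versions, and dfs.replication belongs in hdfs-site.xml rather than mapred-site.xml:

```xml
<!-- mapred-site.xml: compress intermediate and final output -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml: lower replication to reclaim HDFS space -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

On EMR you would typically apply such settings through a bootstrap action or job configuration rather than by editing the files by hand.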

It could also be that some of your reducers output significantly more data than others, which would fill up the disks of the nodes running those reducers, so check your code too.
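One quick way to check for reducer skew is to compare the sizes of the part files that did get written. The sketch below parses a saved hadoop dfs -ls listing of the output directory and flags oversized part files; the listing text, file names, and sizes are made-up examples, and the column layout assumed is the Hadoop 1.x dfs -ls format (size in the fifth column):

```python
def part_sizes(listing):
    """Map each part-* output file in a `dfs -ls` listing to its size in bytes."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Hadoop 1.x `dfs -ls` rows: permissions, replication, owner, group,
        # size, date, time, path -- so the size is fields[4].
        if len(fields) >= 5 and "part-" in fields[-1]:
            sizes[fields[-1].rsplit("/", 1)[-1]] = int(fields[4])
    return sizes

def skewed(sizes, factor=2.0):
    """Return part files whose size exceeds `factor` times the average size."""
    avg = sum(sizes.values()) / len(sizes)
    return sorted(name for name, size in sizes.items() if size > factor * avg)

# Hypothetical listing captured with: hadoop dfs -ls /out > listing.txt
listing = """\
-rw-r--r--   2 hadoop supergroup  136314880 2013-05-01 12:00 /out/part-00000
-rw-r--r--   2 hadoop supergroup  137363456 2013-05-01 12:00 /out/part-00001
-rw-r--r--   2 hadoop supergroup  962072576 2013-05-01 12:00 /out/part-00002
"""

sizes = part_sizes(listing)
print(skewed(sizes))  # → ['part-00002']
```

A single oversized part file like this usually points at a hot key landing on one reducer, which is worth fixing in the partitioner or the map-side key design rather than by adding disk.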
