Question:
I am running a MapReduce job on Amazon EMR which creates 40 output files of about 130MB each. The last 9 reduce tasks fail with a "No space left on device" exception. Is this a matter of a misconfigured cluster? The job runs without a problem with fewer input files, fewer output files and fewer reducers. Any help will be much appreciated. Thanks!
Full stacktrace below:
Error: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.write(MultipartUploadOutputStream.java:135)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
    at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:105)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
EDIT
I made some further attempts but unfortunately I am still getting errors.
I thought I might not have enough disk space on my instances because of the replication factor mentioned in the comment below, so I tried large instances instead of the medium ones I had been experimenting with until now. But this time I got another exception:
Error: java.io.IOException: Error closing multipart upload
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadMultiParts(MultipartUploadOutputStream.java:207)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:222)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest;
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
The result is that only about 70% of the expected output files are produced, and the remaining reduce tasks fail. I then tried uploading a large file to my S3 bucket directly, in case there wasn't enough space there, but that doesn't seem to be the problem.
I am using the AWS Elastic MapReduce service. Any ideas?
Answer:
The error means that there is no space left to store the output (or the temporary/intermediate output) of your MapReduce job.
Some things to check are:
- Have you deleted unnecessary files from HDFS? Run the
hadoop dfs -ls /
command to check the files stored on HDFS. (If you use the Trash, make sure you empty it too, e.g. with hadoop fs -expunge.)
- Do you use compression to store the output (or temporary output) of your jobs? You can do so by setting SequenceFileOutputFormat as the output format, or, for the intermediate map output, by calling
setCompressMapOutput(true);
- What is the replication factor? By default it is set to 3, but if space is an issue you can try lowering it to 2 or 1 to get your job to run. (A sketch combining the compression and replication settings follows after this list.)
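As a rough illustration of how the compression and replication settings above might be wired into a job driver, here is a minimal sketch using the new (mapreduce) API. The class name, job name, codec choice, and the use of command-line arguments for the paths are placeholders, not taken from your job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobDriver {  // hypothetical driver class, not your actual job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Compress the intermediate map output to save local disk during the shuffle
        // (new-API equivalent of the old JobConf.setCompressMapOutput(true)).
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);

        // Lower the HDFS replication factor for files written by this job
        // (the cluster-wide default is usually 3).
        conf.setInt("dfs.replication", 2);

        Job job = Job.getInstance(conf, "my-job");  // job name is a placeholder
        job.setJarByClass(JobDriver.class);

        // Mapper/reducer classes omitted here; the defaults are identity map/reduce.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Compress the final reducer output as well.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that output compression also shrinks what the reducers have to write locally before the S3 multipart upload, which is where your tasks are running out of space.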
It could also be that some of your reducers output significantly more data than others, so check your code, too.
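One quick way to spot such skew is to compare the sizes of the reducer output files after a (partial) run. A minimal sketch using the Hadoop FileSystem API; the class name is hypothetical and the output directory is passed as an argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputSizeCheck {  // hypothetical helper, not part of your job
    public static void main(String[] args) throws Exception {
        Path outputDir = new Path(args[0]);  // the job's output directory
        FileSystem fs = outputDir.getFileSystem(new Configuration());

        // Print the size of each reducer output file so skewed reducers stand out.
        for (FileStatus status : fs.listStatus(outputDir)) {
            String name = status.getPath().getName();
            if (name.startsWith("part-")) {
                System.out.printf("%s\t%d bytes%n", name, status.getLen());
            }
        }
    }
}

The same information is also available from the command line with hadoop fs -du on the output directory.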