Getting “No space left on device” for approx. 10 GB of data on EMR m1.large instances

Question:

I am getting a "No space left on device" error when running my Amazon EMR jobs with m1.large as the instance type for the Hadoop instances created by the jobflow. The job generates approx. 10 GB of data at most, and the storage capacity of an m1.large instance is supposed to be 420 GB * 2 (according to: EC2 instance types), so I am confused how just 10 GB of data could lead to a "disk space full" kind of message. I am aware that this kind of error can also occur when the total number of inodes allowed on the filesystem has been exhausted, but that limit is in the millions and I am pretty sure my job is not producing that many files. I have also noticed that when I create an EC2 instance of type m1.large on its own, it is assigned an 8 GB root volume by default. Could this also be how instances are provisioned in EMR? If so, when do the 420 GB disks get allotted to an instance?

Also, here is the output of "df -hi" and "mount":

Answer:

With the help of @slayedbylucifer I was able to identify that the problem was that the complete disk space is made available to HDFS on the cluster by default. Hence, only the default 10 GB of space mounted on / is available for local use by the machine. There is an option called --mfs-percentage which can be used (while using the MapR distribution of Hadoop) to specify the split of disk space between the local filesystem and HDFS. It mounts the local filesystem quota at /var/tmp. Make sure that the option mapred.local.dir is set to a directory inside /var/tmp, because that is where all the logs of the TaskTracker attempts go, and these can be huge for big jobs. The logging in my case was what caused the disk space error. I set the value of --mfs-percentage to 60 and was able to run the job successfully thereafter.
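For anyone hitting the same issue, here is a minimal sketch of the relevant mapred-site.xml setting; the --mfs-percentage value of 60 itself is supplied to the MapR installation when the cluster is created, and the exact subdirectory under /var/tmp shown here is an illustrative choice, not something EMR or MapR requires:

```xml
<!-- mapred-site.xml (sketch): keep intermediate data and task attempt logs
     on the local-filesystem portion that --mfs-percentage mounts at /var/tmp.
     The subdirectory mapred/local is illustrative; any path under /var/tmp works. -->
<property>
  <name>mapred.local.dir</name>
  <value>/var/tmp/mapred/local</value>
</property>
```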
