Efficiently move many small files to Amazon S3

Question:

I have around 60,000 small image files (total size 200 MB) that I would like to move out of my project repository to Amazon S3.

I have tried s3fs (http://code.google.com/p/s3fs/), mounting S3 via Transmit on Mac OS X, and the Amazon AWS S3 web uploader. Unfortunately, all of these look like they would take a very long time (more than a day or two) to finish the task.

Is there any better way?

Answer:

Several things could be limiting your throughput, and each has a different remedy:

  1. Your transfer application might be adding overhead. If s3fs is too slow, you might try other options like the S3 tab on the AWS console or a tool like s3cmd.
  2. The network latency between your computer and S3, plus the latency of each API call response, can seriously limit how much a single thread can do. When files are uploaded one at a time, every file pays a full request round trip, so with 60,000 small files the waiting dominates rather than the data transfer itself. The key to solving this is to upload multiple files (dozens) in parallel.
  3. You could simply have a slow network connection between you and S3, which caps the total transfer speed. If the files compress well, you could upload them in compressed form to a temporary EC2 instance, then uncompress them there and upload from the instance to S3.
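For option 3, the bundling step on your own machine can be sketched as below. This is only an illustration: the function name `bundle` and the paths are made up, and already-compressed image formats may not shrink much. Even so, a single archive turns 60,000 transfers into one.

```python
import tarfile
from pathlib import Path


def bundle(root, archive_path):
    """Pack every file under root into one gzip-compressed tar archive.

    Returns the number of files added. Write the archive somewhere
    outside root, otherwise it could end up including itself.
    """
    root = Path(root)
    count = 0
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                # Store paths relative to root so the archive unpacks cleanly.
                tar.add(str(p), arcname=p.relative_to(root).as_posix())
                count += 1
    return count
```

You would then copy the single archive to the EC2 instance, unpack it there, and upload to S3 from the instance.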

My bet is on number 2, which is not always easy to solve unless your upload tool parallelizes for you.
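If your tool does not parallelize, a small script can. Here is a minimal sketch of the idea using a thread pool. The names `upload_all` and `upload_one` are hypothetical, and the actual upload call is passed in as a function so the sketch does not depend on any particular S3 library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def upload_all(root, upload_one, max_workers=32):
    """Call upload_one(path, key) for every file under root, in parallel.

    Returns (uploaded_keys, failures), where failures maps key -> exception.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    uploaded, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Use each file's path relative to root (with "/" separators) as its key.
        futures = {
            pool.submit(upload_one, p, p.relative_to(root).as_posix()): p
            for p in files
        }
        for future in as_completed(futures):
            key = futures[future].relative_to(root).as_posix()
            try:
                future.result()
                uploaded.append(key)
            except Exception as exc:
                # Record the failure so it can be retried later.
                failures[key] = exc
    return uploaded, failures


# Example wiring, assuming boto3 is installed, credentials are configured,
# and "my-bucket" stands in for your bucket name:
#   import boto3
#   s3 = boto3.client("s3")
#   upload_all("images", lambda path, key: s3.upload_file(str(path), "my-bucket", key))
```

Threads fit here because the work is almost entirely waiting on the network. The `max_workers` value of 32 is just a starting point in the "dozens" range; adjust it to what your connection can sustain.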
