How to read a large file from Amazon S3?

Question:

I have a program that reads a text file from Amazon S3, but the file is around 400 MB. I have increased my heap size, but I'm still getting a Java heap space error, so I'm not sure whether my code is correct. I'm using the Amazon SDK for Java and Guava to deal with the file stream.

Please help

I run the JVM with the options -Xms512m -Xmx2g. I use Ant to run the main program, so I added the same JVM options to ANT_OPTS as well, but it's still not working.

Answer:

The point of InputSupplier (though you should be using ByteSource and CharSource these days) is that you never have access to the InputStream from the outside, so you never have to remember whether to close it.

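If you're using an old version of Guava, from before ByteSource and CharSource were introduced, then this means wrapping the S3 object's stream in an InputSupplier and handing that supplier to Guava's stream utilities, which open and close the stream for you.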

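If you're using Guava 14 or later, then this can be done more fluently by overriding ByteSource.openStream() and viewing the result as a CharSource through asCharSource.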
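
A minimal sketch of the ByteSource/CharSource route, not necessarily the code originally posted here. The class name ByteSourceSketch, the method readAll, and the in-memory byte array are hypothetical stand-ins. In the real program, openStream() would presumably return the S3 object's content stream instead (for example object.getObjectContent() in the AWS SDK for Java).

```java
import com.google.common.io.ByteSource;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ByteSourceSketch {

    // Describes *how* to open the stream; Guava opens and closes it itself,
    // so the caller never holds an InputStream.
    static String readAll(final byte[] standIn) {
        ByteSource source = new ByteSource() {
            @Override
            public InputStream openStream() {
                // Assumption: the real code would return the S3 stream here,
                // e.g. object.getObjectContent().
                return new ByteArrayInputStream(standIn);
            }
        };
        try {
            // Decodes the bytes as UTF-8 and returns the whole content as one String.
            return source.asCharSource(StandardCharsets.UTF_8).read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(data));
    }
}
```

Note that read() still loads the entire content into a single String, so this fixes the stream handling but not the memory footprint.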

That said: your file might be 400 MB on disk, but Java Strings are stored as UTF-16 (at least on JVMs without the compact strings introduced in Java 9), so mostly single-byte text can easily take twice as much memory once it is loaded. You may either need a lot more heap, or you need to figure out a way to avoid keeping the whole file in memory at once.
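
One way to avoid holding the whole file in memory is to process it line by line, so that only the current line is kept at any time. The following is a rough sketch using only the JDK. The class StreamingLineCounter, the Totals holder, and the counting logic are hypothetical placeholders for whatever per-line work the program really does, and the ByteArrayInputStream stands in for the stream that would come from S3.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamingLineCounter {

    // Hypothetical result type: running totals instead of the full text.
    static final class Totals {
        long lines;
        long chars;
    }

    // Reads the stream one line at a time and closes it when done.
    static Totals summarize(InputStream in) {
        Totals totals = new Totals();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Placeholder for real per-line processing.
                totals.lines++;
                totals.chars += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumption: in the real program this stream would come from S3.
        InputStream in = new ByteArrayInputStream(
            "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
        Totals totals = summarize(in);
        System.out.println(totals.lines + " lines, " + totals.chars + " chars");
    }
}
```

With this shape, memory use depends on the longest line rather than on the size of the file, provided the per-line work does not itself accumulate everything.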
