Question:
I have thousands of small files (about 1 KB each) to upload to S3 every minute.
If I upload the files in a loop of
“send my HTTP request – wait for S3’s HTTP response – send the next request – wait for the next response …”,
it takes a long time, because every file costs me a full round trip of latency between my server and S3.
Of course I already use the HTTP Keep-Alive header.
So I tried sending multiple HTTP requests without waiting for the corresponding responses (HTTP pipelining): I send 20 requests in a batch and then wait for the 20 responses. I expected this to save a lot of time, because I can keep sending requests while the earlier responses are still on the way.
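Roughly, the batch looks like the sketch below (simplified: plain HTTP, no request signing, naive response parsing, and a placeholder endpoint; it is only meant to show the pattern of writing all 20 requests before reading any responses):

```python
import socket

HOST = "example-bucket.s3.amazonaws.com"   # placeholder endpoint
BODIES = [b"x" * 1024 for _ in range(20)]  # 20 dummy 1 KB payloads

sock = socket.create_connection((HOST, 80))

# Phase 1: write all 20 requests back-to-back without reading anything.
for i, body in enumerate(BODIES):
    request = (
        f"PUT /file-{i} HTTP/1.1\r\n"
        f"Host: {HOST}\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"Connection: keep-alive\r\n\r\n"
    ).encode() + body
    sock.sendall(request)

# Phase 2: read the 20 responses, in order, from the same connection.
responses = sock.makefile("rb")
for _ in BODIES:
    status = responses.readline()          # e.g. b"HTTP/1.1 200 OK\r\n"
    length = 0
    while True:                            # consume the headers
        line = responses.readline()
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":")[1])
        if line == b"\r\n":
            break
    responses.read(length)                 # consume the body, if any
    print(status.strip())

sock.close()
```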
However, it doesn’t make things any better.
I send my 20 requests in about 200 ms, then I start reading the responses.
I expected that after the first response arrived, the rest would come back about as fast as I had sent the requests, like this graph.
What actually happens is that after I receive the first response, I have to wait about 300 ms for each following response. It is no better than sending one request and waiting for its response each time.
Why can’t I shorten the time with the pipelining technique?
Why does S3 spend so much time on every request?
Does S3 support HTTP pipelining?
Thanks.
Answer:
Amazon S3 supports parallel requests, and that is the way to get around the per-request latency: you can make hundreds of concurrent requests to S3 and upload large batches of files in a very short time.
HTTP pipelining doesn’t buy you much here, because responses on a single connection must come back in order, so a single connection still handles your uploads essentially one at a time; separate concurrent connections are what actually overlap the round trips.
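A minimal sketch of parallel uploads with boto3 and a thread pool; the bucket name, file list, and pool size are placeholders you would adjust:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3
from botocore.config import Config

BUCKET = "my-bucket"                              # hypothetical bucket name
FILES = [f"file-{i}.txt" for i in range(1000)]    # hypothetical local files

# One client shared by all threads; raise the connection pool so that
# 50 workers can each keep an open HTTP connection to S3.
s3 = boto3.client("s3", config=Config(max_pool_connections=50))

def upload(name: str) -> str:
    # Each worker issues its own PUT, so the round trips overlap instead of
    # being paid one after another.
    s3.upload_file(name, BUCKET, name)
    return name

with ThreadPoolExecutor(max_workers=50) as pool:
    for done in pool.map(upload, FILES):
        print("uploaded", done)
```

The pool size is a knob to tune against your bandwidth and CPU; with roughly 50 requests in flight, a thousand small files cost on the order of 20 round trips of waiting instead of 1,000.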