I have thousands of small files (about 1 KB each) to upload to S3 every minute.
If I upload the files one at a time in a loop
“send my HTTP request – wait for S3’s HTTP response – send the next request – wait for the next response …”,
it takes a long time, because for every file I have to wait a full round trip (twice the one-way latency) between my server and S3.
Of course, I already use the HTTP Keep-Alive header.
So I tried sending multiple HTTP requests without waiting for the corresponding responses (HTTP pipelining). I send 20 requests in a batch and then wait for the 20 responses. I expected this to save a lot of time, because I can keep sending requests while the earlier responses are still on their way.
However, it did not make things any better.
I send my 20 requests in about 200 ms, then I start receiving the responses.
I expected that, once the first response arrived, the rest would arrive as fast as I had sent the requests, like this graph.
In fact, after I receive the first response, I have to wait about 300 ms for each subsequent response. That is no better than sending one request and waiting for one response at a time.
Why doesn’t pipelining shorten the total time?
Why does S3 take so long for each request?
Does S3 support HTTP pipelining?
Amazon S3 supports parallelization to get around the per-request latency.
You can make hundreds of concurrent requests to S3 and upload large batches of files in a very short time period.
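As a rough illustration of the concurrent approach, here is a minimal Python sketch that fans uploads out over a thread pool. The names `upload_all` and `fake_upload` are made up for this example, and `fake_upload` only sleeps to stand in for one request's latency. In real use you would pass your own function that performs a single S3 PUT (for example, one wrapping boto3's `put_object`; that choice of client is an assumption here, not something from the question).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_all(items, upload_one, max_workers=50):
    """Run upload_one(key, body) for every (key, body) pair concurrently.

    Returns a dict mapping each key to None on success, or to the
    exception raised by upload_one for that key.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool caps how many run at once.
        futures = {pool.submit(upload_one, key, body): key
                   for key, body in items}
        for future, key in futures.items():
            # exception() blocks until the task finishes.
            results[key] = future.exception()
    return results


def fake_upload(key, body):
    """Hypothetical stand-in for one S3 PUT: just simulates 50 ms latency."""
    time.sleep(0.05)


if __name__ == "__main__":
    files = [(f"file-{i}", b"x" * 1024) for i in range(200)]
    start = time.perf_counter()
    outcome = upload_all(files, fake_upload, max_workers=50)
    elapsed = time.perf_counter() - start
    failed = [key for key, err in outcome.items() if err is not None]
    print(f"{len(outcome)} uploads, {len(failed)} failed, {elapsed:.2f}s")
```

With the simulated 50 ms latency, 200 sequential uploads would take about 10 seconds, while 50 workers need only a handful of rounds. The right value for `max_workers` depends on your server and network, so treat 50 as a starting point to tune rather than a recommendation.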