I am benchmarking Azure Data Lake and am seeing about 7.5 MB/s when reading from an ADL Store and writing to a VHD, all in the same region. This is the case for both PowerShell and C#, using code taken from the following examples:
The PowerShell code is from https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-get-started-powershell/
The C# code is from https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-get-started-net-sdk/
Are the above code samples acceptable for a benchmark test, or will a new SDK be delivered that improves throughput?
Also, are there expected throughput numbers when ADL Store becomes generally available?
The code provided in the documentation can be used to build benchmark tests. The SDK will go through a few releases and updates prior to Azure Data Lake being generally available. These will include performance improvements in addition to features.
On the topic of performance benchmarks, our general guidance is as follows. The Azure Data Lake services are currently in preview. We are continually working to improve the services, including their performance, throughout this preview phase. As we get closer to general availability, we will consider releasing additional guidance on the type of performance results to expect. Performance results depend heavily on many factors, such as test topology, configuration, and workload. Therefore, it is difficult to comment on your observations without examining all of these. If you can reach us offline with the details, we will be happy to take a look.
Amit Kulkarni (Program Manager – Azure Data Lake)
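Since the answer notes that results depend on configuration and workload, here is a minimal sketch of how a throughput figure like the 7.5 MB/s above could be measured reproducibly. It is not taken from the linked samples and makes no Azure SDK calls. It times a chunked copy between any two file-like objects, so the chunk size can be varied as one of the configuration factors. The function name `measure_copy_throughput`, the `clock` parameter, and the choice of 1 MB = 1,000,000 bytes are all assumptions made for illustration.

```python
import io
import time


def measure_copy_throughput(src, dst, chunk_size=4 * 1024 * 1024,
                            clock=time.perf_counter):
    """Copy src to dst in chunks and return (bytes_copied, seconds, MB/s).

    src and dst are any file-like objects with read()/write().
    MB is taken as 1,000,000 bytes (an assumption; use 1024 * 1024 for MiB).
    """
    total = 0
    start = clock()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    elapsed = clock() - start
    # Guard against a zero interval on very small inputs.
    mb_per_s = (total / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s


if __name__ == "__main__":
    # In-memory stand-ins; a real test would pass the store's read stream
    # and a file opened on the target disk instead.
    payload = bytes(32 * 1024 * 1024)
    for size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        total, elapsed, rate = measure_copy_throughput(
            io.BytesIO(payload), io.BytesIO(), chunk_size=size)
        print(f"chunk={size:>8} bytes  copied={total} bytes  {rate:.1f} MB/s")
```

Running the same harness with different chunk sizes, from different VM sizes, and with several copies in parallel should help separate the effect of the client configuration from that of the service itself.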