How to Read Data from Azure Blob Storage using Python
Azure Blob Storage is a service that provides scalable, secure, and cost-effective cloud storage for unstructured data. You can use Azure Blob Storage to store and access large amounts of data, such as images, videos, documents, or logs.
If you are working with Python, you may want to read data from Azure Blob Storage and process it using libraries such as pandas or numpy. For example, you may want to read a CSV file from a blob container and perform some data analysis or manipulation on it.
In this post, we will show you how to read data from Azure Blob Storage using Python and the Azure Storage SDK for Python. We will use a simple example of reading a CSV file from a blob container and loading it into a pandas DataFrame.
Prerequisites
To follow this post, you need the following:
- An Azure account with an active subscription – create an account for free
- An Azure Storage account – create a storage account
- A blob container and a blob in the storage account – upload a blob
- Python 3.6+ and pip installed on your machine
- The Azure Storage SDK for Python (the azure-storage-blob package on PyPI) and pandas installed on your machine, for example with pip install azure-storage-blob pandas
Read Data from Azure Blob Storage using Python
To read data from Azure Blob Storage using Python, you need to use the BlobClient class from the azure.storage.blob module. The BlobClient class allows you to interact with a single blob in a container.
To create an instance of the BlobClient class, you need to provide the following parameters:
- account_url: The URL of your storage account, such as https://mystorageaccount.blob.core.windows.net
- container_name: The name of the blob container that contains the blob
- blob_name: The name of the blob that you want to read
- credential: The credential to authenticate with the storage account, such as the account key or a shared access signature (SAS) token
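The parameters above can be assembled as in the following minimal sketch. It assumes the public-cloud endpoint pattern shown in the account_url example and keeps the credential in an environment variable instead of the source file. The variable name AZURE_STORAGE_KEY and the helper names account_url_for and make_blob_client are hypothetical and chosen only for illustration.

```python
import os


def account_url_for(account_name: str) -> str:
    """Build the blob endpoint URL for a storage account name."""
    return f"https://{account_name}.blob.core.windows.net"


def make_blob_client(account_name: str, container_name: str, blob_name: str):
    """Create a BlobClient for one blob, taking the credential from the environment."""
    # Imported here so the URL helper above works without the SDK installed.
    from azure.storage.blob import BlobClient

    # AZURE_STORAGE_KEY is a hypothetical variable name; it may hold an
    # account key or a SAS token.
    credential = os.environ["AZURE_STORAGE_KEY"]
    return BlobClient(
        account_url=account_url_for(account_name),
        container_name=container_name,
        blob_name=blob_name,
        credential=credential,
    )


print(account_url_for("mystorageaccount"))
# prints: https://mystorageaccount.blob.core.windows.net
```

Reading the credential from the environment is one possible choice; it keeps secrets out of version control but is not the only way to supply them.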
Once you have an instance of the BlobClient class, you can call its download_blob method to download the blob data. The download_blob method returns an instance of the StorageStreamDownloader class, which exposes methods for reading the downloaded content.
To read the stream data into a variable, you can use the readall method of the StorageStreamDownloader class. This method reads all of the stream data into memory and returns it as bytes.
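As a rough sketch of the readall approach, the helpers below download a whole blob into memory and optionally decode it as text. The function names read_blob_bytes and read_blob_text are hypothetical, and blob_client is assumed to be an already constructed BlobClient. The UTF-8 default is also an assumption; use whatever encoding your file was written with.

```python
def read_blob_bytes(blob_client) -> bytes:
    """Download a blob's entire content into memory as bytes."""
    downloader = blob_client.download_blob()  # a StorageStreamDownloader
    return downloader.readall()


def read_blob_text(blob_client, encoding: str = "utf-8") -> str:
    """Download a blob and decode it as text (UTF-8 assumed by default)."""
    return read_blob_bytes(blob_client).decode(encoding)
```

Because readall holds the full content in memory, this pattern is best suited to blobs that comfortably fit in RAM.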
Alternatively, you can use the readinto method of the StorageStreamDownloader class to write the blob data into a writable stream, such as a file object opened in binary write mode. When the target is a file on disk, this avoids holding the entire blob in memory at once, which helps with large blobs.
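A minimal sketch of the readinto approach might look like the following. The helper name download_blob_to_file and the local_path argument are hypothetical, and blob_client is assumed to be an existing BlobClient.

```python
def download_blob_to_file(blob_client, local_path: str) -> int:
    """Write a blob's content to a local file and return the number of bytes read."""
    downloader = blob_client.download_blob()  # a StorageStreamDownloader
    # The file is opened in binary write mode so raw bytes can be written to it.
    with open(local_path, "wb") as local_file:
        return downloader.readinto(local_file)
```

The saved file can then be opened with any tool that accepts a path, for example pandas.read_csv(local_path).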
Here is an example of how to read a CSV file from a blob container and load it into a pandas DataFrame:
```python
# Import modules
import io

import pandas as pd
from azure.storage.blob import BlobClient

# Create a BlobClient object for a single blob
blob = BlobClient(
    account_url="https://mystorageaccount.blob.core.windows.net",
    container_name="mycontainer",
    blob_name="mycsvfile.csv",
    credential="myaccountkey",  # placeholder: account key or SAS token
)

# Download the blob data as a stream
stream = blob.download_blob()

# Read the downloaded bytes into a pandas DataFrame.
# Wrapping them in BytesIO gives read_csv a standard file-like object.
df = pd.read_csv(io.BytesIO(stream.readall()))

# Print the DataFrame
print(df)
```
Conclusion
With Python and the Azure Storage SDK for Python, you can read data from Azure Blob Storage and process it using libraries such as pandas or numpy.
In this post, we showed you how to read data from Azure Blob Storage using Python and the BlobClient class. We used a simple example of reading a CSV file from a blob container and loading it into a pandas DataFrame.
We hope this post has helped you understand and use Python to read data from Azure Blob Storage. If you have any questions or feedback, please leave a comment below.