AWS Simple Storage Service (S3) Vs Azure Blob Storage Vs GCP Cloud Storage
In the last blog post, we discussed and compared the different services/features available in AWS, Azure, and GCP.
In this blog post, we will discuss object storage options in AWS, Azure, and GCP. More specifically, we will compare AWS Simple Storage Service (S3) Vs Azure Blob Storage Vs GCP Cloud Storage.
AWS Simple Storage Service (S3) Vs Azure Blob Storage Vs GCP Cloud Storage:
|Feature||AWS Simple Storage Service||Azure Blob Storage||GCP Cloud Storage|
|Definition||Object storage service that offers industry-leading scalability, data availability, security, and performance.||Massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing and machine learning.||Object storage for companies of all sizes. Store any amount of data. Retrieve it as often as you’d like.|
|Folder within buckets supported||Yes||Yes||Yes|
|Storage Classes||1. S3 Standard (default)
2. S3 Standard-IA
3. S3 Intelligent-Tiering
4. S3 One Zone-IA
5. S3 Glacier
6. S3 Glacier Deep Archive
7. RRS (not recommended)
1. Hot (default)
2. Cool
3. Archive
1. Standard Storage (default)
2. Nearline Storage
3. Coldline Storage
4. Archive Storage
|Data Redundancy||Stored redundantly across a minimum of three Availability Zones in the selected Region (S3 One Zone-IA uses a single Availability Zone)||Locally redundant storage (LRS)
Zone-redundant storage (ZRS)
Geo-redundant storage (GRS)
Geo-zone-redundant storage (GZRS)
Read-Access Geo-redundant storage (RA-GRS)
Read-Access Geo-zone-redundant storage (RA-GZRS)||Region
Dual-region
Multi-region|
|Encryption Options||1. Unencrypted (default)
2. Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
3. Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS KMS (SSE-KMS)
4. Server-Side Encryption with Customer-Provided Keys (SSE-C)
5. Client-Side Encryption with Customer Master Keys (CMKs) Stored in AWS KMS (CSE-KMS)
6. Client-Side Encryption with Client Side Key (CSE-C)
1. server-side encryption (SSE) with Microsoft-managed key (default)
2. server-side encryption (SSE) with customer-managed key
3. server-side encryption (SSE) with customer-provided key
4. client-side encryption (CSE)
1. server-side encryption (SSE) with Google-managed key
2. server-side encryption (SSE) with customer-supplied encryption key
3. server-side encryption (SSE) with customer-managed encryption key
4. client-side encryption (CSE)
|Object Lifecycle Management||Lifecycle rules||Lifecycle management||Object Lifecycle Management|
|Access Control||Bucket policies||Azure Active Directory (Azure AD)||Uniform bucket-level access (Formerly known as bucket policy)|
|IAM user policies||Azure Active Directory (Azure AD)||Identity and Access Management|
|Access control list (ACL)||Azure Active Directory (Azure AD)||Access control lists (ACLs)|
|presigned URL||shared access signature (SAS)||Signed URLs|
|Object Version Control||Versioning||Blob versioning||Object Versioning|
|Logging & Monitoring||1. Server access logging
2. AWS CloudTrail data events
3. Event notifications
4. Bucket metrics
|1. Azure Storage analytics logging
2. Azure Monitor
3. Change feed support in Azure Blob Storage
|1. Usage logs
2. Cloud Audit Logs
3. Storage logs
4. Pub/Sub notifications for Cloud Storage
|Delete Protection||MFA delete||Soft delete for blobs||Object Hold, Retention policies|
|Object Replication||Replication rules||Object replication||NA|
|Static Web Hosting||Yes||Yes||Yes|
|Inventory||Inventory configurations||Azure Storage blob inventory||NA|
|Upload Acceleration||1. Multi-part upload
2. Transfer Acceleration
|NA||1. Resumable upload
2. Parallel composite uploads
|Requester Pays Support||Yes||NA||Yes|
|Reserved Capacity (at low cost)||NA||Yes||NA|
|Composing Objects Support||NA||Append blobs||Composing Objects|
|Network File System (NFS) 3.0 protocol support||Yes, using AWS Storage Gateway||Yes||NA|
|Search Capability||Athena||Query Blob Contents||BigQuery|
Note: I have tried to compare the services to the best of my knowledge and ability, so I would encourage a peer review from the cloud community. If you think any comparison is factually incorrect, or a feature should be added to this list, feel free to comment with a reference and I will update this blog accordingly.
Now, let’s dig a bit deeper and explore the features of each cloud service provider.
Key Concepts & Features:
AWS Simple Storage Service (S3):
Bucket: A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket.
Object: Objects are the fundamental entities stored in Amazon S3 and consist of object data and metadata.
Regions: The Amazon S3 namespace is global, but each bucket is created in, and its data stored in, a specific region that you choose.
Storage classes: The storage class defines durability, availability, accessibility, and cost of storing objects in an S3 bucket. The below table summarizes all storage classes available in AWS S3.
|Storage class||Designed for||Durability||Availability||Availability Zones||Min storage duration||Min billable object size||Other considerations|
|S3 Standard||Frequently accessed data||99.999999999%||99.99%||>= 3||None||None||None|
|S3 Standard-IA||Long-lived, infrequently accessed data||99.999999999%||99.90%||>= 3||30 days||128 KB||Per GB retrieval fees apply.|
|S3 Intelligent-Tiering||Long-lived data with changing or unknown access patterns||99.999999999%||99.90%||>= 3||30 days||None||Monitoring and automation fees per object apply. No retrieval fees.|
|S3 One Zone-IA||Long-lived, infrequently accessed, non-critical data||99.999999999%||99.50%||1||30 days||128 KB||Per GB retrieval fees apply. Not resilient to the loss of the Availability Zone.|
|S3 Glacier||Long-term data archiving with retrieval times ranging from minutes to hours||99.999999999%||99.99% (after you restore objects)||>= 3||90 days||40 KB||Per GB retrieval fees apply. You must first restore archived objects before you can access them.|
|S3 Glacier Deep Archive||Archiving rarely accessed data with a default retrieval time of 12 hours||99.999999999%||99.99% (after you restore objects)||>= 3||180 days||40 KB||Per GB retrieval fees apply. You must first restore archived objects before you can access them.|
|RRS (not recommended)||Frequently accessed, non-critical data||99.99%||99.99%||>= 3||None||None||None|
Lifecycle rules: Use lifecycle rules to define actions you want Amazon S3 to take during an object’s lifetime such as transitioning objects to another storage class, archiving them, or deleting them after a specified period of time.
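As an illustration, such a rule can be written as the lifecycle configuration document the S3 API accepts. Here is a hedged sketch in Python, where the "logs/" prefix and the chosen day thresholds are hypothetical:

```python
# A minimal S3 lifecycle configuration: transition objects under a
# hypothetical "logs/" prefix to Standard-IA after 30 days, to Glacier
# after 90 days, and delete them after 365 days. This dict has the shape
# accepted by boto3's put_bucket_lifecycle_configuration.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
```

With a boto3 S3 client, this would be applied via `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_configuration)`.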
Bucket policies: Bucket policies provide centralized access control to buckets.
IAM user policies: You can create and configure IAM user policies for controlling user access to Amazon S3. User policies use JSON-based access policy language.
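Bucket policies and IAM user policies share the same JSON policy language. A minimal read-only policy might look like this sketch, where the bucket name "example-bucket" is a placeholder:

```python
import json

# A minimal read-only S3 policy in the JSON access policy language used by
# both bucket policies and IAM user policies. "example-bucket" is a
# hypothetical bucket name.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

# Serialized form, as it would be pasted into the console or passed to the API.
policy_document = json.dumps(read_only_policy)
```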
Access control list (ACL): Grant basic read/write permissions at the bucket or object level.
Object ownership: Assume ownership of new objects uploaded to an S3 bucket.
Versioning: Use versioning to keep multiple versions of an object in the same bucket.
MFA delete: When working with S3 Versioning, you can optionally add another layer of security by enabling MFA delete on a bucket, which requires multi-factor authentication before an object version can be permanently deleted or the bucket’s versioning state changed.
Server access logging: Log requests for access to your bucket.
Encryption: Encrypt objects stored in your bucket; default server-side encryption is configured at the bucket level. AWS S3 provides five encryption options, as listed below.
|Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)||When you use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), each object is encrypted with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates.|
|Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (SSE-KMS)||Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (SSE-KMS) is similar to SSE-S3, but with some additional benefits and charges for using this service. There are separate permissions for the use of a CMK that provides added protection against unauthorized access of your objects in Amazon S3.|
|Server-Side Encryption with Customer-Provided Keys (SSE-C)||With Server-Side Encryption with Customer-Provided Keys (SSE-C), you manage the encryption keys and Amazon S3 manages the encryption, as it writes to disks, and decryption, when you access your objects.|
|Client-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (CSE-KMS)||With this option, you use an AWS KMS CMK for client-side encryption when uploading or downloading data in Amazon S3.|
|Client-Side Encryption with Client Side Key (CSE-C)||With this option, you use a master key that is stored within your application for client-side data encryption.|
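To make this concrete, server-side encryption is requested per object at upload time. Below is a sketch of the request parameters for an SSE-KMS upload with boto3's put_object; the bucket name, object key, body, and KMS key ARN are all hypothetical placeholders:

```python
# Request parameters for an S3 PutObject call using SSE-KMS. The bucket
# name, object key, and KMS key ARN below are hypothetical placeholders.
put_object_params = {
    "Bucket": "example-bucket",
    "Key": "reports/summary.csv",
    "Body": b"col1,col2\n1,2\n",
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
}
# With a boto3 S3 client: s3.put_object(**put_object_params)
```

Omitting `SSEKMSKeyId` while keeping `ServerSideEncryption: "aws:kms"` would fall back to the account's AWS-managed KMS key for S3.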
Tag: Track storage cost or other criteria by tagging your bucket.
AWS CloudTrail data events: Configure CloudTrail data events to log Amazon S3 object-level API operations in the CloudTrail console.
Event notifications: Send a notification when specific events occur in your bucket.
Transfer acceleration: Use an accelerated endpoint for faster data transfers.
Requester pays: When enabled, the requester pays for requests and data transfer costs, and anonymous access to S3 bucket is disabled.
Static website hosting: Use S3 bucket to host a website or redirect requests.
Cross-origin resource sharing (CORS): The CORS configuration, written in JSON, defines a way for client web applications that are loaded in one domain to interact with resources in a different domain.
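For example, a CORS configuration allowing a single (hypothetical) web origin to read objects could be sketched as the structure boto3's put_bucket_cors accepts:

```python
# A sample S3 CORS configuration allowing GET/HEAD requests from one
# hypothetical origin. This dict has the shape of the CORSConfiguration
# parameter of boto3's put_bucket_cors.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight response
        }
    ]
}
```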
Bucket metrics: Explore metrics for usage, request, and data transfer activity within your bucket. Metrics are also available in Amazon CloudWatch.
Storage Class Analysis: Analyze storage access patterns to help you decide when to transition objects to the appropriate storage class.
Replication metrics: Monitor the total number and size of objects that are pending replication, and the maximum replication time to the destination Region.
Replication rules: Use replication rules to define options you want Amazon S3 to apply during replication such as server-side encryption, replica ownership, transitioning replicas to another storage class, and more.
Inventory configurations: You can create inventory configurations on a bucket to generate a flat file list of your objects and metadata.
Access Points: Amazon S3 Access Points simplify managing data access at scale for shared datasets in S3. Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations.
Custom Object Metadata (optional): Metadata is optional information provided as a name-value (key-value) pair.
presigned URL: All objects and buckets are private by default. However, you can use a presigned URL to optionally share objects or enable your customers/users to upload objects to buckets without AWS security credentials or permissions.
Multipart Upload: Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object’s data. You can upload these object parts independently and in any order.
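The part-splitting step can be sketched in Python; the 5 MiB minimum part size (for all parts except the last) and the 10,000-part cap are S3's documented constraints:

```python
# Sketch: split an object of a given size into contiguous byte ranges for a
# multipart upload. S3 requires parts of at least 5 MiB (except the last part)
# and allows at most 10,000 parts per upload.
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB

def part_ranges(object_size: int, part_size: int = MIN_PART_SIZE):
    """Yield (part_number, start_byte, end_byte_exclusive) tuples."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    part_number = 1
    for start in range(0, object_size, part_size):
        yield (part_number, start, min(start + part_size, object_size))
        part_number += 1

# A 12 MiB object becomes three parts: two full 5 MiB parts and a 2 MiB tail.
parts = list(part_ranges(12 * 1024 * 1024))
```

Each range would then be uploaded with `upload_part` and stitched together with `complete_multipart_upload`.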
Query S3 Data Using SQL (Athena): Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Azure Blob Storage:
Storage accounts: A storage account provides a unique namespace in Azure for your data. Every object that you store in Azure Storage has an address that includes your unique account name.
Containers: A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.
Blobs: Blob stands for Binary Large Object, which includes objects such as images and multimedia files. These are known as unstructured data because they don’t follow any particular data model. Azure Storage supports three types of blobs as listed below –
|Block blobs||store text and binary data. Block blobs are made up of blocks of data that can be managed individually. Block blobs can store up to about 190.7 TiB.|
|Append blobs||are made up of blocks like block blobs, but are optimized for append operations. Append blobs are ideal for scenarios such as logging data from virtual machines.|
|Page blobs||store random access files up to 8 TiB in size. Page blobs store virtual hard drive (VHD) files and serve as disks for Azure virtual machines.|
Access tiers: Azure storage offers different access tiers, allowing you to store blob object data in the most cost-effective manner.
|Hot||Optimized for storing data that is accessed frequently.|
|Cool||Optimized for storing data that is infrequently accessed and stored for at least 30 days.|
|Archive||Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours.|
Performance tiers: Azure block blob storage offers two different performance tiers.
|Premium||optimized for high transaction rates and consistent single-digit millisecond storage latency|
|Standard||optimized for high capacity and high throughput|
Lifecycle management: Azure Blob Storage lifecycle management offers a rich, rule-based policy which you can use to transition your data to the best access tier and to expire data at the end of its lifecycle.
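As a sketch, a lifecycle management policy that moves block blobs to the Cool tier after 30 days and deletes them after a year could look like this; the "logs/" prefix and the day thresholds are hypothetical, and the structure follows Azure's lifecycle policy JSON schema:

```python
# An Azure Blob Storage lifecycle management policy: tier block blobs under
# a hypothetical "logs/" prefix to Cool after 30 days of no modification,
# and delete them after 365 days.
lifecycle_policy = {
    "rules": [
        {
            "name": "cool-then-delete-logs",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["logs/"],
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}
```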
Reserved capacity: Save money on storage costs for blob data with Azure Storage reserved capacity. Azure Storage reserved capacity offers you a discount on capacity for block blobs and for Azure Data Lake Storage Gen2 data in standard storage accounts when you commit to a reservation for either one year or three years.
Index Tag: Blob index tags provide data management and discovery capabilities by using key-value index tag attributes. You can categorize and find objects within a single container or across all containers in your storage account. As data requirements change, objects can be dynamically categorized by updating their index tags.
Blob indexer: An indexer is a data-source-aware subservice in Cognitive Search, equipped with internal logic for sampling data, reading metadata, retrieving data, and serializing data from native formats into JSON documents for subsequent import.
Azure Storage blob inventory: The Azure Storage blob inventory feature provides an overview of your blob data within a storage account. Use the inventory report to understand your total data size, age, encryption status, and so on.
Blob Storage using Azure Monitor: Azure Blob Storage creates monitoring data by using Azure Monitor, which is a full stack monitoring service in Azure. Azure Monitor provides a complete set of features to monitor your Azure resources and resources in other clouds and on-premises.
Network File System (NFS) 3.0 protocol support: Blob storage supports the Network File System (NFS) 3.0 protocol. This support provides Linux file system compatibility at object storage scale and prices and enables Linux clients to mount a container in Blob storage from an Azure Virtual Machine (VM) or a computer on-premises.
Azure Blob Storage redundancy: By default, Azure Storage always stores multiple copies of your data so that it is protected from planned and unplanned events. You can control the redundancy using the options below –
|Locally redundant storage (LRS)||Copies your data synchronously three times within a single physical location in the primary region.|
|Zone-redundant storage (ZRS)||Copies your data synchronously across three Azure availability zones in the primary region.|
|Geo-redundant storage (GRS)||Copies your data synchronously three times within a single physical location in the primary region using LRS. It then copies your data asynchronously to a single physical location in the secondary region. Within the secondary region, your data is copied synchronously three times using LRS.|
|Geo-zone-redundant storage (GZRS)||Copies your data synchronously across three Azure availability zones in the primary region using ZRS. It then copies your data asynchronously to a single physical location in the secondary region. Within the secondary region, your data is copied synchronously three times using LRS.|
|Read-Access Geo-redundant storage (RA-GRS)||Same as Geo-redundant storage (GRS) with read access enabled in the secondary region|
|Read-Access Geo-zone-redundant storage (RA-GZRS)||Same as Geo-zone-redundant storage (GZRS) with read access enabled in the secondary region|
Blob versioning: You can enable Blob storage versioning to automatically maintain previous versions of an object. When blob versioning is enabled, you can restore an earlier version of a blob to recover your data if it is erroneously modified or deleted.
Soft delete for blobs: Blob soft delete protects an individual blob, snapshot, or version from accidental deletes or overwrites by maintaining the deleted data in the system for a specified period of time.
Point-in-time restore for block blobs: Point-in-time restore provides protection against accidental deletion or corruption by enabling you to restore block blob data to an earlier state.
Change feed support in Azure Blob Storage: The purpose of the change feed is to provide transaction logs of all the changes that occur to the blobs and the blob metadata in your storage account. The change feed provides an ordered, guaranteed, durable, immutable, read-only log of these changes.
Immutable Blob: Immutable storage for Azure Blob storage enables users to store business-critical data objects in a WORM (Write Once, Read Many) state. This state makes the data non-erasable and non-modifiable for a user-specified interval.
Object replication: Object replication asynchronously copies block blobs between a source storage account and a destination account.
Blob Access Control: Microsoft Azure provides multiple options to control access to your blob storage –
|Blob Access Control||Description|
|Azure Active Directory (Azure AD) integration for blobs||Azure provides Azure role-based access control (Azure RBAC) for control over a client’s access to blob.|
|Shared Key authorization for blobs||A client using Shared Key passes a header with every request that is signed using the storage account access key.|
|shared access signature (SAS)||A shared access signature (SAS) is a URI that grants restricted access to an Azure Storage container. Use it when you want to grant access to storage account resources for a specific time range without sharing your storage account key.|
|Public access||A container and its blobs may be publicly available. When you specify that a container or blob is public, anyone can read it anonymously; no authentication is required.|
Cross-Origin Resource Sharing (CORS): Azure storage services support Cross-Origin Resource Sharing (CORS) for the Blob, Table, and Queue services.
Encryption: All Azure Storage uses server-side encryption (SSE) by default to automatically encrypt your data when it is persisted to the cloud and you cannot disable this encryption. In addition, you can use additional encryption options as listed below –
|server-side encryption (SSE) with Microsoft-managed key||Enabled by default for all Azure Storage data; the keys are managed entirely by Microsoft.|
|server-side encryption (SSE) with customer-managed key||You can specify a customer-managed key to use for encrypting and decrypting data in Blob storage. Customer-managed keys must be stored in Azure Key Vault or Azure Key Vault Managed Hardware Security Model (HSM) (preview).|
|server-side encryption (SSE) with customer-provided key||You can specify a customer-provided key on Blob storage operations. A client making a read or write request against Blob storage can include an encryption key on the request for granular control over how blob data is encrypted and decrypted.|
|client-side encryption (CSE)||The Azure Storage client libraries provide methods for encrypting data from the client library before sending it across the wire and decrypting the response. Data encrypted via client-side encryption is also encrypted at rest by Azure Storage.|
GCP Cloud Storage:
Projects: All data in Cloud Storage belongs inside a project. A project consists of a set of users, a set of APIs, and billing, authentication, and monitoring settings for those APIs.
Buckets: Buckets are the basic containers that hold your data. Everything that you store in Cloud Storage must be contained in a bucket.
Objects: Objects are the individual pieces of data that you store in Cloud Storage. There is no limit on the number of objects that you can create in a bucket.
Object Lock: Store objects using a write-once-read-many (WORM) model to help you prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
Bucket labels: Bucket labels are key:value metadata pairs that allow you to group your buckets along with other Google Cloud resources such as virtual machine instances and persistent disks.
Object Versioning: To support the retrieval of objects that are deleted or replaced, Cloud Storage offers the Object Versioning feature.
Bucket locations: You specify a location for storing your object data when you create a bucket. In GCP there are different options available to place your bucket as listed below –
|region||a specific geographic place|
|dual-region||a specific pair of regions|
|multi-region||a large geographic area that contains two or more geographic places|
Storage classes: The storage class defines the availability, accessibility, and cost of storing objects in Cloud Storage.
|Storage Class||Usage||Minimum storage duration||Typical monthly availability|
|Standard Storage||best for data that is frequently accessed (“hot” data) and/or stored for only brief periods of time.||None||>99.99% in multi-regions and dual-regions; 99.99% in regions|
|Nearline Storage||a low-cost, highly durable storage service for storing infrequently accessed data.||30 days||99.95% in multi-regions and dual-regions; 99.9% in regions|
|Coldline Storage||a very-low-cost, highly durable storage service for storing infrequently accessed data.||90 days||99.95% in multi-regions and dual-regions; 99.9% in regions|
|Archive Storage||the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery.||365 days||99.95% in multi-regions and dual-regions; 99.9% in regions|
Requester Pays: With Requester Pays enabled on your bucket, you can require requesters to include a billing project in their requests, thus billing the requester’s project.
Bucket metadata: Buckets created in Cloud Storage have metadata associated with them. Metadata identifies properties of the bucket and specifies how the bucket should be handled when it’s accessed.
Object metadata: Objects stored in Cloud Storage have metadata associated with them. Metadata identifies properties of the object, as well as specifies how the object should be handled when it’s accessed.
Resumable upload: Use this for a more reliable transfer, which is especially important with large files. Resumable uploads are a good choice for most applications, since they also work for small files at the cost of one additional HTTP request per upload.
Parallel composite uploads: One strategy for uploading large files is called parallel composite uploads. In such an upload, a file is divided into up to 32 chunks, the chunks are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted.
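The chunking step described above can be sketched in Python; the 32-chunk cap comes from Cloud Storage's compose limit, while the target chunk size here is an arbitrary tuning choice:

```python
import math

# Sketch of the chunking step in a parallel composite upload: divide a file
# into at most 32 roughly equal byte ranges (32 is Cloud Storage's limit on
# source objects per compose request).
MAX_COMPONENTS = 32

def chunk_plan(file_size: int, target_chunk_size: int):
    """Return a list of (start, end_exclusive) byte ranges, capped at 32 chunks."""
    n_chunks = min(MAX_COMPONENTS, max(1, math.ceil(file_size / target_chunk_size)))
    chunk_size = math.ceil(file_size / n_chunks)
    return [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]

# A 100 MiB file with a 16 MiB target yields 7 roughly equal chunks.
plan = chunk_plan(100 * 1024 * 1024, 16 * 1024 * 1024)
```

Each range would be uploaded in parallel as a temporary object, the temporaries combined with a compose request, and then deleted.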
Composing Objects: You can create a composite object from existing objects without transferring additional object data. Composite objects are useful for making appends to an existing object, as well as for recreating objects that you uploaded as multiple components in parallel. You can compose between 1 and 32 source objects in a single request.
Retention policies: You can include a retention policy when creating a new bucket, or you can add a retention policy to an existing bucket. Placing a retention policy on a bucket ensures that all current and future objects in the bucket cannot be deleted or replaced until they reach the age you define in the retention policy.
Retention policy locks: When you lock a retention policy on a bucket, you prevent the policy from ever being removed or the retention period from ever being reduced (although you can still increase the retention period). Once a retention policy is locked, you cannot delete the bucket until every object in the bucket has met the retention period. Locking a retention policy is irreversible.
Object Hold: Object holds are metadata flags that you place on individual objects. While an object has a hold placed on it, it cannot be deleted or replaced. You can, however, edit the metadata of an object that has a hold placed on it.
Object Lifecycle Management: You can assign a lifecycle management configuration to a bucket. The configuration contains a set of rules which apply to current and future objects in the bucket. You can use lifecycle rules to set a Time to Live (TTL) for objects, retain noncurrent versions of objects, delete objects, or downgrade their storage classes to help manage costs.
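A lifecycle configuration of this kind can be sketched as the JSON document used with `gsutil lifecycle set`; the specific rules here (Nearline after 30 days, noncurrent versions deleted after a year) are illustrative choices:

```python
# A Cloud Storage lifecycle configuration: move objects to Nearline after
# 30 days, and delete noncurrent versions 365 days after they become
# noncurrent. This is the JSON shape consumed by "gsutil lifecycle set".
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"daysSinceNoncurrentTime": 365},
        },
    ]
}
```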
Cloud Storage Access Control: Access control lets you control who has access to your Cloud Storage buckets and objects and what level of access they have.
|Identity and Access Management||IAM allows you to control who has access to the resources in your Google Cloud project. Resources include Cloud Storage buckets and objects stored within buckets|
|Uniform bucket-level access (Formerly known as bucket policy)||allows you to uniformly control access to your Cloud Storage resources. When you enable uniform bucket-level access on a bucket, Access Control Lists (ACLs) are disabled, and only bucket-level Identity and Access Management (IAM) permissions grant access to that bucket and the objects it contains.|
|Access control lists (ACLs)||a mechanism you can use to define who has access to your buckets and objects, as well as what level of access they have. In Cloud Storage, you apply ACLs to individual buckets and objects. Each ACL consists of one or more entries. An entry gives a specific user (or group) the ability to perform specific actions.|
|Public||make objects you own readable to everyone on the public internet|
Encryption: Cloud Storage always encrypts your data on the server side, before it is written to disk, at no additional charge. Besides this standard, Google-managed behavior, there are additional ways to encrypt your data when using Cloud Storage. Below is a summary of the encryption options available to you –
|server-side encryption (SSE) with Google-managed key||Enabled by default for all data in Cloud Storage; the keys are managed entirely by Google.|
|server-side encryption (SSE) with customer-supplied encryption key||You can create and manage your own encryption keys. These keys act as an additional encryption layer on top of the standard Cloud Storage encryption.|
|server-side encryption (SSE) with customer-managed encryption key||You can manage your encryption keys, which are generated for you by Cloud Key Management Service. These keys act as an additional encryption layer on top of the standard Cloud Storage encryption.|
|client-side encryption (CSE)||encryption that occurs before data is sent to Cloud Storage. Such data arrives at Cloud Storage already encrypted but also undergoes server-side encryption.|
Pub/Sub notifications for Cloud Storage: Pub/Sub notifications sends information about changes to objects in your buckets to Pub/Sub, where the information is added to a Pub/Sub topic of your choice in the form of messages.
Cloud Audit Logs with Cloud Storage: Use Cloud Audit Logs to generate logs for API operations performed in Cloud Storage.
Usage logs & storage logs: Cloud Storage offers usage logs and storage logs in the form of CSV files that you can download and view. Usage logs provide information for all of the requests made on a specified bucket and are created hourly. Storage logs provide information about the storage consumption of that bucket for the last day and are created daily. Once set up, usage logs and storage logs are automatically created as new objects in a bucket that you specify.
Cross-origin resource sharing (CORS): The Cross Origin Resource Sharing (CORS) spec was developed by the World Wide Web Consortium (W3C) to get around the limitations of the same-origin policy. Cloud Storage supports this specification by allowing you to configure your buckets to support CORS.
Signed URLs: use to give time-limited resource access to anyone in possession of the URL, regardless of whether they have a Google account.