What is Azure Cosmos DB?
Hello Everyone
Welcome to CloudAffaire and this is Debjeet.
Today we are going to discuss the Azure Cosmos DB solution. We will learn what is Azure Cosmos DB, Azure Cosmos DB API types, Azure Cosmos DB resource model, Azure Cosmos DB Global Distribution, Azure Cosmos DB Consistency levels, Azure Cosmos DB Partitioning, Azure Cosmos DB Capacity, and Azure Cosmos DB Backups and Restores.
What is Azure Cosmos DB?
Azure Cosmos DB is a fully managed NoSQL database service with single-digit millisecond response times and 99.999-percent availability, backed by SLAs, automatic and instant scalability, and open-source APIs for MongoDB and Cassandra. Azure Cosmos DB provides fast writes and reads anywhere in the world with turnkey data replication and multi-region writes. You can also gain insight over real-time data with no-ETL analytics using Azure Synapse Link for Azure Cosmos DB.
Azure Cosmos DB API Types:
Azure Cosmos DB offers multiple database APIs, which include the Core (SQL) API, API for MongoDB, Cassandra API, Gremlin API, and Table API. By using these APIs, you can model real-world data using documents, key-value, graph, and column-family data models. These APIs allow your applications to treat Azure Cosmos DB as if it were various other databases technologies, without the overhead of management, and scaling approaches.
Core(SQL) API:
This API stores data in document format. It offers the best end-to-end experience as we have full control over the interface, service, and the SDK client libraries. Any new feature that is rolled out to Azure Cosmos DB is first available on SQL API accounts. Azure Cosmos DB SQL API accounts provide support for querying items using the Structured Query Language (SQL) syntax
API for MongoDB:
This API stores data in a document structure, via BSON format. It is compatible with MongoDB wire protocol; however, it does not use any native MongoDB related code. This API is a great choice if you want to use the broader MongoDB ecosystem and skills, without compromising on using Azure Cosmos DB features such as scaling, high availability, geo-replication, multiple write locations, automatic and transparent shard management, transparent replication between operational and analytical stores, and more.
Cassandra API:
This API stores data in column-oriented schema. Apache Cassandra offers a highly distributed, horizontally scaling approach to storing large volumes of data while offering a flexible approach to a column-oriented schema. Cassandra API in Azure Cosmos DB aligns with this philosophy to approaching distributed NoSQL databases. Cassandra API is wire protocol compatible with the Apache Cassandra. You should consider Cassandra API if you want to benefit the elasticity and fully managed nature of Azure Cosmos DB and still use most of the native Apache Cassandra features, tools, and ecosystem. This means on Cassandra API you don’t need to manage the OS, Java VM, garbage collector, read/write performance, nodes, clusters, etc.
Gremlin API:
This API allows users to make graph queries and stores data as edges and vertices. Use this API for scenarios involving dynamic data, data with complex relations, data that is too complex to be modeled with relational databases, and if you want to use the existing Gremlin ecosystem and skills. The Azure Cosmos DB Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure. It provides a unique, flexible solution to most common data problems associated with lack of flexibility and relational approaches. Gremlin API currently only supports OLTP scenarios.
Table API:
This API stores data in key/value format. If you are currently using Azure Table storage, you may see some limitations in latency, scaling, throughput, global distribution, index management, low query performance. Table API overcomes these limitations and it’s recommended to migrate your app if you want to use the benefits of Azure Cosmos DB. Table API only supports OLTP scenarios.
Azure Cosmos DB resource model:
To begin using Azure Cosmos DB, you should initially create an Azure Cosmos account in your Azure resource group in the required subscription, and then databases, containers, items under it.
Azure Cosmos account:
The Azure Cosmos account is the fundamental unit of global distribution and high availability. Your Azure Cosmos account contains a unique DNS name and you can manage an account by using the Azure portal or the Azure CLI, or by using different language-specific SDKs. For globally distributing your data and throughput across multiple Azure regions, you can add and remove Azure regions to your account at any time.
Azure Cosmos databases:
You can create one or multiple Azure Cosmos databases under your account. A database is analogous to a namespace. A database is the unit of management for a set of Azure Cosmos containers.
Azure Cosmos containers:
An Azure Cosmos container is the unit of scalability both for provisioned throughput and storage. A container is horizontally partitioned and then replicated across multiple regions. The items that you add to the container are automatically grouped into logical partitions, which are distributed across physical partitions, based on the partition key. The throughput on a container is evenly distributed across the physical partitions.
Azure Cosmos items:
Depending on which API you use, an Azure Cosmos item can represent either a document in a collection, a row in a table, or a node or edge in a graph.
(Image source: https://docs.microsoft.com/en-us/azure/cosmos-db/media/account-databases-containers-items/cosmos-entities.png)
Azure Cosmos DB Global Distribution:
Azure Cosmos DB is a globally distributed database system that allows you to read and write data from the local replicas of your database. Azure Cosmos DB transparently replicates the data to all the regions associated with your Cosmos account. Azure Cosmos DB is a globally distributed database service that’s designed to provide low latency, elastic scalability of throughput, well-defined semantics for data consistency, and high availability. In short, if your application needs fast response time anywhere in the world, if it’s required to be always online, and needs unlimited and elastic scalability of throughput and storage, you should build your application on Azure Cosmos DB. You can configure your databases to be globally distributed and available in any of the Azure regions.
(Image Source: https://docs.microsoft.com/en-us/azure/cosmos-db/media/distribute-data-globally/deployment-topology.png)
Azure Cosmos DB Consistency levels:
Azure Cosmos DB offers five well-defined levels. From strongest to weakest, the levels are:
Strong consistency:
Strong consistency offers a linearizability guarantee. Linearizability refers to serving requests concurrently. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
Bounded staleness consistency:
In bounded staleness consistency, the reads are guaranteed to honor the consistent-prefix guarantee. The reads might lag behind writes by at most “K” versions (that is, “updates”) of an item or by “T” time interval, whichever is reached first. In other words, when you choose bounded staleness, the “staleness” can be configured in two ways:
- The number of versions (K) of the item
- The time interval (T) reads might lag behind the writes
For a single region account, the minimum value of K and T is 10 write operations or 5 seconds. For multi-region accounts the minimum value of K and T is 100,000 write operations or 300 seconds.
Session consistency:
In session consistency, within a single client session reads are guaranteed to honor the consistent-prefix, monotonic reads, monotonic writes, read-your-writes, and write-follows-reads guarantees. This assumes a single “writer” session or sharing the session token for multiple writers.
Consistent prefix consistency:
In consistent prefix option, updates that are returned contain some prefix of all the updates, with no gaps. Consistent prefix consistency level guarantees that reads never see out-of-order writes. If writes were performed in the order A, B, C, then a client sees either A, A,B, or A,B,C, but never out-of-order permutations like A,C or B,A,C. Consistent Prefix provides write latencies, availability, and read throughput comparable to that of eventual consistency, but also provides the order guarantees that suit the needs of scenarios where order is important.
Eventual consistency:
In eventual consistency, there’s no ordering guarantee for reads. In the absence of any further writes, the replicas eventually converge. Eventual consistency is the weakest form of consistency because a client may read the values that are older than the ones it had read before. Eventual consistency is ideal where the application does not require any ordering guarantees. Examples include count of Retweets, Likes, or non-threaded comments.
Azure Cosmos DB Partitioning:
Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the performance needs of your application. In partitioning, the items in a container are divided into distinct subsets called logical partitions. Logical partitions are formed based on the value of a partition key that is associated with each item in a container. All the items in a logical partition have the same partition key value.
Logical partitions:
A logical partition consists of a set of items that have the same partition key. For example, in a container that contains data about food nutrition, all items contain a foodGroup property. You can use foodGroup as the partition key for the container. Groups of items that have specific values for foodGroup, such as Beef Products, Baked Products, and Sausages and Luncheon Meats, form distinct logical partitions. A logical partition also defines the scope of database transactions. You can update items within a logical partition by using a transaction with snapshot isolation. When new items are added to a container, new logical partitions are transparently created by the system. You don’t have to worry about deleting a logical partition when the underlying data is deleted. There is no limit to the number of logical partitions in your container. Each logical partition can store up to 20GB of data.
Physical partitions:
A container is scaled by distributing data and throughput across physical partitions. Internally, one or more logical partitions are mapped to a single physical partition. Typically smaller containers have many logical partitions but they only require a single physical partition. Unlike logical partitions, physical partitions are an internal implementation of the system and they are entirely managed by Azure Cosmos DB. There is no limit to the total number of physical partitions in your container. As your provisioned throughput or data size grows, Azure Cosmos DB will automatically create new physical partitions by splitting existing ones.
Replica sets:
Each physical partition consists of a set of replicas, also referred to as a replica set. Each replica hosts an instance of the database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent. Each replica that makes up the physical partition inherits the partition’s storage quota. All replicas of a physical partition collectively support the throughput that’s allocated to the physical partition. Azure Cosmos DB automatically manages replica sets.
(Image Source: https://docs.microsoft.com/en-us/azure/cosmos-db/media/partitioning-overview/logical-partitions.png)
Azure Cosmos DB Capacity:
Azure Cosmos DB supports many APIs, such as SQL, MongoDB, Cassandra, Gremlin, and Table. Each API has its own set of database operations. These operations range from simple point reads and writes to complex queries. Each database operation consumes system resources based on the complexity of the operation.
The cost of all database operations is normalized by Azure Cosmos DB and is expressed by Request Units (or RUs, for short). Request unit is a performance currency abstracting the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB.
The cost to do a point read (fetching a single item by its ID and partition key value) for a 1-KB item is 1 Request Unit (or 1 RU). All other database operations are similarly assigned a cost using RUs. No matter which API you use to interact with your Azure Cosmos container, costs are always measured by RUs. Whether the database operation is a write, point read, or query, costs are always measured in RUs. The type of Azure Cosmos account you’re using determines the way consumed RUs get charged.
(Image Source: https://docs.microsoft.com/en-us/azure/cosmos-db/media/request-units/request-units.png)
There are three modes in which you can create an account.
Provisioned throughput mode:
In this mode, you provision the number of RUs for your application on a per-second basis in increments of 100 RUs per second. To scale the provisioned throughput for your application, you can increase or decrease the number of RUs at any time in increments or decrements of 100 RUs. You can make your changes either programmatically or by using the Azure portal. You are billed on an hourly basis for the number of RUs per second you have provisioned.
Serverless mode:
In this mode, you don’t have to provision any throughput when creating resources in your Azure Cosmos account. At the end of your billing period, you get billed for the number of Request Units that has been consumed by your database operations.
Autoscale mode:
In this mode, you can automatically and instantly scale the throughput (RU/s) of your database or container based on its usage, without impacting the availability, latency, throughput, or performance of the workload. This mode is well suited for mission-critical workloads that have variable or unpredictable traffic patterns, and require SLAs on high performance and scale.
Azure Cosmos DB Backups:
Azure Cosmos DB automatically takes backups of your data at regular intervals. The automatic backups are taken without affecting the performance or availability of the database operations. All the backups are stored separately in a storage service. The automatic backups are helpful in scenarios when you accidentally delete or update your Azure Cosmos account, database, or container and later require the data recovery. Azure Cosmos DB backups are encrypted with Microsoft managed service keys. These backups are transferred over a secure non-public network. Which means, backup data remains encrypted while transferred over the wire and at rest. Backups of an account in a given region are uploaded to storage accounts in the same region.
Continuous backup mode:
This mode allows you to do restore to any point of time within the last 30 days. You can choose this mode while creating the Azure Cosmos DB account.
Periodic backup mode:
This mode is the default backup mode for all existing accounts. In this mode, backup is taken at a periodic interval and the data is restored by creating a request with the support team. In this mode, you configure a backup interval and retention for your account. The maximum retention period extends to a month. The minimum backup interval can be one hour.
Hope you have enjoyed this article. To get more details in Azure Cosmos DB, please refer to the below documentation.
https://docs.microsoft.com/en-us/azure/cosmos-db/