Getting started with Azure Cosmos DB
Table of contents:
- What is Azure Cosmos DB?
- Key benefits of Azure Cosmos DB
- Key concepts and features of Azure Cosmos DB
- What is Azure Cosmos DB API
- Azure Cosmos DB pricing
- Azure Cosmos DB Limitations and Quotas
- Azure Cosmos DB security options
- Azure Cosmos DB monitoring and logging options
- How to create Azure Cosmos DB: Using Azure Portal
- How to create Azure Cosmos DB: Using Azure CLI
- Summary
What is Azure Cosmos DB?
Azure Cosmos DB is a cloud-based database service that offers high performance, scalability, and flexibility for modern applications. It is a multi-model database that supports various data models, such as key-value, document, graph, and columnar. It also provides multiple APIs to access the data, such as SQL, MongoDB, Cassandra, Gremlin, and Table. Azure Cosmos DB is globally distributed, meaning that it can replicate the data across multiple regions and ensure low latency and high availability. It also allows users to choose from five consistency levels, ranging from strong to eventual, depending on their data freshness and availability requirements. Azure Cosmos DB is designed for mission-critical applications that need to handle large volumes of data and complex queries. It also offers features such as automatic indexing, partitioning, backup, encryption, and serverless computing. Azure Cosmos DB is a powerful and versatile database service that can meet the needs of any application. 🚀
Key benefits of Azure Cosmos DB:
- Guaranteed speed at any scale with instant and limitless elasticity, fast reads, and multi-region writes anywhere in the world.
- Fast, flexible app development with free dev/test options, multiple SDKs, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra.
- Mission-critical ready with 99.999 percent availability, continuous backup, and enterprise-grade security.
- Pay for only what you use with a cost-effective, responsive, and fully managed serverless database that scales elastically with your app.
Key concepts and features of Azure Cosmos DB:
Some of the key concepts and features of Azure Cosmos DB are:
- Account: The top-level entity that represents a set of configurations and databases. An account can have multiple databases and can be replicated across multiple regions. An account can also have different consistency levels, backup policies, firewall rules, and private endpoints.
- Database: A logical container that groups related containers and provides access control and throughput allocation. A database can have one or more containers that share the same provisioned throughput or have their own dedicated throughput.
- Container: A schema-agnostic container of items, stored procedures, user-defined functions (UDFs), and triggers. The API (SQL, MongoDB, Cassandra, Gremlin, or Table) is chosen for the whole account, so every container in an account is accessed through that API. A container also has a partition key that defines how the data is distributed across physical partitions.
- Item: An entity that stores data in a container. An item can have any structure and any number of properties. An item can be accessed by its unique ID and partition key value.
- Partition key: A property or a combination of properties that determines how the data is partitioned and distributed across physical partitions. A partition key enables horizontal scaling and ensures efficient data access.
- Physical partition: A fixed amount of storage and compute resources that hosts one or more logical partitions. A physical partition can have up to 50 GB of storage and 10,000 request units per second (RU/s) of throughput.
- Logical partition: A subset of items that share the same partition key value. A logical partition can have up to 20 GB of storage, and its throughput is capped by the limits of the physical partition that hosts it.
- Request unit (RU): A measure of the resources consumed by a database operation, such as read, write, query, or execute stored procedure. The RU charge depends on the size and complexity of the operation and the consistency level of the account.
- Throughput: The amount of resources available for a database or a container to perform operations. Throughput is expressed in RU/s and can be provisioned or serverless. Provisioned throughput guarantees a certain amount of RU/s for a database or a container at all times. Serverless throughput charges only for the RUs consumed by the operations.
- Consistency level: The degree of data freshness and availability across regions for a database account. Azure Cosmos DB offers five consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.
These are some of the main concepts and features of Azure Cosmos DB.
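The relationship between partition keys, logical partitions, and physical partitions can be sketched in a few lines of Python. This is an illustrative model only: the helper names, the SHA-256 hash, and the fixed count of four physical partitions are assumptions made for the example, not how the service actually places data.

```python
import hashlib
from collections import defaultdict

PHYSICAL_PARTITIONS = 4  # assumed count, chosen only for this illustration


def physical_partition_for(partition_key_value):
    """Map a partition key value to a physical partition index (toy hash)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def group_by_logical_partition(items, partition_key):
    """Group items by partition key value; each group is one logical partition."""
    logical = defaultdict(list)
    for item in items:
        logical[item[partition_key]].append(item)
    return dict(logical)


items = [
    {"id": "1", "city": "Seattle"},
    {"id": "2", "city": "Seattle"},
    {"id": "3", "city": "London"},
]

# Items with the same "city" value land in the same logical partition,
# and every logical partition is hosted by exactly one physical partition.
logical_partitions = group_by_logical_partition(items, "city")
placement = {city: physical_partition_for(city) for city in logical_partitions}
print(placement)
```

The point of the sketch is that a partition key with many distinct values spreads logical partitions across physical partitions, while a key with few values concentrates storage and throughput in a few places.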
What is Azure Cosmos DB API?
An Azure Cosmos DB API is a way of accessing and manipulating data stored in Azure Cosmos DB. Azure Cosmos DB supports various data models, such as key-value, document, graph, and columnar, and provides multiple APIs to access the data, such as SQL, MongoDB, Cassandra, Gremlin, and Table. These APIs allow your applications to treat Azure Cosmos DB as if it were one of several other database technologies, without the overhead of managing and scaling those databases yourself.
The types of APIs available in Azure Cosmos DB are:
- Core (SQL) API: Provides the flexibility of a NoSQL document store combined with the power of SQL for querying. It supports rich and complex queries, including joins, aggregates, subqueries, and user-defined functions. It also supports transactions, stored procedures, triggers, and change feed.
- MongoDB API: Supports the MongoDB wire protocol so that existing MongoDB clients continue to work with Azure Cosmos DB as if they are running against an actual MongoDB database. It supports most of the features of MongoDB 3.6 and 4.0 versions, such as CRUD operations, indexes, aggregation pipeline, transactions, change stream, and MongoDB Compass.
- Cassandra API: Supports the Cassandra Query Language (CQL) so that existing Cassandra clients continue to work with Azure Cosmos DB as if they are running against an actual Cassandra database. It supports most of the features of Cassandra 3.x versions, such as CRUD operations, indexes, collections, user-defined types (UDTs), lightweight transactions (LWTs), batches, and cqlsh.
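- Gremlin API: Supports the Apache TinkerPop graph traversal language so that existing Gremlin clients continue to work with Azure Cosmos DB as if they are running against an actual graph database. It supports most of the features of Gremlin 3.4, such as CRUD operations.
- Table API: Supports applications written for Azure Table storage, so that existing Table storage clients can work with Azure Cosmos DB as a key-value store.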
Azure Cosmos DB pricing:
Azure Cosmos DB pricing is based on the following factors:
- Database operations: The cost of all database operations is normalized and expressed as either request units (RU) or vCore (compute and memory). Azure Cosmos DB offers three database operations models: Provisioned Throughput, Serverless, and vCore. Provisioned Throughput offers pre-selected database operations capacity, measured in request units per second (RU/s) and billed per hour across all selected Azure regions enabled on the account. Serverless offers on-demand database operations, and bills for the request units (RU) used for each database operation. vCore offers dedicated compute and memory resources for PostgreSQL and MongoDB workloads, measured in vCores and billed per hour per node.
- Consumed storage: The cost of consumed storage is based on the amount of data stored in Azure Cosmos DB containers, measured in gigabytes (GB) and billed per hour across all selected Azure regions enabled on the account. Consumed storage includes the size of your items, their indexing overhead, and any snapshots or backups.
- Optional dedicated gateways: The cost of optional dedicated gateways is based on the number of gateways provisioned, their size, and their availability zone configuration. Dedicated gateways provide enhanced performance, security, and isolation for your Azure Cosmos DB account. They are billed per hour per gateway.
You can estimate your Azure Cosmos DB pricing by using the Azure pricing calculator or the Azure Cosmos DB capacity planner.
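As a rough illustration of how the two RU-based billing models differ, the sketch below compares a provisioned and a serverless workload. All prices are placeholder values, not actual Azure rates, and the function names and the 100 RU/s billing unit are assumptions of this example; use the pricing calculator for real figures.

```python
HOURS_PER_MONTH = 730  # common approximation for a billing month


def provisioned_monthly_cost(ru_per_second, regions, price_per_100_ru_hour):
    """Provisioned throughput: billed per hour for the capacity in every region."""
    units = ru_per_second / 100
    return units * price_per_100_ru_hour * HOURS_PER_MONTH * regions


def serverless_monthly_cost(request_units_consumed, price_per_million_ru):
    """Serverless: billed only for the request units actually consumed."""
    return request_units_consumed / 1_000_000 * price_per_million_ru


def storage_monthly_cost(stored_gb, regions, price_per_gb_month):
    """Consumed storage: billed per GB in every region holding a replica."""
    return stored_gb * price_per_gb_month * regions


# Placeholder prices, for illustration only.
provisioned = provisioned_monthly_cost(400, regions=1, price_per_100_ru_hour=0.01)
serverless = serverless_monthly_cost(50_000_000, price_per_million_ru=0.25)
storage = storage_monthly_cost(10, regions=1, price_per_gb_month=0.25)

print(f"provisioned: {provisioned:.2f}")
print(f"serverless:  {serverless:.2f}")
print(f"storage:     {storage:.2f}")
```

Under these placeholder prices, a lightly used workload is cheaper on serverless, while a workload that keeps its provisioned capacity busy around the clock favors provisioned throughput.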
Azure Cosmos DB Limitations and Quotas:
Azure Cosmos DB has some limitations and quotas that apply to different resources and operations. Some of these limitations and quotas are adjustable, while others are fixed. Here are some of the common limitations and quotas for Azure Cosmos DB:
- Storage and database operations: The maximum storage per container is unlimited, but the maximum storage per logical partition is 20 GB. The maximum throughput per container or database is 1,000,000 RU/s by default (a higher limit can be requested through Azure support), and the maximum throughput per physical or logical partition is 10,000 RU/s. The minimum throughput per container or database is 400 RU/s, and the minimum throughput per 1 GB of storage is 1 RU/s.
- APIs: Azure Cosmos DB supports various APIs, such as SQL, MongoDB, Cassandra, Gremlin, and Table. Each API has its own features and limitations, such as supported versions, data types, operators, transactions, and compatibility. For example, the MongoDB API supports most of the features of MongoDB 3.6 and 4.0, and support for 4.2 and later versions has been added since. The Cassandra API supports most of the features of Cassandra 3.x versions, but not 4.x or later versions. Check the current documentation for each API before relying on a specific version or feature.
- Management groups: These limits apply to Azure management groups in general rather than to Azure Cosmos DB specifically. The maximum number of management groups per Azure AD tenant is 10,000. The maximum number of subscriptions per management group is unlimited. The maximum number of levels of management group hierarchy is root level plus 6 levels.
These are some of the main limitations and quotas for Azure Cosmos DB.
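The storage and throughput limits listed above can be turned into a quick sanity check for a capacity plan. The sketch below uses only the figures quoted in this section; the function names and the shape of the plan are assumptions of this example.

```python
import math

MAX_LOGICAL_PARTITION_GB = 20   # maximum storage per logical partition
MIN_THROUGHPUT_RU = 400         # minimum RU/s per container or database
MAX_THROUGHPUT_RU = 1_000_000   # maximum RU/s per container or database
MIN_RU_PER_GB = 1               # minimum RU/s per GB of stored data


def minimum_throughput(storage_gb):
    """Smallest RU/s value allowed for the given amount of stored data."""
    return max(MIN_THROUGHPUT_RU, math.ceil(storage_gb * MIN_RU_PER_GB))


def check_plan(storage_gb, throughput_ru, largest_logical_partition_gb):
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    floor = minimum_throughput(storage_gb)
    if throughput_ru < floor:
        problems.append(f"throughput {throughput_ru} RU/s is below the minimum of {floor} RU/s")
    if throughput_ru > MAX_THROUGHPUT_RU:
        problems.append(f"throughput {throughput_ru} RU/s exceeds {MAX_THROUGHPUT_RU} RU/s")
    if largest_logical_partition_gb > MAX_LOGICAL_PARTITION_GB:
        problems.append(
            f"a logical partition of {largest_logical_partition_gb} GB exceeds "
            f"{MAX_LOGICAL_PARTITION_GB} GB; consider a more selective partition key"
        )
    return problems


print(check_plan(storage_gb=1000, throughput_ru=400, largest_logical_partition_gb=25))
```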
Azure Cosmos DB security options:
Azure Cosmos DB provides various features and options to secure your data and control access to your database account. Here are some of the main aspects of Azure Cosmos DB security:
- Encryption at rest: Azure Cosmos DB automatically encrypts your data at rest with service-managed keys. You can also opt for customer-managed keys (CMK) to add a second layer of encryption with keys that you control and manage. Encryption at rest protects your data from unauthorized access or theft in case of a physical breach or compromise of the storage layer.
- IP firewall: Azure Cosmos DB supports IP firewall protection to restrict access to your database account based on the source IP address or range. You can configure firewall rules to allow only specific IP addresses or virtual networks to access your database account. This helps prevent unauthorized or malicious requests from reaching your database endpoint.
- Azure Private Link: Azure Private Link allows you to securely connect your Azure Cosmos DB account to your virtual network using a private endpoint. This enables you to access your database account over a private connection without exposing it to the public internet. Azure Private Link also helps you meet compliance and regulatory requirements by ensuring that your data stays within your network boundary.
- Authentication and authorization: Azure Cosmos DB issues two kinds of account keys to authenticate requests to your database account: read-write keys and read-only keys. Each kind comes as a primary and a secondary key that grant identical access, so that you can rotate one key while clients keep using the other. Read-write keys provide full administrative access to your database account, while read-only keys permit only read operations. You can also use resource tokens to grant granular and time-bound permissions to specific resources within your database account, such as containers, items, or stored procedures. Resource tokens are ideal for scenarios where you need to delegate access to clients or users without exposing your account keys.
- Data governance: Azure Cosmos DB allows you to geo-fence your data by choosing the regions where you want to store and replicate your data. You can also configure geo-redundancy and failover policies to ensure high availability and disaster recovery for your data across regions. Azure Cosmos DB also complies with various industry standards and certifications, such as ISO, SOC, PCI DSS, HIPAA, and GDPR, to help you meet your data security and privacy requirements.
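The IP firewall rule described above amounts to testing whether a client address falls inside an allowed range. A minimal sketch of that check, using Python's standard `ipaddress` module and made-up ranges from the reserved documentation blocks (the function name and the ranges are assumptions of this example, not the service's implementation):

```python
import ipaddress

# Example allow-list; these ranges are reserved for documentation use.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(client_ip, allowed_ranges=ALLOWED_RANGES):
    """Return True if client_ip falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


for ip in ["203.0.113.42", "198.51.100.17", "198.51.100.18"]:
    print(ip, "allowed" if is_allowed(ip) else "blocked")
```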
Azure Cosmos DB monitoring and logging options:
Azure Cosmos DB also provides various features and options to monitor and log the activity of your database account. Here are some of the main aspects of Azure Cosmos DB monitoring and logging:
- Azure Monitor: Azure Monitor is a service that collects, analyzes, and acts on telemetry data from your Azure resources. You can use Azure Monitor to track the health, performance, and usage of your Azure Cosmos DB account. You can also create alerts, dashboards, and reports based on the metrics and logs collected by Azure Monitor.
- Metrics: Metrics are numerical values that represent the state or activity of your Azure Cosmos DB account over time. You can use metrics to monitor various aspects of your database account, such as throughput, storage, availability, latency, consistency, and system level metrics. You can view metrics in the Azure portal, Azure Monitor, or programmatically using SDKs or REST API. You can also configure metric alerts to notify you when certain conditions are met or thresholds are crossed.
- Logs: Logs are records of events and traces that occur in your Azure Cosmos DB account. Logs provide rich and detailed information about the operation of your database account, such as data plane requests, query runtime statistics, control plane operations, and configuration changes. You can use logs to troubleshoot issues, optimize performance, audit activities, and analyze trends. You can view logs in the Azure portal, Azure Monitor, or programmatically using SDKs or REST API. You can also configure log alerts to notify you when certain patterns or anomalies are detected in your log data.
- Diagnostic settings: Diagnostic settings are used to collect resource logs from your Azure Cosmos DB account and send them to one or more destinations for analysis and storage. You can send your logs to Log Analytics workspaces, Event Hubs, Storage Accounts, or a partner solution. You can also choose which categories of logs you want to collect and enable full-text query logging for query text analysis.
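To show the kind of analysis that data plane logs make possible, the sketch below summarizes a handful of request records and flags a high rate of throttled requests (HTTP status 429). The records are invented, and the field names `statusCode` and `requestCharge`, the function name, and the 5 percent threshold are assumptions of this example rather than a documented log schema.

```python
def summarize(records, throttle_threshold=0.05):
    """Summarize request records and flag a throttling rate above the threshold."""
    total = len(records)
    throttled = sum(1 for record in records if record["statusCode"] == 429)
    total_ru = sum(record["requestCharge"] for record in records)
    rate = throttled / total if total else 0.0
    return {
        "requests": total,
        "throttled": throttled,
        "total_ru": total_ru,
        "alert": rate > throttle_threshold,
    }


# Invented sample records, standing in for exported log entries.
sample = [
    {"statusCode": 200, "requestCharge": 1.0},
    {"statusCode": 201, "requestCharge": 5.5},
    {"statusCode": 200, "requestCharge": 2.5},
    {"statusCode": 429, "requestCharge": 0.0},
]
summary = summarize(sample)
print(summary)
```

A sustained share of 429 responses usually means the workload needs more throughput or a better-distributed partition key.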
How to create Azure Cosmos DB?
Using Azure Portal:
Prerequisites
Before you start, you need to have an Azure subscription or a free Azure Cosmos DB trial account. If you don’t have an Azure subscription, you can create an Azure free account before you begin. You can also try Azure Cosmos DB for free, without an Azure subscription, and with no commitment required. Alternatively, you can create an Azure Cosmos DB free tier account, with the first 1000 RU/s and 25 GB of storage for free.
Create an Azure Cosmos DB account
The first step is to create an Azure Cosmos DB account. An account is the top-level entity that represents a set of configurations and databases. An account can have multiple databases and can be replicated across multiple regions. An account can also have different consistency levels, backup policies, firewall rules, and private endpoints.
To create an Azure Cosmos DB account, follow these steps:
- Log in to the Azure portal.
- Click on the Show Portal Menu option from the top left and select + Create a Resource option.
- Search for Azure Cosmos DB and then click on the search result Azure Cosmos DB.
- On the Create an Azure Cosmos DB account page, select the Create option within the Azure Cosmos DB for NoSQL section.
- In the Create Azure Cosmos DB Account page, enter the basic settings for the new Azure Cosmos DB account.
  - Subscription: Select the Azure subscription that you want to use for this Azure Cosmos DB account.
  - Resource Group: Select a resource group or select Create new and enter a unique name for the new resource group.
  - Account Name: Enter a name to identify your Azure Cosmos DB account. The name can contain only lowercase letters, numbers, and the hyphen (-) character. It must be 3-44 characters long.
  - Location: Select a geographic location to host your Azure Cosmos DB account. Use the location that is closest to your users to give them the fastest access to the data.
  - Capacity mode: Select Provisioned throughput or Serverless depending on your database operations needs. Provisioned throughput offers pre-selected database operations capacity measured in request units per second (RU/s) and billed per hour across all selected regions enabled on the account. Serverless offers on-demand database operations and bills for the request units (RU) used for each database operation.
- Click Next: Networking > and configure the network settings for your account.
  - Connectivity method: Select Public endpoint (all networks) or Public endpoint (selected networks) depending on your network security requirements. Public endpoint (all networks) allows access from any network, including internet clients. Public endpoint (selected networks) restricts access to specific virtual networks or IP ranges using firewall rules.
  - Firewall rules: If you selected Public endpoint (selected networks), you need to add the virtual networks or IP ranges that are allowed to access your account. You can also enable access from the Azure portal by checking the Allow access from Azure portal option.
  - Private endpoint: If you want to connect securely and privately from your virtual network to your account without exposing it over the internet, you can enable private endpoint connections by clicking Add new under the Private endpoint connections section.
- Click Next: Tags > and optionally add tags to your account. Tags are name/value pairs that enable you to categorize resources and view consolidated billing by applying the same tag to multiple resources and resource groups.
- Click Next: Review + create > and review the summary of your account settings. Click Create to create your account.
It may take a few minutes for your account to be created. You can monitor the progress on the Notifications page.
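The account name rule from the steps above (lowercase letters, numbers, and hyphens, 3 to 44 characters) is easy to check before submitting the form. A minimal sketch follows; the function name is an assumption of this example, and the check covers only the format, not whether the name is already taken.

```python
import re

# Lowercase letters, digits, and hyphens only; 3 to 44 characters in total.
ACCOUNT_NAME_PATTERN = re.compile(r"[a-z0-9-]{3,44}")


def is_valid_account_name(name):
    """Return True if the name satisfies the format rule stated above."""
    return ACCOUNT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-cosmos-db-account", "My-Account", "ab"]:
    print(candidate, is_valid_account_name(candidate))
```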
Create a database and a container
After creating your account, you need to create a database and a container within that database. A database is a logical container that groups related containers and provides access control and throughput allocation. A container is a schema-agnostic container of items, stored procedures, user-defined functions (UDFs), and triggers. Every container is accessed through the API that was selected when the account was created, and has a partition key that defines how the data is distributed across physical partitions.
To create a database and a container, follow these steps:
- In the Azure portal, go to your Azure Cosmos DB account and click Data Explorer on the left menu.
- Click New Container on the top menu.
- In the Add Container page, enter the settings for your new container.
  - Database id: Enter a name for your new database or select an existing database from the drop-down list.
  - Throughput: Enter the amount of throughput you want to allocate for your database or container in RU/s. You can also enable autoscale to automatically adjust the throughput based on the usage patterns.
  - Container id: Enter a name for your new container.
  - Partition key: Enter a property name that will be used as the partition key for your container. The partition key determines how the data is distributed across physical partitions and enables horizontal scaling and efficient data access.
- Click OK to create your database and container.
You can now see your new database and container in the Data Explorer. You can also view and modify the properties of your database and container by clicking on the Settings tab.
Add data to your container
Now that you have created your database and container, you can add data to your container. You can use the Data Explorer to create, read, update, and delete items in your container. You can also use the Query Explorer to run SQL queries against your container.
To add data to your container, follow these steps:
- In the Data Explorer, expand your database and select your container. Click Items on the left menu.
- Click New Item on the top menu.
- In the Add Item page, enter an item in JSON format. Make sure to include the partition key property that you specified when creating your container. For example, if you used /id as the partition key, you can enter an item like this:
  { "id": "1", "name": "Alice", "age": 25, "city": "Seattle" }
- Click Save to add your item to your container.
You can now see your item in the Items list. You can also edit or delete your item by clicking on the ellipsis (…) next to it.
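Before pasting an item into Data Explorer, it can help to confirm that the JSON parses and carries the partition key property. The sketch below does that for a top-level partition key path such as /id or /city; the function name is an assumption of this example, and requiring an explicit string id is a simplification made for the sketch.

```python
import json


def check_item(raw_json, partition_key_path):
    """Return a list of problems found in the item; an empty list means it looks fine."""
    item = json.loads(raw_json)  # raises ValueError if the text is not valid JSON
    key_property = partition_key_path.lstrip("/")
    problems = []
    if not isinstance(item.get("id"), str):
        problems.append("id must be present and a string")
    if key_property not in item:
        problems.append(f"missing partition key property '{key_property}'")
    return problems


item_text = '{ "id": "1", "name": "Alice", "age": 25, "city": "Seattle" }'
print(check_item(item_text, "/id"))
print(check_item('{ "id": "1", "name": "Alice" }', "/city"))
```

Note that JSON requires straight double quotes; curly quotes copied from a web page will make the item fail to parse.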
Using Azure CLI:
Prerequisites
As with the portal walkthrough, you need an Azure subscription, a free Azure Cosmos DB trial account, or an Azure Cosmos DB free tier account (the first 1000 RU/s and 25 GB of storage are free).
You also need to have the Azure CLI installed on your machine. The Azure CLI is a set of commands used to create and manage Azure resources. It’s available across many Azure services, including Azure Cosmos DB, and lets you manage Cosmos DB resources from the command line. Installation instructions are available in the official Azure CLI documentation.
Install the Azure CLI on your machine.
Sign in to Azure with the az login command. You will be prompted to open a browser and enter a code to authenticate.
Create a resource group with the az group create command. A resource group is a logical container for grouping your Azure services. For example:
    az group create --name myResourceGroup --location eastus
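Create an Azure Cosmos DB account with the az cosmosdb create command. You need to specify the resource group name and an account name, which must be globally unique. The API is selected with the --kind option, which defaults to GlobalDocumentDB (the API for NoSQL). The later steps use the az cosmosdb sql commands, so the default is the right choice here. For example:
    az cosmosdb create --resource-group myResourceGroup --name my-cosmos-db-account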
Get the connection string for your Azure Cosmos DB account with the az cosmosdb keys list command. You need to specify the resource group name and the account name. For example:
    az cosmosdb keys list --resource-group myResourceGroup --name my-cosmos-db-account --type connection-strings
Create a database. A database is a logical unit that contains one or more containers. You can create a database using the az cosmosdb sql database create command. For example, the following command creates a database named my-database in the account my-cosmos-db-account:
    az cosmosdb sql database create --name my-database --account-name my-cosmos-db-account --resource-group myResourceGroup
Create a container. A container is a schema-agnostic collection of items that are distributed according to the same partition key path. You can create a container using the az cosmosdb sql container create command. For example, the following command creates a container named my-container in the database my-database with a partition key of /id:
    az cosmosdb sql container create --name my-container --database-name my-database --account-name my-cosmos-db-account --resource-group myResourceGroup --partition-key-path /id
Create some items. An item is a JSON document that represents an entity in your application. The Azure CLI manages accounts, databases, and containers, but it has no command for creating items, so add items with Data Explorer in the Azure portal or with one of the Azure Cosmos DB SDKs. For example, an item with an id of 1 for the container my-container could look like this:
    {"id": "1", "name": "Alice", "age": 25, "city": "New York"}
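Items can also be written from application code. The following is a minimal sketch that assumes the azure-cosmos Python package and the connection string returned by the keys command above. The helper names and the in-memory stand-in are assumptions of this example; the stand-in exists only so the flow can be tried without an account.

```python
def connect_container(connection_string, database_name, container_name):
    """Return a container client. Requires the third-party azure-cosmos package."""
    from azure.cosmos import CosmosClient  # imported lazily so the sketch loads without it

    client = CosmosClient.from_connection_string(connection_string)
    database = client.get_database_client(database_name)
    return database.get_container_client(container_name)


def save_item(container, item):
    """Insert the item, or replace it if an item with the same id already exists."""
    return container.upsert_item(item)


class InMemoryContainer:
    """Offline stand-in exposing the same upsert_item method as a container client."""

    def __init__(self):
        self.items = {}

    def upsert_item(self, body):
        self.items[body["id"]] = body
        return body


item = {"id": "1", "name": "Alice", "age": 25, "city": "New York"}
container = InMemoryContainer()
saved = save_item(container, item)
print(saved["id"], saved["city"])
```

Against a real account, replace the stand-in with `connect_container("<your connection string>", "my-database", "my-container")` and keep the rest unchanged.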
Summary:
Azure Cosmos DB is a fully managed, globally distributed, multi-model database service that offers high availability, scalability, and performance for modern app development. It supports various database APIs, such as NoSQL, MongoDB, PostgreSQL, Apache Cassandra, and Apache Gremlin. It also provides features such as encryption at rest, IP firewall, Azure Private Link, authentication and authorization, data governance, Azure Monitor, metrics, logs, diagnostic settings, and Azure Synapse Link.
Azure Cosmos DB is a cloud-based database service that enables you to build fast and scalable applications with low latency and high availability. You can choose from multiple database models and APIs to suit your needs, such as NoSQL for schemaless data, MongoDB for document data, PostgreSQL for relational data, Apache Cassandra for wide-column data, and Apache Gremlin for graph data. You can also geo-fence your data by selecting the regions where you want to store and replicate your data. Azure Cosmos DB guarantees speed at any scale with SLA-backed single-digit millisecond reads and writes and 99.999 percent availability for NoSQL data. You can also scale your storage and throughput elastically across any Azure region with automatic or serverless options. Azure Cosmos DB also provides various security and monitoring features to protect and analyze your data. You can encrypt your data at rest with service-managed or customer-managed keys, restrict access to your database account based on IP address or virtual network, connect your database account to your virtual network using a private endpoint, authenticate requests to your database account using primary or secondary keys or resource tokens, and comply with various industry standards and certifications. You can also monitor the health, performance, and usage of your database account using Azure Monitor, metrics, logs, diagnostic settings, and alerts. You can also run near real-time analytics and AI on your operational data using Azure Synapse Link without moving data or affecting the performance of your database account.