How to Tune Up Azure Cosmos DB for Performance and Cost Efficiency


Azure Cosmos DB provides a powerful, fully managed NoSQL database solution, ideal for applications requiring high availability, dynamic scalability, and low-latency data access. Despite these advantages, ineffective configuration can introduce performance bottlenecks, wasted resources, and unexpected costs. Optimising Cosmos DB demands a strategic approach spanning partitioning, RU management, indexing, query performance, and continuous monitoring.

Choosing an Effective Partitioning Strategy

Proper partitioning is essential to ensuring Cosmos DB operates efficiently. Cosmos DB horizontally scales by distributing data across logical partitions determined by a partition key. Selecting an unsuitable partition key may cause uneven data distribution, leading to hotspots, throttling, and slow performance. Ideally, your partition key should evenly distribute both data and query loads.

For instance, consider choosing a partition key like this:

{
    "customerId": "12345",
    "orderId": "67890",
    "partitionKey": "12345-67890"
}

If your data doesn't naturally provide an evenly distributed key, a synthetic key (a combination such as "customerId-orderId") can ensure a uniform spread.
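As a sketch, the synthetic key can be composed in application code before the document is written; the Order type and its property names here are illustrative, not part of any SDK:

```csharp
// Compose the synthetic partition key from two naturally occurring fields
// so that writes spread evenly across logical partitions.
var order = new Order
{
    id = Guid.NewGuid().ToString(),
    customerId = "12345",
    orderId = "67890"
};
order.partitionKey = $"{order.customerId}-{order.orderId}";

// Illustrative document type; property names are assumptions.
public class Order
{
    public string id { get; set; }
    public string customerId { get; set; }
    public string orderId { get; set; }
    public string partitionKey { get; set; }
}
```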

Regularly check your partition usage in the Azure Portal to identify hotspots early and rebalance accordingly.

Optimising and Managing Request Units (RUs)

Request Units (RUs) dictate both the cost and the throughput of Cosmos DB operations. Efficient RU management combines automatic scaling with code-level optimisations.

Enabling autoscale lets Cosmos DB adjust RU provisioning dynamically based on actual usage:

az cosmosdb sql container throughput migrate --account-name MyCosmosAccount \
  --database-name MyDatabase \
  --name MyContainer \
  --resource-group MyResourceGroup \
  --throughput-type autoscale

In code, leverage bulk operations via the .NET SDK to lower RU consumption significantly during batch inserts:

CosmosClient client = new CosmosClient(connectionString, new CosmosClientOptions { AllowBulkExecution = true });

Container container = client.GetContainer("MyDatabase", "MyContainer");

List<Task> tasks = new();

foreach (var item in itemsToInsert)
{
    tasks.Add(container.CreateItemAsync(item, new PartitionKey(item.partitionKey)));
}

await Task.WhenAll(tasks);

For high-RU query scenarios, consider isolating those queries in dedicated containers to control cost and throughput more effectively.
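To find out which operations deserve that isolation, inspect the charge the SDK reports on every response; this sketch reuses the container, Order type, and key values from the earlier examples:

```csharp
// Read a single item and log its RU cost; RequestCharge is populated
// on every response in the .NET SDK.
ItemResponse<Order> response = await container.ReadItemAsync<Order>(
    id: "12345",
    partitionKey: new PartitionKey("12345-67890"));

Console.WriteLine($"Read consumed {response.RequestCharge} RUs");
```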

Refining Indexing Policies for Performance and Cost Savings

By default, Cosmos DB indexes all document properties, which simplifies initial setup but may be inefficient at scale. Custom indexing policies are vital to reducing overhead.

Here's a sample indexing policy that explicitly indexes only required fields and excludes unnecessary paths:

{
    "indexingMode": "consistent",
    "includedPaths": [
        {
            "path": "/customerId/?"
        },
        {
            "path": "/orderDate/?"
        }
    ],
    "excludedPaths": [
        {
            "path": "/*"
        }
    ]
}
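If queries filter on one field and sort on another, a composite index avoids a separate sort step at query time. This hypothetical policy pairs customerId with orderDate for exactly that pattern:

```json
{
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            { "path": "/customerId", "order": "ascending" },
            { "path": "/orderDate", "order": "descending" }
        ]
    ]
}
```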

For write-intensive scenarios, lazy indexing can minimise immediate write overhead (note that current Cosmos DB guidance discourages lazy mode, so verify it is available and appropriate for your account before relying on it):

{
    "indexingMode": "lazy"
}

Always validate your index utilisation via Cosmos DB metrics in Azure Portal, adjusting policies based on real world usage patterns.

Improving Query Performance Through Optimisation

Efficient queries dramatically lower RU usage and enhance response speed. Always request only the necessary fields explicitly:

Inefficient approach:

SELECT * FROM c WHERE c.customerId = '12345'

Optimised approach:

SELECT VALUE c.orderId FROM c WHERE c.customerId = '12345'

Avoid cross-partition queries by explicitly including the partition key, which significantly reduces RU consumption:

SELECT * FROM c WHERE c.customerId = '12345' AND c.partitionKey = '12345-67890'
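In the .NET SDK, the same effect is achieved by parameterising the query and pinning it to a single partition via QueryRequestOptions; the container and key values are assumed from the earlier examples:

```csharp
// Parameterised query scoped to one logical partition; setting
// QueryRequestOptions.PartitionKey prevents a fan-out across partitions.
QueryDefinition query = new QueryDefinition(
        "SELECT VALUE c.orderId FROM c WHERE c.customerId = @customerId")
    .WithParameter("@customerId", "12345");

using FeedIterator<string> iterator = container.GetItemQueryIterator<string>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("12345-67890")
    });

while (iterator.HasMoreResults)
{
    FeedResponse<string> page = await iterator.ReadNextAsync();
    Console.WriteLine($"Page cost: {page.RequestCharge} RUs");
}
```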

When sorting results, always use indexed fields in ORDER BY clauses:

SELECT * FROM c WHERE c.customerId = '12345' ORDER BY c.orderDate DESC

For complex logic or multi-document operations, use server-side stored procedures:

function createOrders(items) {
    var context = getContext();
    var collection = context.getCollection();

    items.forEach(function(item) {
        var accepted = collection.createDocument(collection.getSelfLink(), item, function(err) {
            if (err) throw new Error('Error creating document: ' + err.message);
        });

        if (!accepted) {
            throw new Error('Request was not accepted by server.');
        }
    });
}
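A stored procedure executes as a single transaction within one logical partition, so all the documents it creates commit or roll back together. It can be invoked from the .NET SDK like this; the procedure id and the partition key value are assumptions carried over from the examples above:

```csharp
// Execute the server-side procedure against a single partition key.
var ordersToCreate = new[]
{
    new { id = "1", partitionKey = "12345-67890", customerId = "12345" },
    new { id = "2", partitionKey = "12345-67890", customerId = "12345" }
};

StoredProcedureExecuteResponse<string> result =
    await container.Scripts.ExecuteStoredProcedureAsync<string>(
        storedProcedureId: "createOrders",
        partitionKey: new PartitionKey("12345-67890"),
        parameters: new dynamic[] { ordersToCreate });
```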

Leveraging TTL (Time to Live) and Multi-Region Writes

Cosmos DB's TTL feature automatically deletes expired documents, streamlining storage management:

Example configuration setting TTL to 30 days:

{
    "defaultTtl": 2592000
}
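TTL can also be enabled when the container is created through the SDK; a minimal sketch, assuming a database object and the container name from earlier (DefaultTimeToLive is expressed in seconds, and individual documents can override it via their own "ttl" property):

```csharp
// Create a container with a 30-day default TTL.
ContainerProperties properties = new ContainerProperties(
    id: "MyContainer",
    partitionKeyPath: "/partitionKey")
{
    DefaultTimeToLive = 2592000 // seconds (30 days)
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);
```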

For global workloads, enable multi-region writes via the Azure CLI or the portal to improve latency and availability across regions:

az cosmosdb update --name MyCosmosAccount \
  --resource-group MyResourceGroup \
  --locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
  --locations regionName=westeurope failoverPriority=1 isZoneRedundant=False \
  --enable-multiple-write-locations true

Continuous Monitoring and Analysis

Consistent monitoring is key to maintaining performance and efficiency. Azure Monitor and Application Insights offer solid insights into RU usage, hot partitions, and query bottlenecks.

Sample Kusto query in Azure Monitor to identify high consumption operations:

AzureDiagnostics
| where Category == "DataPlaneRequests"
| summarize AvgRequestCharge = avg(todouble(requestCharge_s)) by OperationName, databaseName_s, collectionName_s
| order by AvgRequestCharge desc

Finally, review latency, availability, and consistency metrics regularly and adjust configuration proactively to avoid performance degradation.