How to Tune Up Azure Cosmos DB for Performance and Cost Efficiency


Azure Cosmos DB provides a powerful, fully managed NoSQL database solution, ideal for applications requiring high availability, dynamic scalability, and low-latency data access. Despite these advantages, ineffective configuration can introduce performance bottlenecks, wasted resources, and unexpected costs. Optimising Cosmos DB demands a strategic approach spanning partitioning, RU management, indexing, query performance, and continuous monitoring.

Choosing an Effective Partitioning Strategy

Proper partitioning is essential to ensuring Cosmos DB operates efficiently. Cosmos DB horizontally scales by distributing data across logical partitions determined by a partition key. Selecting an unsuitable partition key may cause uneven data distribution, leading to hotspots, throttling, and slow performance. Ideally, your partition key should evenly distribute both data and query loads.

For instance, consider choosing a partition key like this:

{
    "customerId": "12345",
    "orderId": "67890",
    "partitionKey": "12345-67890"
}

If your data doesn't naturally provide an evenly distributed key, a synthetic key (a combination such as "customerId-orderId") can ensure a uniform spread.
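As a sketch, the synthetic key can be composed in application code before the document is written; the Order type and its property names here are illustrative, not part of any SDK:

```csharp
// Compose the synthetic partition key from two naturally occurring fields
// so that writes spread evenly across logical partitions.
var order = new Order
{
    id = Guid.NewGuid().ToString(),
    customerId = "12345",
    orderId = "67890"
};
order.partitionKey = $"{order.customerId}-{order.orderId}";

// Illustrative document type; property names are assumptions.
public class Order
{
    public string id { get; set; }
    public string customerId { get; set; }
    public string orderId { get; set; }
    public string partitionKey { get; set; }
}
```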

Regularly check your partition usage in the Azure Portal to identify hotspots early and rebalance accordingly.

Optimising and Managing Request Units (RUs)

Request Units (RUs) dictate both the cost and the throughput of Cosmos DB operations. Efficient RU management combines automatic scaling with code-level optimisations.

Enabling autoscale lets Cosmos DB adjust RU provisioning dynamically based on actual usage:

az cosmosdb sql container throughput migrate --account-name MyCosmosAccount \
  --database-name MyDatabase \
  --name MyContainer \
  --resource-group MyResourceGroup \
  --throughput-type autoscale

In code, leverage bulk operations via the .NET SDK to lower RU consumption significantly during batch inserts:

CosmosClient client = new CosmosClient(connectionString, new CosmosClientOptions { AllowBulkExecution = true });

Container container = client.GetContainer("MyDatabase", "MyContainer");

List<Task> tasks = new();

foreach (var item in itemsToInsert)
{
    tasks.Add(container.CreateItemAsync(item, new PartitionKey(item.partitionKey)));
}

await Task.WhenAll(tasks);

For high-RU query scenarios, consider isolating those queries in dedicated containers to control cost and throughput more effectively.
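To find out which operations deserve that isolation, inspect the charge the SDK reports on every response; this sketch reuses the container, Order type, and key values from the earlier examples:

```csharp
// Read a single item and log its RU cost; RequestCharge is populated
// on every response in the .NET SDK.
ItemResponse<Order> response = await container.ReadItemAsync<Order>(
    id: "12345",
    partitionKey: new PartitionKey("12345-67890"));

Console.WriteLine($"Read consumed {response.RequestCharge} RUs");
```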

Refining Indexing Policies for Performance and Cost Savings

By default, Cosmos DB indexes all document properties, which simplifies initial setup but may be inefficient at scale. Custom indexing policies are vital to reducing overhead.

Here's a sample indexing policy that explicitly indexes only required fields and excludes unnecessary paths:

{
    "indexingMode": "consistent",
    "includedPaths": [
        {
            "path": "/customerId/?"
        },
        {
            "path": "/orderDate/?"
        }
    ],
    "excludedPaths": [
        {
            "path": "/*"
        }
    ]
}
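If queries filter on one field and sort on another, a composite index avoids a separate sort step at query time. This hypothetical policy pairs customerId with orderDate for exactly that pattern:

```json
{
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            { "path": "/customerId", "order": "ascending" },
            { "path": "/orderDate", "order": "descending" }
        ]
    ]
}
```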

For write-intensive scenarios, lazy indexing can minimise immediate write overhead (note that current Cosmos DB guidance discourages lazy mode, so verify it is available and appropriate for your account before relying on it):

{
    "indexingMode": "lazy"
}

Always validate your index utilisation via Cosmos DB metrics in Azure Portal, adjusting policies based on real world usage patterns.

Improving Query Performance Through Optimisation

Efficient queries dramatically lower RU usage and enhance response speed. Always request only the necessary fields explicitly:

Inefficient approach:

SELECT * FROM c WHERE c.customerId = '12345'

Optimised approach:

SELECT VALUE c.orderId FROM c WHERE c.customerId = '12345'

Avoid cross-partition queries by explicitly including the partition key, which significantly reduces RU consumption:

SELECT * FROM c WHERE c.customerId = '12345' AND c.partitionKey = '12345-67890'
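In the .NET SDK, the same effect is achieved by parameterising the query and pinning it to a single partition via QueryRequestOptions; the container and key values are assumed from the earlier examples:

```csharp
// Parameterised query scoped to one logical partition; setting
// QueryRequestOptions.PartitionKey prevents a fan-out across partitions.
QueryDefinition query = new QueryDefinition(
        "SELECT VALUE c.orderId FROM c WHERE c.customerId = @customerId")
    .WithParameter("@customerId", "12345");

using FeedIterator<string> iterator = container.GetItemQueryIterator<string>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("12345-67890")
    });

while (iterator.HasMoreResults)
{
    FeedResponse<string> page = await iterator.ReadNextAsync();
    Console.WriteLine($"Page cost: {page.RequestCharge} RUs");
}
```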

When sorting results, always use indexed fields in ORDER BY clauses:

SELECT * FROM c WHERE c.customerId = '12345' ORDER BY c.orderDate DESC

For complex logic or multi-document operations, use server-side stored procedures:

function createOrders(items) {
    var context = getContext();
    var collection = context.getCollection();

    items.forEach(function(item) {
        var accepted = collection.createDocument(collection.getSelfLink(), item, function(err) {
            if (err) throw new Error('Error creating document: ' + err.message);
        });

        if (!accepted) {
            throw new Error('Request was not accepted by server.');
        }
    });
}
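A stored procedure executes as a single transaction within one logical partition, so all the documents it creates commit or roll back together. It can be invoked from the .NET SDK like this; the procedure id and the partition key value are assumptions carried over from the examples above:

```csharp
// Execute the server-side procedure against a single partition key.
var ordersToCreate = new[]
{
    new { id = "1", partitionKey = "12345-67890", customerId = "12345" },
    new { id = "2", partitionKey = "12345-67890", customerId = "12345" }
};

StoredProcedureExecuteResponse<string> result =
    await container.Scripts.ExecuteStoredProcedureAsync<string>(
        storedProcedureId: "createOrders",
        partitionKey: new PartitionKey("12345-67890"),
        parameters: new dynamic[] { ordersToCreate });
```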

Leveraging TTL (Time to Live) and Multi-Region Writes

Cosmos DB's TTL feature automatically deletes expired documents, streamlining storage management:

Example configuration setting TTL to 30 days:

{
    "defaultTtl": 2592000
}
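TTL can also be enabled when the container is created through the SDK; a minimal sketch, assuming a database object and the container name from earlier (DefaultTimeToLive is expressed in seconds, and individual documents can override it via their own "ttl" property):

```csharp
// Create a container with a 30-day default TTL.
ContainerProperties properties = new ContainerProperties(
    id: "MyContainer",
    partitionKeyPath: "/partitionKey")
{
    DefaultTimeToLive = 2592000 // seconds (30 days)
};

Container container = await database.CreateContainerIfNotExistsAsync(properties);
```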

For global workloads, enable multi-region writes via the Azure CLI or the portal to improve latency and availability across regions:

az cosmosdb update --name MyCosmosAccount \
  --resource-group MyResourceGroup \
  --locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
  --locations regionName=westeurope failoverPriority=1 isZoneRedundant=False \
  --enable-multiple-write-locations true

Continuous Monitoring and Analysis

Consistent monitoring is key to maintaining performance and efficiency. Azure Monitor and Application Insights offer solid insights into RU usage, hot partitions, and query bottlenecks.

Sample Kusto query in Azure Monitor to identify high consumption operations:

AzureDiagnostics
| where Category == "DataPlaneRequests"
| summarize AvgRequestCharge = avg(todouble(requestCharge_s)) by OperationName, databaseName_s, collectionName_s
| order by AvgRequestCharge desc

Finally, review latency, availability, and consistency metrics regularly and adjust configuration proactively to avoid performance degradation.