How to DDoS your own .NET app

Most people think about DDoS as something hostile. Someone floods your public API, your infrastructure starts to bend, and the conversation moves towards rate limiting, WAF rules, autoscaling, caching, and traffic filtering. That version is real, but it is not the only version. A lot of production systems get overloaded by their own code.
No attacker. No botnet. No suspicious traffic pattern from the outside. Just a set of reasonable engineering decisions that combine badly under load. A retry policy that looked sensible in development. A background worker that pulls too quickly. A health check that calls real dependencies every few seconds. A startup routine that warms every cache across every pod at the same time. A request path that fans out to five downstream services and assumes they will all keep up. Individually, these choices can look fine. Together, they can look exactly like an attack.
The app becomes its own traffic multiplier
The simplest way to accidentally DDoS your own app is to make one request turn into many. A request comes into your API. The endpoint calls a profile service, a pricing service, a permissions service, a feature flag service, and a database. Each one is fast enough during normal traffic, so nobody worries too much.
Then traffic doubles. Every one of those extra requests carries its full fan-out cost, so the absolute increase downstream is several times larger than the increase at the front door. If one downstream dependency starts to slow down, request duration increases. Longer requests stay in flight for longer. More work piles up. Thread pool pressure increases. Connection pools stay busy. Retries start. The system begins producing extra load while already struggling with the original load.
This is how self-inflicted overload often starts. The problem is rarely one bad line of code. It is usually a multiplier hidden inside a perfectly normal request path.
A common version:
app.MapGet("/dashboard/{userId:guid}", async (
    Guid userId,
    IUserClient users,
    IOrdersClient orders,
    IFeatureClient features,
    IRecommendationClient recommendations,
    CancellationToken stopToken) =>
{
    var userTask = users.GetUser(userId, stopToken);
    var ordersTask = orders.GetRecentOrders(userId, stopToken);
    var featuresTask = features.GetEnabledFeatures(userId, stopToken);
    var recommendationsTask = recommendations.GetRecommendations(userId, stopToken);
    await Task.WhenAll(userTask, ordersTask, featuresTask, recommendationsTask);
    return Results.Ok(new DashboardResponse(
        await userTask,
        await ordersTask,
        await featuresTask,
        await recommendationsTask));
});
At a glance, this looks good. The calls are independent. Task.WhenAll reduces latency. The endpoint avoids blocking. The hidden question is what happens at scale. One thousand incoming requests are now four thousand outbound calls. If each outbound call has retries, the real number can be much higher. If every instance does the same thing at the same time, the downstream services feel the multiplied traffic before your own API does. Parallelism is useful, but unbounded parallelism is one of the easiest ways to turn normal load into a traffic storm.
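To put rough numbers on that multiplication, a back-of-the-envelope helper can estimate the worst case. This is only a sketch: `FanOutEstimate` and its parameter names are hypothetical, and it assumes every outbound call uses all of its retries, which is the pessimistic bound rather than the typical case.

```csharp
using System;

public static class FanOutEstimate
{
    // Worst-case outbound calls: every inbound request fans out to
    // `callsPerRequest` dependencies, and every call uses all its retries.
    public static long WorstCaseOutboundCalls(
        long inboundRequests,
        int callsPerRequest,
        int maxRetriesPerCall)
    {
        if (inboundRequests < 0 || callsPerRequest < 0 || maxRetriesPerCall < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(inboundRequests), "Inputs must be non-negative.");
        }

        // One original attempt plus the retries.
        long attemptsPerCall = 1L + maxRetriesPerCall;
        return checked(inboundRequests * callsPerRequest * attemptsPerCall);
    }
}
```

For the dashboard endpoint, `WorstCaseOutboundCalls(1000, 4, 0)` gives 4,000 calls, and allowing two retries per call, `WorstCaseOutboundCalls(1000, 4, 2)`, gives 12,000.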
Retry policies can make an outage worse
Retries are one of those things that feel responsible. A transient error happens. You retry. The user never sees the failure. The system becomes more resilient. That's the happy path. The failure path is more interesting. Imagine a downstream API is slowing down because it is overloaded. Your .NET service receives a timeout. Polly retries. Other requests do the same thing. Every app instance sends more calls to a dependency that is already unable to deal with the original call volume.
The retry policy was added to improve reliability. Under pressure, it increases traffic.
This is the kind of code that can cause pain:
builder.Services
    .AddHttpClient<IPaymentClient, PaymentClient>()
    .AddStandardResilienceHandler();
The newer resilience APIs in .NET are useful, and the standard handler is a good starting point. The bigger issue is that teams often add resilience as a checkbox rather than thinking through the behaviour. What gets retried? How many times? Is there jitter? Is there a timeout per try and an overall timeout? What happens when the dependency is already failing? Does the caller have a retry budget, or can every request keep adding more work?
A more deliberate setup might cap the damage:
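// Requires the Microsoft.Extensions.Http.Resilience package.
using Microsoft.Extensions.Http.Resilience;
using Polly;
builder.Services
    .AddHttpClient<IPaymentClient, PaymentClient>(client =>
    {
        // Outer safety net; the pipeline timeouts below should fire first.
        client.Timeout = TimeSpan.FromSeconds(3);
    })
    .AddResilienceHandler("payments", pipeline =>
    {
        // Strategies run outermost first, so this timeout covers the whole
        // operation, including every retry and backoff delay.
        pipeline.AddTimeout(TimeSpan.FromSeconds(2));
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 20,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(15)
        });
        // Innermost timeout applies to each individual attempt, so one slow
        // call cannot use up the whole two-second budget on its own.
        pipeline.AddTimeout(TimeSpan.FromMilliseconds(800));
    });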
The exact numbers are less important than the thinking.
Retries should be treated as extra traffic. Every retry has a cost. Every timeout keeps work alive for longer. Every failed dependency needs space to recover. A good retry policy reduces user-visible failures during short blips. A bad one turns a slow dependency into a shared incident.
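The idea of a retry budget can be sketched in a few lines. `RetryBudget` below is a hypothetical helper, not part of Polly or the .NET resilience packages: it allows retries only while they stay under a fixed fraction of the first attempts seen so far. The counters here never reset, which is an assumption made to keep the sketch short; a production version would more likely track a sliding time window.

```csharp
using System.Threading;

// Hypothetical helper: allows retries only while they stay below a fixed
// fraction of the first attempts recorded so far.
public sealed class RetryBudget
{
    private readonly double _maxRetryRatio;
    private long _attempts;
    private long _retries;

    public RetryBudget(double maxRetryRatio) => _maxRetryRatio = maxRetryRatio;

    // Call once for every first attempt (not for retries).
    public void RecordAttempt() => Interlocked.Increment(ref _attempts);

    // Returns true and records the retry if the budget allows it.
    public bool TryAcquireRetry()
    {
        long attempts = Interlocked.Read(ref _attempts);
        long retries = Interlocked.Increment(ref _retries);
        if (retries <= attempts * _maxRetryRatio)
        {
            return true;
        }

        // Over budget: undo the reservation so the caller can fail fast.
        Interlocked.Decrement(ref _retries);
        return false;
    }
}
```

With a ratio of 0.2, ten recorded attempts allow two retries and refuse the third. The caller would check `TryAcquireRetry()` wherever it decides to retry, and give up immediately when it returns false.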
Background workers can attack your database
Background processing is another easy place to create accidental overload. The API stays responsive because it drops work onto a queue. That is good. The queue absorbs spikes. Also good. Then workers start pulling as fast as possible. The database becomes the real victim.
This usually starts with code that feels clean:
public sealed class ImportWorker(
    Channel<ImportJob> channel,
    IServiceScopeFactory scopeFactory) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stopToken)
    {
        await foreach (var job in channel.Reader.ReadAllAsync(stopToken))
        {
            _ = ProcessJob(job, stopToken);
        }
    }
    private async Task ProcessJob(ImportJob job, CancellationToken stopToken)
    {
        using var scope = scopeFactory.CreateScope();
        var handler = scope.ServiceProvider.GetRequiredService<IImportHandler>();
        await handler.Handle(job, stopToken);
    }
}
The worker reads jobs and starts processing them. The problem is that nothing controls concurrency. Each job is started without being awaited, so a backlog in the channel turns into a large amount of simultaneous work, and any exception thrown by a job goes unobserved. Each job might open database connections, make HTTP calls, allocate memory, write logs, and publish events. The queue protected the API, but the worker moved the overload somewhere else.
A safer worker makes concurrency explicit:
public sealed class ImportWorker(
    Channel<ImportJob> channel,
    IServiceScopeFactory scopeFactory,
    ILogger<ImportWorker> logger) : BackgroundService
{
    private const int MaxConcurrency = 8;
    protected override async Task ExecuteAsync(CancellationToken stopToken)
    {
        using var semaphore = new SemaphoreSlim(MaxConcurrency);
        var running = new List<Task>();
        try
        {
            await foreach (var job in channel.Reader.ReadAllAsync(stopToken))
            {
                // Wait for a free slot before starting another job.
                await semaphore.WaitAsync(stopToken);
                running.Add(ProcessJobSafely(job, semaphore, stopToken));
                running.RemoveAll(t => t.IsCompleted);
            }
        }
        finally
        {
            // Also runs on cancellation, so in-flight jobs finish and release
            // their slots before the semaphore is disposed.
            await Task.WhenAll(running);
        }
    }
    private async Task ProcessJobSafely(
        ImportJob job,
        SemaphoreSlim semaphore,
        CancellationToken stopToken)
    {
        try
        {
            using var scope = scopeFactory.CreateScope();
            var handler = scope.ServiceProvider.GetRequiredService<IImportHandler>();
            await handler.Handle(job, stopToken);
        }
        catch (Exception ex)
        {
            logger.LogError(ex, "Failed to process import job {JobId}", job.Id);
        }
        finally
        {
            semaphore.Release();
        }
    }
}
This still processes work in parallel, but it gives the system a pressure valve. For serious workloads, you probably want more than a semaphore. You may need bounded channels, batch sizes, queue depth metrics, etc. The key point is simple. A background worker should have a speed limit. If it can pull faster than the rest of the system can safely process, it can become the thing that takes production down.
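The bounded channel mentioned above is the producer-side half of that speed limit. A minimal sketch, assuming a stand-in `ImportJob` record and a hypothetical `ImportQueue` factory, with a capacity chosen only for illustration:

```csharp
using System.Threading.Channels;

// Stand-in for the job type used by the worker.
public sealed record ImportJob(int Id);

public static class ImportQueue
{
    // The default capacity is an illustrative number, not a recommendation.
    public static Channel<ImportJob> Create(int capacity = 100) =>
        Channel.CreateBounded<ImportJob>(new BoundedChannelOptions(capacity)
        {
            // When the channel is full, WriteAsync waits and TryWrite
            // returns false, so backpressure reaches the producer.
            FullMode = BoundedChannelFullMode.Wait
        });
}
```

With this in place, the API can either wait for space with `WriteAsync` or reject work when `TryWrite` returns false, instead of letting the backlog grow without bound.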
Health checks can become load tests
Health checks are supposed to make systems safer. They tell Kubernetes, Azure App Service, load balancers, and deployment platforms whether an instance is alive and ready for traffic. The dangerous version is the health check that does too much. It checks SQL. Then Redis. Then blob storage. Then a message broker. Then three internal APIs. Then Key Vault. Then maybe it runs a small query to prove the database is really working. That can feel thorough.
Now multiply it. If you have 30 pods and something calls the readiness endpoint every few seconds, your health check is no longer just a health check. It is recurring production traffic. If the health check hits dependencies during an incident, it adds load at exactly the wrong time.
There's a better split. A liveness check should usually prove the process is alive. A readiness check should prove the app is ready to receive traffic. Deep dependency checks are useful, but they should be handled carefully, cached briefly, or moved into diagnostics that humans and monitoring systems can query deliberately.
This is the kind of health check that can get expensive:
builder.Services
    .AddHealthChecks()
    .AddSqlServer(connectionString)
    .AddRedis(redisConnection)
    .AddUrlGroup(new Uri("https://pricing.internal/health"))
    .AddUrlGroup(new Uri("https://users.internal/health"))
    .AddUrlGroup(new Uri("https://payments.internal/health"));
There are cases where dependency checks are useful. The mistake is pretending they are free. For high-scale systems, health checks should be simple, cheap, and predictable. They should not become a hidden load generator.
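One way to keep a deep check while capping its cost is to cache the result briefly. The sketch below uses only the base class library; `CachedProbe` is a hypothetical wrapper rather than part of the ASP.NET Core health check API, and the probe delegate stands in for whatever dependency call you would otherwise make on every request to the health endpoint.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: runs an expensive dependency probe at most once per
// `ttl`, and serves the last result to every caller in between.
public sealed class CachedProbe
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private readonly long _ttlMilliseconds;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private bool _lastResult;
    private long _expiresAtMs = long.MinValue;

    public CachedProbe(Func<CancellationToken, Task<bool>> probe, TimeSpan ttl)
    {
        _probe = probe;
        _ttlMilliseconds = (long)ttl.TotalMilliseconds;
    }

    public async Task<bool> CheckAsync(CancellationToken stopToken = default)
    {
        // Only one caller refreshes at a time; the rest wait and reuse it.
        await _gate.WaitAsync(stopToken);
        try
        {
            // Environment.TickCount64 is monotonic, so clock changes
            // cannot extend or shorten the cache window.
            if (Environment.TickCount64 >= _expiresAtMs)
            {
                _lastResult = await _probe(stopToken);
                _expiresAtMs = Environment.TickCount64 + _ttlMilliseconds;
            }

            return _lastResult;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

With a ten-second window, the dependency sees at most one probe every ten seconds from each instance, however often the platform polls the endpoint. A probe that throws is not cached, so the next caller tries again.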
Cache stampedes are internal traffic spikes
Caching can save a system, but it can also create sharp traffic spikes. A popular cache key expires. Every request misses at the same time. Every app instance tries to rebuild the same value. The database or downstream API receives a sudden burst of identical work. This is a cache stampede. It often appears as a strange production pattern. Everything is fine, then latency jumps every few minutes. Database CPU spikes. Logs show repeated calls for the same data. Then the system settles again. The cache was added to reduce load. The expiry pattern created bursts.
The risky version looks like this:
public async Task<ProductSummary> GetSummary(Guid productId, CancellationToken stopToken)
{
    var cacheKey = $"product-summary:{productId}";
    var cached = await cache.GetStringAsync(cacheKey, stopToken);
    if (cached is not null)
    {
        return JsonSerializer.Deserialize<ProductSummary>(cached)!;
    }
    var summary = await database.LoadProductSummary(productId, stopToken);
    await cache.SetStringAsync(
        cacheKey,
        JsonSerializer.Serialize(summary),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        },
        stopToken);
    return summary;
}
This works until many requests miss together. A better design prevents every caller from rebuilding the same value at the same time. Depending on the system, that might mean per-key locking, stale-while-revalidate, randomised expiry, early refresh, or single-flight loading.
Even a simple randomised expiry helps avoid every key expiring on the same boundary:
var expiry = TimeSpan.FromMinutes(5)
    .Add(TimeSpan.FromSeconds(Random.Shared.Next(0, 60)));
await cache.SetStringAsync(
    cacheKey,
    JsonSerializer.Serialize(summary),
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = expiry
    },
    stopToken);
That does not solve every cache problem, but it removes one common source of synchronised load. Caching should flatten demand. If it creates sharp bursts, it can become part of the overload story.
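Single-flight loading, one of the options listed earlier, can be sketched with a `ConcurrentDictionary` of lazy tasks. `SingleFlight<TValue>` is a hypothetical helper, and it only deduplicates callers inside one process: with many instances you would still get up to one load per instance, which is usually a large improvement over one load per request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: concurrent callers asking for the same key share
// one in-flight load instead of each starting their own.
public sealed class SingleFlight<TValue>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TValue>>> _inFlight = new();

    public async Task<TValue> RunAsync(string key, Func<Task<TValue>> load)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only
        // the one stored in the dictionary ever has its factory invoked.
        var lazy = _inFlight.GetOrAdd(key, _ => new Lazy<Task<TValue>>(load));
        try
        {
            return await lazy.Value;
        }
        finally
        {
            // Remove only this entry, so a newer load for the same key survives.
            _inFlight.TryRemove(new KeyValuePair<string, Lazy<Task<TValue>>>(key, lazy));
        }
    }
}
```

In `GetSummary`, the database call and cache write would go inside the delegate passed to `RunAsync(cacheKey, ...)`, so a burst of misses for one product results in a single rebuild per instance.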
Startup code can create deploy-time incidents
One of the easiest ways to overload your own system is during deployment. A new version rolls out. Multiple instances start. Each instance loads configuration, warms caches, fetches secrets, preloads reference data, validates external services, runs startup checks, opens connections, and maybe applies migrations.
That might be fine with one instance. With 40 instances, it can become a deploy-time traffic spike. This is especially painful when deployments happen during an already busy period. The app is under normal production load, then every new replica starts doing the same expensive startup work at the same time.
Startup work feels safe because it happens before traffic. In reality, it still consumes shared dependencies. Database migrations are the classic example. Running migrations automatically on startup can look convenient, especially early in a project. Then the system grows, the deployment model changes, and every instance suddenly has code capable of touching schema state on boot.
Cache warming has the same problem. It sounds responsible to warm everything before accepting traffic. It can also mean every pod hits the database at once to load data that only one pod really needed to prepare. A safer approach is to keep startup light. Do the minimum needed for the process to start. Move expensive one-time work into deployment jobs. Make cache warming gradual or lazy. Use readiness checks to control when instances receive traffic, but do not turn readiness into a full dependency test suite. The goal is not to make startup empty. The goal is to stop every replica from behaving like it owns the whole platform.
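Gradual warming can be as simple as adding a random delay before the warm-up runs, so replicas that boot together do not all hit the database in the same second. This is a sketch under assumptions: `StartupJitter` is a hypothetical helper, the warm-up delegate stands in for your own code, and the call is assumed to run off the startup path (for example from a background service) so it does not block readiness.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StartupJitter
{
    // Picks a random delay between zero and maxDelay so replicas that
    // start together spread their warm-up work over that window.
    public static TimeSpan NextDelay(TimeSpan maxDelay)
    {
        if (maxDelay <= TimeSpan.Zero)
        {
            return TimeSpan.Zero;
        }

        return TimeSpan.FromMilliseconds(
            Random.Shared.NextDouble() * maxDelay.TotalMilliseconds);
    }

    // Waits for the jittered delay, then runs the warm-up delegate.
    public static async Task RunAfterJitterAsync(
        TimeSpan maxDelay,
        Func<CancellationToken, Task> warmUp,
        CancellationToken stopToken)
    {
        await Task.Delay(NextDelay(maxDelay), stopToken);
        await warmUp(stopToken);
    }
}
```

With 40 replicas and a 30-second window, the same warm-up queries arrive spread across half a minute rather than in one burst. The total work is unchanged, so this complements lazy loading and deployment jobs rather than replacing them.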
Logs and metrics can add to the blast radius
Observability helps you understand production, but it also has a runtime cost. During an incident, systems often log more. More errors mean more exception logs. More retries mean more warning logs. More failed dependency calls mean more telemetry. More telemetry means more CPU, memory, network traffic, and ingestion pressure. It's very easy to build a system where failure generates more work than success.
This can be especially bad with high-cardinality logging. User IDs, request IDs, order IDs, payload fragments, dynamic labels, and exception details can all be useful. They can also make log storage and metric backends expensive and noisy. The app may survive the original issue, then start struggling because it is trying to describe the issue in too much detail. You still need logs. You still need metrics. You still need traces. But production telemetry needs limits. Sampling, log levels, metric cardinality, payload size, and sink behaviour all deserve attention. Console logging in containers can also become surprisingly expensive when volume gets high. Observability should help you recover. It should not become another source of pressure.
The pattern is almost always missing limits
The common theme across all of this is not bad engineering. Retries are useful. Parallelism is useful. Queues are useful. Health checks are useful. Caching is useful. Observability is useful. The danger comes from useful patterns without limits. A retry policy needs a budget. Fan-out needs bounded concurrency. Workers need backpressure. Health checks need to be cheap. Most accidental DDoS patterns come from code that assumes the rest of the system can always keep up. Production teaches you where that assumption fails.
What I would watch in a real .NET system
For a .NET API, I would start by watching the shape of work rather than only the raw request count. How many outbound HTTP calls does one inbound request create? How many database queries? How many retries? Then I would look for traffic multipliers. A single request turning into ten downstream calls. A single failure turning into three retries. A single queue message turning into twenty writes. A single deployment causing every instance to warm the same data. That is where the risk usually hides. The fix is rarely one magic library. It is usually a set of simple limits placed in the right parts of the system.
The awkward truth
You do not need a hostile actor to create a DDoS-shaped incident. A normal deploy can do it. A retry policy can do it. A background worker can do it. That's what makes these incidents frustrating. The code usually looks sensible in isolation. The real question is what happens when every instance, every request, every retry, and every worker does the sensible thing at the same time. That's where many .NET systems get caught.




