<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[FullStack City]]></title><description><![CDATA[Microsoft development blog covering frontend, backend, databases and Azure]]></description><link>https://fullstackcity.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1742122025267/1d5fb5ac-0b1f-4bc4-adc0-48b9b70ea37a.png</url><title>FullStack City</title><link>https://fullstackcity.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 09:11:42 GMT</lastBuildDate><atom:link href="https://fullstackcity.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Denormalisation for Performance in C#]]></title><description><![CDATA[How to unlock real performance gains without turning your data model into a mess
Most engineers start in the same place. You normalise the schema, remove duplication, keep each fact in one place, and ]]></description><link>https://fullstackcity.com/denormalisation-for-performance-in-c</link><guid isPermaLink="true">https://fullstackcity.com/denormalisation-for-performance-in-c</guid><category><![CDATA[denormalization]]></category><category><![CDATA[software development]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[performance]]></category><category><![CDATA[C#]]></category><category><![CDATA[dotnet]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 02 Apr 2026 20:49:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/49a783a7-1216-433f-b8d9-35aa00e82c9f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>How to unlock real performance gains without turning your data model into a mess</p>
<p>Most engineers start in the same place. You normalise the schema, remove duplication, keep each fact in one place, and rely on joins to reconstruct the answer when the application needs it. That is still the right default for a transactional system. The problem is that many people quietly stretch that rule too far. They end up assuming that a data model which is logically clean must also be operationally fast. In production, that falls apart quickly.</p>
<p>The real cost of a heavily normalised model is not usually visible in one query. It shows up in repetition. The same joins run over and over. The same aggregates are recalculated on every request. The same object graph is rebuilt for every page, every API call, every dashboard tile, and every export. The database becomes a reconstruction engine. The application becomes a shaping engine. Both work hard, not because the business needs new information, but because the model forces them to keep rebuilding information the system already knows.</p>
<h2>This is where denormalisation comes in.</h2>
<p>Denormalisation is a deliberate decision to move work away from the read path and into the write path, or into a background projection step, because doing that once is cheaper than doing the same work thousands of times on demand. In a modern C# system, especially one built with ASP.NET Core, EF Core, background workers, queues, Redis, and event driven processing, that trade can transform performance.</p>
<p>The gains are not abstract. You see them in lower latency, lower database CPU, fewer allocations, more stable p95 and p99 response times, better concurrency, and less fragile query behaviour. You also get a cleaner separation between the source of truth and the shape that the application actually needs at the edge.</p>
<p>This is important mainly in systems with heavy read traffic, complex dashboards, queue screens, search pages, configuration endpoints, reporting APIs, and integration surfaces that repeatedly ask for the same shaped view of the data. If you treat every one of those reads as a fresh act of discovery, the system wastes time. If you precompute and persist the shape once, the request becomes a cheap lookup.</p>
<p>The key idea is simple. Normalisation optimises storage and correctness. Denormalisation optimises access. Mature systems usually need both.</p>
<h2>The hidden cost of clean relational models</h2>
<p>A normalised schema protects integrity. That is its job. It makes writes understandable and it keeps the domain tidy. The trouble starts when the application’s hot paths are read heavy and shape heavy.</p>
<p>Imagine a common business screen. You need to show a case list with case number, customer name, policy type, current stage, outstanding balance, number of open actions, last correspondence date, assigned handler, SLA status, and a search summary. In a fully normalised design, those values may come from six or seven tables, plus a few aggregate queries, plus a handful of rules in application code. Nothing about that is inherently wrong. The problem is frequency. If that same shape is requested constantly, the system is paying the cost of reconstruction on every request.</p>
<p>The request path starts to look like this.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/97c37459-20b3-4baa-8375-849e487bdda9.png" alt="" style="display:block;margin:0 auto" />

<p>That path is often acceptable in development. Small data volumes hide the cost. A local database hides the latency. The ORM hides the SQL. Then production arrives, concurrency rises, the dataset grows, and the endpoint that felt harmless becomes one of the hottest parts of the estate.</p>
<p>The first thing people usually try is index tuning. That helps. Then they add projections in LINQ. That helps a bit more. Then they introduce caching. That can help a lot, but it often hides the problem rather than fixing it. A cache miss still falls back to the same expensive reconstruction path. Once the underlying shape is wrong for the read pattern, you are tuning around the problem rather than changing it.</p>
<p>That is why denormalisation is so powerful. It does not ask how to run the same expensive query a little faster. It asks whether the query should exist in that form at all.</p>
<h2>What denormalisation really means in a C# system</h2>
<p>In practice, denormalisation in a C# system usually takes one of a few forms.</p>
<p>You store precomputed aggregates directly on a parent row, such as current balance, open item count, or last activity date.</p>
<p>You build a dedicated read model that already matches a page or API response.</p>
<p>You snapshot descriptive values, such as broker name or product name, onto a transactional record so you do not join to reference tables on every read.</p>
<p>You persist flags and classifications, such as IsUrgent, RiskBand, or HasOpenTasks, instead of recalculating them repeatedly.</p>
<p>You compile expensive response payloads into a cached or persisted format, often as JSON, so the request path can serve them directly.</p>
<p>Those are all forms of the same idea. You take work that would otherwise happen every time the application reads data and you pay for it once when the data changes.</p>
<p>That changes the economics of the system.</p>
<p>If a value changes once an hour but is read ten thousand times in that hour, it is usually madness to compute it ten thousand times. Store it. Keep it fresh. Read it cheaply.</p>
<h2>Where the performance gains come from</h2>
<p>The gains from denormalisation are easy to hand wave, but the useful part is knowing where they show up.</p>
<p>The first gain is query simplification. A query that previously needed multiple joins, aggregates, and conditional expressions can become a simple index seek against a flat row. That cuts database CPU, logical reads, memory pressure, and plan complexity.</p>
<p>The second gain is lower application overhead. Even if the database work is acceptable, materialising nested EF Core graphs and then reshaping them into DTOs still costs CPU and memory in the application. A flat read model avoids much of that.</p>
<p>The third gain is better tail performance. Complex queries are far more likely to show unstable p95 and p99 latencies, especially under concurrency or when parameter values vary. Simple denormalised queries are usually more predictable.</p>
<p>The fourth gain is improved cache behaviour. A denormalised row or payload already matches the response shape, so a cache miss is not painful. You do not have to rebuild the world before you can refill the cache.</p>
<p>The fifth gain is fewer cross service dependencies. If you snapshot small pieces of descriptive data, one service no longer needs to ask another service the same question on every request. That is often a bigger win than any SQL optimisation.</p>
<p>The sixth gain is better scaling behaviour. Once each request does less work, every instance can serve more traffic with more stable latency. Horizontal scaling starts to work properly because the units of work are cheaper and more predictable.</p>
<p>The right way to picture this is as two separate paths.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/f70d48a5-96e3-4499-a273-4d18b21d9f58.png" alt="" style="display:block;margin:0 auto" />

<p>The write path gets slightly heavier. The read path becomes dramatically cheaper. In read heavy systems that is exactly the trade you want.</p>
<h3>Technique one, precomputed aggregates</h3>
<p>This is the most obvious denormalisation technique and still one of the most effective. If an aggregate is read often and changes comparatively rarely, store it.</p>
<p>Think about balances, counts, totals, last updated timestamps, most recent activity, most recent payment date, open task count, total claim value, or number of outstanding documents. Engineers often recalculate these on every request because the database can do it. That does not mean it should.</p>
<p>A typical normalised query might look like this.</p>
<pre><code class="language-csharp">public sealed class AccountSummaryService
{
    private readonly FinanceDbContext _db;

    public AccountSummaryService(FinanceDbContext db)
    {
        _db = db;
    }

    public async Task&lt;AccountSummaryDto?&gt; GetAsync(Guid accountId, CancellationToken stopToken)
    {
        return await _db.Accounts
            .Where(x =&gt; x.Id == accountId)
            .Select(x =&gt; new AccountSummaryDto
            {
                AccountId = x.Id,
                CustomerName = x.Customer.Name,
                CurrentBalance = x.Transactions.Sum(t =&gt; t.Amount),
                OpenInvoiceCount = x.Invoices.Count(i =&gt; !i.IsPaid),
                LastPaymentUtc = x.Payments
                    .OrderByDescending(p =&gt; p.PaidAtUtc)
                    .Select(p =&gt; (DateTime?)p.PaidAtUtc)
                    .FirstOrDefault()
            })
            .SingleOrDefaultAsync(stopToken);
    }
}
</code></pre>
<p>This is tidy. It is also doing real work every time the endpoint is called. The sum is recomputed. The count is recomputed. The payment ordering is revisited. The joins are rebuilt. If that account page is busy, you are paying that cost repeatedly for no gain in truth.</p>
<p>A denormalised design moves those values onto the account row or onto a dedicated account summary table.</p>
<pre><code class="language-csharp">public sealed class Account
{
    public Guid Id { get; set; }
    public Guid CustomerId { get; set; }
    public decimal CurrentBalance { get; set; }
    public int OpenInvoiceCount { get; set; }
    public DateTime? LastPaymentUtc { get; set; }
}
</code></pre>
<p>The read path becomes much simpler.</p>
<pre><code class="language-csharp">public sealed class AccountSummaryService
{
    private readonly FinanceDbContext _db;

    public AccountSummaryService(FinanceDbContext db)
    {
        _db = db;
    }

    public async Task&lt;AccountSummaryDto?&gt; GetAsync(Guid accountId, CancellationToken stopToken)
    {
        return await _db.Accounts
            .AsNoTracking()
            .Where(x =&gt; x.Id == accountId)
            .Select(x =&gt; new AccountSummaryDto
            {
                AccountId = x.Id,
                CustomerName = x.Customer.Name,
                CurrentBalance = x.CurrentBalance,
                OpenInvoiceCount = x.OpenInvoiceCount,
                LastPaymentUtc = x.LastPaymentUtc
            })
            .SingleOrDefaultAsync(stopToken);
    }
}
</code></pre>
<p>The gain here is not subtle. You have shifted work from every read to only the writes that actually change the values.</p>
<p>There are two good ways to maintain these fields. If the value is part of a hard business invariant, update it in the same transaction as the canonical write. If the value is mainly for display or read optimisation, project it asynchronously through an outbox driven worker.</p>
<p>Here is a synchronous example.</p>
<pre><code class="language-csharp">public sealed class PaymentService
{
    private readonly FinanceDbContext _db;

    public PaymentService(FinanceDbContext db)
    {
        _db = db;
    }

    public async Task RecordPaymentAsync(Guid accountId, decimal amount, DateTime paidAtUtc, CancellationToken stopToken)
    {
        var account = await _db.Accounts.SingleAsync(x =&gt; x.Id == accountId, stopToken);

        _db.Payments.Add(new Payment
        {
            Id = Guid.NewGuid(),
            AccountId = accountId,
            Amount = amount,
            PaidAtUtc = paidAtUtc
        });

        account.CurrentBalance -= amount;
        account.LastPaymentUtc = paidAtUtc;

        await _db.SaveChangesAsync(stopToken);
    }
}
</code></pre>
<p>That looks almost boring, which is exactly the point. Good denormalisation is often simple. It gives you a cheap read path because the system has already done the work.</p>
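<p>For the asynchronous path, the write records an outbox message in the same transaction as the payment, and a background worker projects the summary fields later. The following is a minimal sketch; the <code>OutboxMessage</code> shape mirrors the worker example further down, but the exact entity is an assumption, not taken from a real codebase.</p>

```csharp
public sealed class OutboxMessage
{
    public Guid Id { get; set; }
    public string Type { get; set; } = string.Empty;
    public Guid AggregateId { get; set; }
    public DateTime OccurredUtc { get; set; }
    public DateTime? ProcessedUtc { get; set; }
}

public sealed class AsyncPaymentService
{
    private readonly FinanceDbContext _db;

    public AsyncPaymentService(FinanceDbContext db)
    {
        _db = db;
    }

    public async Task RecordPaymentAsync(Guid accountId, decimal amount, DateTime paidAtUtc, CancellationToken stopToken)
    {
        _db.Payments.Add(new Payment
        {
            Id = Guid.NewGuid(),
            AccountId = accountId,
            Amount = amount,
            PaidAtUtc = paidAtUtc
        });

        // The outbox row commits atomically with the payment, so a projection
        // worker can never miss a change: either both rows exist or neither does.
        _db.OutboxMessages.Add(new OutboxMessage
        {
            Id = Guid.NewGuid(),
            Type = "PaymentRecorded",
            AggregateId = accountId,
            OccurredUtc = DateTime.UtcNow
        });

        await _db.SaveChangesAsync(stopToken);
    }
}
```

The canonical write stays lean, and the summary update becomes a retryable background step rather than extra latency on the request.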
<h3>Technique two, dedicated read models</h3>
<p>This is where denormalisation starts to move from a tactical optimisation into a strategic design choice. A read model is a table or document that exists purely because a specific screen, endpoint, or integration needs data in a specific shape.</p>
<p>Suppose you have an underwriting queue page. The page needs submission number, insured name, broker name, product class, status, risk rating, attachment count, assigned underwriter, created date, and an urgency flag. In a fully normalised schema those values may be scattered across several tables and some of them may need to be computed. You can absolutely query them on demand. You will just keep paying for it.</p>
<p>A denormalised read model lets you store exactly what the queue needs.</p>
<pre><code class="language-csharp">public sealed class SubmissionReviewQueueItem
{
    public Guid SubmissionId { get; set; }
    public string SubmissionNumber { get; set; } = string.Empty;
    public string InsuredName { get; set; } = string.Empty;
    public string BrokerName { get; set; } = string.Empty;
    public string ProductClass { get; set; } = string.Empty;
    public string Status { get; set; } = string.Empty;
    public string RiskRating { get; set; } = string.Empty;
    public int AttachmentCount { get; set; }
    public string AssignedUnderwriterName { get; set; } = string.Empty;
    public bool IsUrgent { get; set; }
    public DateTime CreatedUtc { get; set; }
}
</code></pre>
<p>The query then becomes a simple paged lookup.</p>
<pre><code class="language-csharp">public sealed class ReviewQueueService
{
    private readonly UnderwritingDbContext _db;

    public ReviewQueueService(UnderwritingDbContext db)
    {
        _db = db;
    }

    public async Task&lt;IReadOnlyList&lt;ReviewQueueItemDto&gt;&gt; GetPageAsync(int page, int pageSize, CancellationToken stopToken)
    {
        return await _db.SubmissionReviewQueueItems
            .AsNoTracking()
            .OrderByDescending(x =&gt; x.IsUrgent)
            .ThenBy(x =&gt; x.CreatedUtc)
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .Select(x =&gt; new ReviewQueueItemDto
            {
                SubmissionId = x.SubmissionId,
                SubmissionNumber = x.SubmissionNumber,
                InsuredName = x.InsuredName,
                BrokerName = x.BrokerName,
                ProductClass = x.ProductClass,
                Status = x.Status,
                RiskRating = x.RiskRating,
                AttachmentCount = x.AttachmentCount,
                AssignedUnderwriterName = x.AssignedUnderwriterName,
                IsUrgent = x.IsUrgent,
                CreatedUtc = x.CreatedUtc
            })
            .ToListAsync(stopToken);
    }
}
</code></pre>
<p>This is the kind of change that can take an endpoint from unpredictable and expensive to stable and cheap. It also changes how you index. Instead of trying to satisfy a messy query against the whole domain model, you can build indexes specifically for the queue.</p>
<p>The strongest way to maintain a read model is with a projection pipeline. The domain write commits. An outbox message is recorded in the same transaction. A background worker consumes the outbox and updates the read model. The read path never has to rebuild the queue shape on demand.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/d1ae1e75-0588-4091-ad8e-eb48a861dbdb.png" alt="" style="display:block;margin:0 auto" />

<p>That pattern gives you reliability, observability, and a clean failure model. If a projection fails, you can retry it. If you need to rebuild, you can replay events or recalculate from the source of truth.</p>
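<p>The projector that sits at the end of that pipeline can rebuild the queue row from the canonical model in one pass. A minimal sketch, assuming illustrative navigation properties (<code>Broker.DisplayName</code>, <code>Product.Class</code>, <code>Attachments</code>) and a deliberately simplified urgency rule; a real projector would apply the domain's actual classification logic.</p>

```csharp
public sealed class SubmissionProjector
{
    private readonly UnderwritingDbContext _db;

    public SubmissionProjector(UnderwritingDbContext db)
    {
        _db = db;
    }

    public async Task ProjectAsync(Guid submissionId, CancellationToken stopToken)
    {
        // Read from the canonical model once, at projection time.
        var source = await _db.Submissions
            .Where(x => x.Id == submissionId)
            .Select(x => new
            {
                x.SubmissionNumber,
                x.InsuredName,
                BrokerName = x.Broker.DisplayName,
                ProductClass = x.Product.Class,
                Status = x.Status,
                AttachmentCount = x.Attachments.Count(),
                x.CreatedUtc
            })
            .SingleAsync(stopToken);

        // Upsert keeps the projection idempotent: replaying the same
        // outbox message produces the same row.
        var item = await _db.SubmissionReviewQueueItems
            .FindAsync(new object[] { submissionId }, stopToken);

        if (item is null)
        {
            item = new SubmissionReviewQueueItem { SubmissionId = submissionId };
            _db.SubmissionReviewQueueItems.Add(item);
        }

        item.SubmissionNumber = source.SubmissionNumber;
        item.InsuredName = source.InsuredName;
        item.BrokerName = source.BrokerName;
        item.ProductClass = source.ProductClass;
        item.Status = source.Status;
        item.AttachmentCount = source.AttachmentCount;
        item.CreatedUtc = source.CreatedUtc;

        // Placeholder rule; substitute the real classification.
        item.IsUrgent = source.AttachmentCount == 0;

        await _db.SaveChangesAsync(stopToken);
    }
}
```

Idempotency is the property that matters most here, because it makes retries and full rebuilds safe.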
<h3>Technique three, snapshot descriptive data</h3>
<p>A huge amount of hidden query cost comes from descriptive joins. Broker name. Product name. Region name. Handler display name. Organisation name. Status description. These are often joined into hot queries simply because they are stored elsewhere. That keeps the schema pure. It also keeps the read path unnecessarily busy.</p>
<p>Snapshotting descriptive values means copying them at the point where they matter. That often improves performance, and in many domains it also improves auditability because it preserves the value as it was when the transaction occurred.</p>
<p>Here is a simple example.</p>
<pre><code class="language-csharp">public sealed class Submission
{
    public Guid Id { get; set; }
    public Guid BrokerId { get; set; }
    public string BrokerNameSnapshot { get; set; } = string.Empty;
    public string ProductNameSnapshot { get; set; } = string.Empty;
    public string InsuredName { get; set; } = string.Empty;
    public DateTime CreatedUtc { get; set; }
}
</code></pre>
<p>When you create the submission, you take the snapshot.</p>
<pre><code class="language-csharp">public sealed class SubmissionService
{
    private readonly UnderwritingDbContext _db;

    public SubmissionService(UnderwritingDbContext db)
    {
        _db = db;
    }

    public async Task&lt;Guid&gt; CreateAsync(CreateSubmissionCommand command, CancellationToken stopToken)
    {
        var broker = await _db.Brokers.SingleAsync(x =&gt; x.Id == command.BrokerId, stopToken);
        var product = await _db.Products.SingleAsync(x =&gt; x.Id == command.ProductId, stopToken);

        var submission = new Submission
        {
            Id = Guid.NewGuid(),
            BrokerId = broker.Id,
            BrokerNameSnapshot = broker.DisplayName,
            ProductNameSnapshot = product.Name,
            InsuredName = command.InsuredName,
            CreatedUtc = DateTime.UtcNow
        };

        _db.Submissions.Add(submission);
        await _db.SaveChangesAsync(stopToken);

        return submission.Id;
    }
}
</code></pre>
<p>Now every queue, dashboard, export, and search result that needs the broker or product name can read it directly from the submission or from a projection row built from it. That removes joins from the hot path and makes results historically accurate. If the broker later changes their display name, older submissions do not silently rewrite history.</p>
<p>This is one of the most underused denormalisation techniques because people dismiss it as duplication. In reality it is often one of the cleanest improvements you can make.</p>
<h3>Technique four, persist flags and classifications</h3>
<p>A lot of expensive read logic is not about fetching data at all. It is about classifying it. Is this item urgent. Is this account over limit. Is this submission nearing SLA breach. Does this customer require manual review. Is this case ready for escalation. Those rules often combine dates, counts, statuses, and related rows. If they sit on the hot read path, they get recalculated constantly.</p>
<p>If the answer is needed often, persist it.</p>
<pre><code class="language-csharp">public sealed class SubmissionReviewQueueItem
{
    public Guid SubmissionId { get; set; }
    public bool IsUrgent { get; set; }
    public string RiskBand { get; set; } = string.Empty;
    public DateTime? EscalationDueUtc { get; set; }
}
</code></pre>
<p>The projector decides the values once.</p>
<pre><code class="language-csharp">public static class SubmissionClassification
{
    public static bool CalculateUrgency(DateTime createdUtc, string status, int attachmentCount)
    {
        if (status == "Completed")
        {
            return false;
        }

        if (attachmentCount == 0)
        {
            return true;
        }

        return DateTime.UtcNow - createdUtc &gt; TimeSpan.FromHours(24);
    }

    public static string CalculateRiskBand(decimal score)
    {
        if (score &gt;= 80m)
        {
            return "High";
        }

        if (score &gt;= 50m)
        {
            return "Medium";
        }

        return "Low";
    }
}
</code></pre>
<p>Once you store these values, the database can index them directly. That is the real shift. Instead of asking the engine to compute urgency on every candidate row, you let it seek on IsUrgent or RiskBand. That changes both performance and plan quality.</p>
<h3>Technique five, flattened search columns</h3>
<p>Search is a classic source of accidental complexity. Users want one box. They expect it to match case number, broker name, customer name, postcode, product, maybe even a phone number or note. A normalised model turns that into a wide OR condition with several joins, or pushes the team into adding a separate search engine earlier than they really need one.</p>
<p>A useful middle ground is to denormalise search into a dedicated row with flattened searchable fields.</p>
<pre><code class="language-csharp">public sealed class SubmissionSearchRow
{
    public Guid SubmissionId { get; set; }
    public string SubmissionNumber { get; set; } = string.Empty;
    public string BrokerName { get; set; } = string.Empty;
    public string InsuredName { get; set; } = string.Empty;
    public string Postcode { get; set; } = string.Empty;
    public string ProductName { get; set; } = string.Empty;
    public string SearchText { get; set; } = string.Empty;
    public DateTime CreatedUtc { get; set; }
}
</code></pre>
<p>A projector builds the flattened text.</p>
<pre><code class="language-csharp">public sealed class SubmissionSearchProjector
{
    private readonly UnderwritingDbContext _db;

    public SubmissionSearchProjector(UnderwritingDbContext db)
    {
        _db = db;
    }

    public async Task RebuildAsync(Guid submissionId, CancellationToken stopToken)
    {
        var source = await _db.Submissions
            .Where(x =&gt; x.Id == submissionId)
            .Select(x =&gt; new
            {
                x.Id,
                x.SubmissionNumber,
                x.BrokerNameSnapshot,
                x.InsuredName,
                x.Postcode,
                x.ProductNameSnapshot,
                x.CreatedUtc
            })
            .SingleAsync(stopToken);

        var searchText = string.Join(' ',
            source.SubmissionNumber,
            source.BrokerNameSnapshot,
            source.InsuredName,
            source.Postcode,
            source.ProductNameSnapshot)
            .ToLowerInvariant();

        var row = await _db.SubmissionSearchRows.FindAsync(new object[] { submissionId }, stopToken);

        if (row is null)
        {
            row = new SubmissionSearchRow { SubmissionId = submissionId };
            _db.SubmissionSearchRows.Add(row);
        }

        row.SubmissionNumber = source.SubmissionNumber;
        row.BrokerName = source.BrokerNameSnapshot;
        row.InsuredName = source.InsuredName;
        row.Postcode = source.Postcode;
        row.ProductName = source.ProductNameSnapshot;
        row.CreatedUtc = source.CreatedUtc;
        row.SearchText = searchText;

        await _db.SaveChangesAsync(stopToken);
    }
}
</code></pre>
<p>This is not a replacement for a true search platform in every case. It is a practical step that often solves internal search needs very well and removes painful joins from the request path.</p>
<h3>Technique six, compiled JSON payloads</h3>
<p>Some read paths are expensive not because the data is hard to query, but because the response is expensive to build. Configuration payloads, product catalogues, pricing rules, feature flag definitions, and reference datasets often fall into this category. The source data may be split across multiple tables and the application may have to turn it into a nested object model before serialising it.</p>
<p>If the payload changes relatively rarely and is read heavily, compile it once and store the result.</p>
<pre><code class="language-csharp">public sealed class CompiledProductConfig
{
    public Guid ProductId { get; set; }
    public string Version { get; set; } = string.Empty;
    public string JsonPayload { get; set; } = string.Empty;
    public DateTime CompiledUtc { get; set; }
}
</code></pre>
<p>A compiler service generates the payload after changes.</p>
<pre><code class="language-csharp">public sealed class ProductConfigCompiler
{
    private readonly ProductDbContext _db;

    public ProductConfigCompiler(ProductDbContext db)
    {
        _db = db;
    }

    public async Task CompileAsync(Guid productId, CancellationToken stopToken)
    {
        var product = await _db.Products
            .Where(x =&gt; x.Id == productId)
            .Select(x =&gt; new
            {
                x.Id,
                x.Name,
                Rules = x.Rules
                    .OrderBy(r =&gt; r.Priority)
                    .Select(r =&gt; new
                    {
                        r.Key,
                        r.Operator,
                        r.Value
                    })
                    .ToList()
            })
            .SingleAsync(stopToken);

        var payload = JsonSerializer.Serialize(product);

        var row = await _db.CompiledProductConfigs.FindAsync(new object[] { productId }, stopToken);

        if (row is null)
        {
            row = new CompiledProductConfig { ProductId = productId };
            _db.CompiledProductConfigs.Add(row);
        }

        row.Version = Guid.NewGuid().ToString("N");
        row.JsonPayload = payload;
        row.CompiledUtc = DateTime.UtcNow;

        await _db.SaveChangesAsync(stopToken);
    }
}
</code></pre>
<p>The request path then becomes almost trivial.</p>
<pre><code class="language-csharp">public sealed class ProductConfigService
{
    private readonly ProductDbContext _db;

    public ProductConfigService(ProductDbContext db)
    {
        _db = db;
    }

    public async Task&lt;string?&gt; GetCompiledJsonAsync(Guid productId, CancellationToken stopToken)
    {
        return await _db.CompiledProductConfigs
            .AsNoTracking()
            .Where(x =&gt; x.ProductId == productId)
            .Select(x =&gt; x.JsonPayload)
            .SingleOrDefaultAsync(stopToken);
    }
}
</code></pre>
<p>This is denormalisation at the payload level. It is blunt, and when used in the right place it is extremely effective. You remove query assembly and serialisation work from the hot path entirely.</p>
<h2>How to implement denormalisation cleanly in .NET</h2>
<p>The biggest risk with denormalisation is not duplication. It is ambiguity. If nobody can tell which model is canonical, which values are derived, how freshness works, and how to repair drift, the design will decay.</p>
<p>A clean .NET implementation usually has four parts.</p>
<p>You keep the canonical write model explicit. This is the model the business truly owns.</p>
<p>You emit an outbox event or domain event when the source data changes.</p>
<p>You project that event into one or more read models in a background process.</p>
<p>You make the read endpoints talk to the read models directly, without trying to reconstruct the domain again.</p>
<p>A simple hosted service can handle the projection side. In larger systems that may be a separate worker or Azure Function. The important part is not the hosting model. The important part is that the projection is idempotent and observable.</p>
<pre><code class="language-csharp">public sealed class SubmissionProjectionWorker : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger&lt;SubmissionProjectionWorker&gt; _logger;

    public SubmissionProjectionWorker(
        IServiceScopeFactory scopeFactory,
        ILogger&lt;SubmissionProjectionWorker&gt; logger)
    {
        _scopeFactory = scopeFactory;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stopToken)
    {
        while (!stopToken.IsCancellationRequested)
        {
            using var scope = _scopeFactory.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService&lt;UnderwritingDbContext&gt;();
            var projector = scope.ServiceProvider.GetRequiredService&lt;SubmissionProjector&gt;();

            var batch = await db.OutboxMessages
                .Where(x =&gt; x.ProcessedUtc == null &amp;&amp; x.Type == "SubmissionChanged")
                .OrderBy(x =&gt; x.OccurredUtc)
                .Take(100)
                .ToListAsync(stopToken);

            if (batch.Count == 0)
            {
                await Task.Delay(TimeSpan.FromSeconds(1), stopToken);
                continue;
            }

            foreach (var message in batch)
            {
                try
                {
                    await projector.ProjectAsync(message.AggregateId, stopToken);
                    message.ProcessedUtc = DateTime.UtcNow;
                }
                catch (Exception ex)
                {
                    _logger.LogError(ex, "Failed to project submission {SubmissionId}", message.AggregateId);
                }
            }

            await db.SaveChangesAsync(stopToken);
        }
    }
}
</code></pre>
<p>That worker does not need to be clever. It needs to be reliable. Projection code should be deterministic, repeatable, and easy to rebuild.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/4f56f637-5158-43d7-8a9c-9bd79cb37c35.png" alt="" style="display:block;margin:0 auto" />

<p>This pattern works because it gives you separation. Writes preserve truth. Projections shape data for speed. Reads stay lean.</p>
<h2>Measuring the gains properly</h2>
<p>If you denormalise without measuring, you are guessing. Sometimes the guess is right, but expert engineering means proving it.</p>
<p>Do not just measure average response time. That hides a lot. You want median, p95, and p99. You want database CPU and logical reads. You want application allocations. You want throughput under concurrency. You want to know whether you made writes slightly heavier and whether that matters.</p>
<p>The pattern to look for is simple. If the old design has acceptable median latency but poor p95 and p99, denormalisation usually helps a lot because it simplifies the work and stabilises the plan. If the old design burns database CPU and application allocations on every request, denormalisation usually helps there too.</p>
<p>In C# terms, you should benchmark at three levels. Measure the database query cost. Measure the endpoint under realistic concurrency. Measure the shaping cost in process if serialisation or object mapping is part of the problem. A single stopwatch around an API call is not enough. It is common to see a read endpoint go from well over one hundred milliseconds to below twenty once it moves from reconstruction to direct lookup. More importantly, the tail often tightens dramatically. The endpoint stops having bad days.</p>
<p>That is a stronger win than a modest median improvement because production pain lives in the tail.</p>
<h2>The trade-offs you must own</h2>
<p>Denormalisation works because it changes where the work happens. That means the cost does not disappear. It moves.</p>
<p>Writes may become heavier because you now update projections or summary fields.</p>
<p>You may accept eventual consistency if projections run asynchronously.</p>
<p>You add more moving parts, especially if you use outbox processing and background workers.</p>
<p>You need rebuild and reconciliation tooling because projections can drift if there is a bug.</p>
<p>None of those are reasons to avoid denormalisation. They are reasons to design it properly.</p>
<p>The important discipline is to be explicit. Name read models as read models. Keep projection logic out of the domain core where possible. Decide which values must be transactionally current and which values can lag slightly. Build replays or rebuild jobs so you can recover from bad logic. Make it obvious to every engineer which table tells the truth and which table exists for speed.</p>
<p>If you fail to do that, denormalisation becomes accidental duplication. That is where teams get burned.</p>
<h2>When not to denormalise</h2>
<p>Do not denormalise because a query feels ugly. Ugly code is not always expensive code.</p>
<p>Do not denormalise values you cannot clearly derive and refresh.</p>
<p>Do not duplicate fields with no owner and no repair story.</p>
<p>Do not turn projections into hidden sources of truth.</p>
<p>Do not assume eventual consistency is always harmless. A stale dashboard count is one thing. A stale available credit decision is another.</p>
<p>Do not denormalise everything. Most systems only need it in a few hot places. If you apply it everywhere, you increase complexity without improving the parts that matter.</p>
<p>The expert judgement is knowing where the hot paths really are and shaping only those.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/2f94efd7-d7b0-49e8-b7a5-ad64e0c48c07.png" alt="" style="display:block;margin:0 auto" />

<p>Denormalisation is one of the few performance techniques that changes the shape of the problem instead of merely tuning around it. Indexes, cache layers, and ORM tweaks all matter, but they mostly help you execute the same work more efficiently. Denormalisation asks a better question. Should the system be doing this work on every read at all?</p>
<p>In many serious C# systems, the honest answer is no.</p>
<p>If the application already knows a balance, a count, a risk band, a queue shape, a search row, or a compiled payload, and if that value is read far more often than it changes, storing it in the form the read path needs is not a compromise. It is good engineering.</p>
<p>The strongest systems keep their transactional core clean and truthful. Then they build denormalised shapes around that core for speed. They measure the gains. They own the trade offs. They keep projections rebuildable. They keep the source of truth clear. That is how you get fast systems without losing control of the design. If you want real performance gains from denormalisation, do not think of it as breaking the rules. Think of it as moving work to the cheapest place in the system. When you do that deliberately, your database stops reconstructing the obvious, your APIs stop carrying unnecessary weight, and your read path starts behaving like it was designed for production rather than for a whiteboard.</p>
]]></content:encoded></item><item><title><![CDATA[API Payload Compression in ASP.NET Core]]></title><description><![CDATA[People talk about payload compression as if it were a single checkbox in Program.cs. Turn on gzip, maybe add Brotli, and move on. That approach is no good for a production system that serves high-volu]]></description><link>https://fullstackcity.com/api-payload-compression-in-asp-net-core</link><guid isPermaLink="true">https://fullstackcity.com/api-payload-compression-in-asp-net-core</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[dotnet]]></category><category><![CDATA[C#]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[api]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sat, 28 Mar 2026 16:21:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/74c42aa2-c8c7-4805-b3aa-ab56f77af5a7.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>People talk about payload compression as if it were a single checkbox in <code>Program.cs</code>. Turn on gzip, maybe add Brotli, and move on. That approach is no good for a production system that serves high-volume JSON, handles uploads, runs behind a proxy, and needs predictable latency under load.</p>
<p>In real systems, compression is a transport concern with application-level consequences. It affects bandwidth, CPU, latency, caching behaviour, security posture, and even how you shape your contracts. ASP.NET Core gives you built-in middleware for response compression and request decompression, but the framework does not make the architectural decisions for you. You still need to decide what to compress, where to compress it, when to reject it, and how to avoid using compression as a bandage for bad API design. The current ASP.NET Core guidance is still clear on the fundamentals, use compression to reduce payload size, prefer server or proxy compression where available, and use the built-in middleware when Kestrel or HTTP.sys are serving the app directly because they do not provide built-in compression themselves.</p>
<p>The first mistake developers make is thinking compression is mainly about speed. It is really about trade-offs. Compression reduces bytes on the wire, which often improves responsiveness, especially for JSON and other text-heavy payloads. At the same time, it costs CPU to compress and decompress data. That trade-off is usually favourable for medium and large JSON responses over public networks, but not always favourable for tiny payloads or already-compressed binary content. That is why serious API design starts with payload shape first and compression second. If your endpoint returns bloated documents with duplicated fields, unnecessary nesting, and data the caller never asked for, compression will help, but only after you already lost the bigger battle. Microsoft’s guidance frames compression as a way to reduce response size and improve responsiveness, not as a replacement for lean responses.</p>
<p>A useful mental model is to separate outbound compression from inbound decompression. Outbound compression is the default case. Your API produces JSON, problem details, text, CSV, or other compressible formats, and the client advertises supported encodings through Accept-Encoding. The response compression middleware examines the request and response, selects a provider such as Brotli or gzip, and writes the compressed payload if the response type is eligible. Inbound decompression is different. There, the client sends a compressed request body and marks it with Content-Encoding, and the request decompression middleware unwraps it before model binding or request body reading happens. ASP.NET Core supports both directions, but they solve different problems and they should not be enabled with the same level of enthusiasm. Response compression is broadly useful. Request decompression is useful only when clients are actually sending large compressed payloads, typically large JSON, text, or similar upload bodies.</p>
<p>In practice, the best default for a modern ASP.NET Core API is straightforward. Compress responses that are actually compressible. Prefer Brotli when the client supports it. Fall back to gzip for compatibility. Leave already-compressed formats alone. If you are running behind IIS, Apache, or Nginx, prefer server-based compression because Microsoft explicitly notes that server modules generally outperform the ASP.NET Core middleware. If you are serving directly from Kestrel or HTTP.sys, use the middleware because those servers do not currently offer built-in compression support.</p>
<p>The second mistake developers make is compressing everything indiscriminately. Compression is not magic. It works best on text-heavy formats because they contain repeating structure. JSON is the classic win because property names, quotes, punctuation, and repeated values compress well. XML, HTML, CSS, JavaScript, CSV, plain text, and problem details are all strong candidates. JPEG, PNG, MP4, ZIP, and many other binary formats are not. Recompressing data that is already compressed often gives you negligible size reduction and unnecessary CPU overhead. This is exactly why the ASP.NET Core response compression middleware is configured around MIME types. You tell it what content types are eligible instead of asking it to blindly compress whatever leaves the process.</p>
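<p>The futility of recompressing is easy to demonstrate. This sketch uses <code>GZipStream</code> with invented sample data; the first pass wins big on repetitive JSON-like text, while the second pass buys essentially nothing for extra CPU.</p>
<pre><code class="language-csharp">using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

var json = Encoding.UTF8.GetBytes(string.Concat(
    Enumerable.Range(1, 500).Select(i =&gt; $"{{\"id\":{i},\"status\":\"Complete\"}},")));

var once = Gzip(json);
var twice = Gzip(once);

Console.WriteLine($"original: {json.Length} bytes");
Console.WriteLine($"gzip x1:  {once.Length} bytes");  // large reduction
Console.WriteLine($"gzip x2:  {twice.Length} bytes"); // no meaningful further gain

static byte[] Gzip(byte[] data)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
    {
        gzip.Write(data, 0, data.Length);
    }
    return output.ToArray(); // MemoryStream.ToArray is safe after the stream is closed
}
</code></pre>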
<p>Here is a production-friendly baseline for a .NET API using minimal APIs. It enables response compression, explicitly adds Brotli and gzip, includes JSON-related MIME types, and sets both providers to Fastest because API latency usually matters more than squeezing out the very last percentage point of compression ratio.</p>
<pre><code class="language-csharp">
using Microsoft.AspNetCore.RequestDecompression;
using Microsoft.AspNetCore.ResponseCompression;
using System.IO.Compression;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddResponseCompression(options =&gt;
{
    options.EnableForHttps = true;

    options.Providers.Add&lt;BrotliCompressionProvider&gt;();
    options.Providers.Add&lt;GzipCompressionProvider&gt;();

    options.MimeTypes = ResponseCompressionDefaults.MimeTypes.Concat(new[]
    {
        "application/json",
        "application/problem+json",
        "text/plain",
        "text/csv"
    });
});

builder.Services.Configure&lt;BrotliCompressionProviderOptions&gt;(options =&gt;
{
    options.Level = CompressionLevel.Fastest;
});

builder.Services.Configure&lt;GzipCompressionProviderOptions&gt;(options =&gt;
{
    options.Level = CompressionLevel.Fastest;
});

builder.Services.AddRequestDecompression();

var app = builder.Build();

app.UseRequestDecompression();
app.UseResponseCompression();

app.MapGet("/api/orders/{id:int}", (int id) =&gt;
{
    var response = new
    {
        Id = id,
        Customer = "ACME Insurance",
        Lines = Enumerable.Range(1, 250).Select(i =&gt; new
        {
            LineNumber = i,
            Sku = $"SKU-{i:0000}",
            Quantity = i % 5 + 1,
            Price = 49.99m + i
        })
    };

    return Results.Json(response);
});

app.Run();
</code></pre>
<p>This gives you the right starting point, but serious systems usually need more discipline than a baseline setup. One example is compression level. Many developers instinctively choose Optimal, assuming it must be better because the name sounds better. That is too simplistic. In APIs, especially low-latency APIs, Fastest is often the better operational choice because it cuts CPU cost and still captures most of the size reduction on JSON. Optimal can make sense for larger batch-style responses or download scenarios where throughput matters more than raw request latency. The right answer is not theoretical. Benchmark it with your own payloads and concurrency profile.</p>
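<p>Because the right level depends on your payloads and hardware, it is worth a quick measurement harness before committing. The following is a minimal sketch, not a rigorous benchmark (no warm-up, single iteration, invented payload), but it shows the shape of the comparison.</p>
<pre><code class="language-csharp">using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Invented JSON-ish payload; substitute a captured response body from your own API.
var payload = Encoding.UTF8.GetBytes(string.Concat(
    Enumerable.Range(1, 2000).Select(i =&gt; $"{{\"id\":{i},\"sku\":\"SKU-{i:0000}\",\"qty\":{i % 5 + 1}}},")));

var fastest = Measure(payload, CompressionLevel.Fastest);
var optimal = Measure(payload, CompressionLevel.Optimal);

Console.WriteLine($"raw:     {payload.Length} bytes");
Console.WriteLine($"Fastest: {fastest.Bytes} bytes in {fastest.Elapsed.TotalMilliseconds:F2} ms");
Console.WriteLine($"Optimal: {optimal.Bytes} bytes in {optimal.Elapsed.TotalMilliseconds:F2} ms");

static (long Bytes, TimeSpan Elapsed) Measure(byte[] data, CompressionLevel level)
{
    var sw = Stopwatch.StartNew();
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, level))
    {
        gzip.Write(data, 0, data.Length);
    }
    sw.Stop();
    return (output.ToArray().Length, sw.Elapsed);
}
</code></pre>
<p>For a real decision, run many iterations under concurrency and include Brotli as well, but even this rough version tends to show Optimal paying meaningful extra CPU for a modest size improvement on typical JSON.</p>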
<p>A useful way to think about the pipeline is this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/b5d88a82-1de0-4a8b-889f-35aee4effaaf.png" alt="" style="display:block;margin:0 auto" />

<p>Another place where developers get sloppy is HTTPS compression. ASP.NET Core exposes <code>EnableForHttps</code>, and the documented default is false. Microsoft also warns that enabling compression for HTTPS responses containing remotely manipulable content may expose security problems. That warning exists because compression can become part of a side-channel when attacker-controlled input and secret-bearing content share the same compressed response. In normal internal or line-of-business APIs, many people still enable HTTPS compression because the benefits are real and the attack surface may be limited, but that decision should be deliberate. If you reflect attacker-supplied content into a response that also carries secrets, tokens, or sensitive dynamic values, do not just enable HTTPS compression and forget about it. Understand what is actually in those responses.</p>
<p>Request decompression deserves even more caution. The feature is real and useful, but it is not something to switch on simply because the middleware exists. The request decompression middleware automatically inspects Content-Encoding and decompresses supported request bodies, which saves you from writing custom request-body handling code. That part is good. The hard part is operational safety. Inbound compressed payloads shift CPU work onto your servers and can amplify resource consumption if abused. If you accept large compressed uploads, you should pair that with request size limits, timeout controls, careful endpoint scoping, and monitoring. The middleware also needs to run before anything reads the body, otherwise you are too late.</p>
<p>A targeted inbound example looks like this:</p>
<pre><code class="language-csharp">app.UseRequestDecompression();

app.MapPost("/api/import/products", async (HttpContext httpContext) =&gt;
{
    // IHttpMaxRequestBodySizeFeature (Microsoft.AspNetCore.Http.Features) has no
    // DisableMaxRequestBodySize method; the limit is lifted for this endpoint by
    // setting the property to null, which is only safe behind other guards.
    var sizeFeature = httpContext.Features.Get&lt;IHttpMaxRequestBodySizeFeature&gt;();
    if (sizeFeature is { IsReadOnly: false })
    {
        sizeFeature.MaxRequestBodySize = null;
    }

    using var reader = new StreamReader(httpContext.Request.Body);
    var json = await reader.ReadToEndAsync();

    return Results.Ok(new
    {
        Message = "Compressed request accepted",
        Characters = json.Length
    });
});

app.Run();
</code></pre>
<p>That sample shows the mechanics, but the operational point matters more than the syntax. You should not enable inbound decompression across every endpoint unless the endpoints really need it. A typical CRUD API that accepts small POST and PUT bodies gets little benefit from compressed requests. A bulk import endpoint that accepts a multi-megabyte JSON document might benefit a lot.</p>
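<p>For completeness, here is what the client side of such a bulk import can look like. This is a hedged sketch: the endpoint URL and helper name are invented, but the mechanics of setting <code>Content-Encoding</code> on the request content are the part that matters.</p>
<pre><code class="language-csharp">using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical bulk-import document; in practice this is the multi-megabyte JSON body.
var document = string.Concat(Enumerable.Range(1, 1000).Select(i =&gt; $"{{\"sku\":\"SKU-{i}\"}},"));

using var content = CreateGzipJsonContent(document);
Console.WriteLine($"raw: {Encoding.UTF8.GetByteCount(document)} bytes, wire: {content.Headers.ContentLength} bytes");

// await new HttpClient().PostAsync("https://example.test/api/import/products", content);

static ByteArrayContent CreateGzipJsonContent(string json)
{
    using var buffer = new MemoryStream();
    using (var gzip = new GZipStream(buffer, CompressionLevel.Fastest))
    {
        var bytes = Encoding.UTF8.GetBytes(json);
        gzip.Write(bytes, 0, bytes.Length);
    }

    var content = new ByteArrayContent(buffer.ToArray());
    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    content.Headers.ContentEncoding.Add("gzip"); // what the decompression middleware keys on
    return content;
}
</code></pre>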
<p>You should also think about where compression belongs in a broader deployment. If you are behind Nginx or IIS, server-side compression at the edge is often the better place for outbound response compression because it takes work off the app and can be tuned centrally. Microsoft’s guidance says exactly that, noting that the performance of the ASP.NET Core middleware probably will not match dedicated server modules. That does not make middleware wrong. It just means you should not ignore the reverse proxy when you have one. If you already terminate traffic behind a capable gateway, that is often the best place to handle compression consistently.</p>
<p>Caching behaviour is another area where compression changes system behaviour more than people expect. Once you serve multiple encoded versions of the same representation, the cache key must vary by encoding. That is why compressed responses are tied to Accept-Encoding, and why intermediaries need to treat the compressed and uncompressed versions as distinct representations. If you run API caching, CDN caching, or reverse-proxy caching, compression is no longer just a transport tweak. It becomes part of representation management. That matters even more if you also use ETags. In a well-behaved system, you need consistency in how representations are generated and validated, especially if compression is handled at the proxy layer instead of the app layer.</p>
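<p>A tiny illustration of that representation point, using an entirely hypothetical cache-key builder: the chosen encoding has to become part of the key, which is exactly what <code>Vary: Accept-Encoding</code> instructs intermediaries to do. Real negotiation also honours q-values; this sketch ignores them deliberately.</p>
<pre><code class="language-csharp">using System;
using System.Linq;

Console.WriteLine(CacheKey("/api/report", "br, gzip"));   // /api/report|br
Console.WriteLine(CacheKey("/api/report", "gzip;q=1.0")); // /api/report|gzip
Console.WriteLine(CacheKey("/api/report", null));         // /api/report|identity

static string CacheKey(string path, string? acceptEncoding)
{
    // Without the encoding in the key, a gzip body cached for one client could be
    // served to a client that never advertised gzip support.
    var encoding = (acceptEncoding ?? string.Empty)
        .Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries)
        .Select(e =&gt; e.Split(';')[0].ToLowerInvariant())
        .FirstOrDefault(e =&gt; e is "br" or "gzip") ?? "identity";

    return $"{path}|{encoding}";
}
</code></pre>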
<p>Another point is that compression interacts with streaming. If your endpoint sends buffered JSON in one shot, compression is easy. If you are sending data progressively, such as large streamed responses, NDJSON, SSE-style traffic, or anything latency-sensitive where flushing behaviour matters, compression may introduce buffering or delivery characteristics that work against the protocol. In those cases, the question is not just "can I compress this?" but "does compression preserve the delivery behaviour I actually want?" For real-time or progressive-delivery endpoints, the right answer is often endpoint-specific rather than global.</p>
<p>The security and resilience side should not be ignored either. Kestrel exposes minimum request and response data rate limits, and the documented defaults are 240 bytes per second with a 5 second grace period. That matters because slow clients, large request bodies, and decompression work can combine into unpleasant failure modes if you do not have sane guardrails. Kestrel also supports request header timeouts and request body size limits, and those settings should be part of your overall posture when you accept uploads or large bodies, compressed or otherwise. Compression is not an isolated tuning knob. It sits inside your wider transport hardening model.</p>
<p>Here is a fuller example with Kestrel limits and explicit compression setup:</p>
<pre><code class="language-csharp">using Microsoft.AspNetCore.ResponseCompression;
using System.IO.Compression;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(options =&gt;
{
    options.Limits.RequestHeadersTimeout = TimeSpan.FromSeconds(30);
    options.Limits.MinRequestBodyDataRate = new(bytesPerSecond: 240, gracePeriod: TimeSpan.FromSeconds(5));
    options.Limits.MinResponseDataRate = new(bytesPerSecond: 240, gracePeriod: TimeSpan.FromSeconds(5));
    options.Limits.MaxRequestBodySize = 20 * 1024 * 1024; // 20 MB
});

builder.Services.AddResponseCompression(options =&gt;
{
    options.EnableForHttps = true;
    options.Providers.Add&lt;BrotliCompressionProvider&gt;();
    options.Providers.Add&lt;GzipCompressionProvider&gt;();

    options.MimeTypes = ResponseCompressionDefaults.MimeTypes.Concat(new[]
    {
        "application/json",
        "application/problem+json"
    });
});

builder.Services.Configure&lt;BrotliCompressionProviderOptions&gt;(options =&gt;
{
    options.Level = CompressionLevel.Fastest;
});

builder.Services.Configure&lt;GzipCompressionProviderOptions&gt;(options =&gt;
{
    options.Level = CompressionLevel.Fastest;
});

var app = builder.Build();

app.UseResponseCompression();

app.MapGet("/api/report", () =&gt;
{
    var report = Enumerable.Range(1, 10_000).Select(i =&gt; new
    {
        Id = i,
        Name = $"Item {i}",
        Status = i % 3 == 0 ? "Pending" : "Complete",
        Timestamp = DateTime.UtcNow.AddMinutes(-i)
    });

    return Results.Json(report);
});

app.Run();
</code></pre>
<p>That is the kind of configuration that belongs in a serious service. It does not just say "turn compression on." It defines the transport assumptions that go with it.</p>
<p>There is also a design lesson here for internal APIs and service-to-service calls. Developers sometimes assume compression matters only for internet-facing traffic. That is not always true. In cloud environments, especially across regions, VNets, or heavily loaded east-west traffic paths, payload size still matters. Compressing large JSON documents between services can reduce network cost and improve throughput. The catch is that the CPU trade-off now happens on your own estate at scale. If a service is already CPU-bound, compression can make it worse. If the network is the bottleneck, compression can help a lot. Again, this is why you benchmark real workloads instead of arguing from instinct.</p>
<p>If you want a clean set of rules that hold up in practice, they are these. Shape payloads properly first. Compress text-heavy responses by default. Prefer Brotli with gzip fallback. Leave already-compressed binaries alone. Use request decompression only for endpoints that genuinely need it. Prefer edge or proxy compression when your hosting stack supports it well. Treat HTTPS compression as a conscious security decision, not a default checkbox. Add limits and timeouts when you accept large request bodies. Measure the CPU and latency profile under realistic traffic before you call the job done. Those rules are not glamorous, but they are what separate a neat code sample from a production-grade API.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/b4a6317d-794c-47cd-b7bf-7f04de3d4702.png" alt="" style="display:block;margin:0 auto" />

<p>The big point is simple. Payload compression in ASP.NET Core is not a trick. It is part of transport engineering. When you treat it that way, the implementation becomes clearer. You stop asking whether you should "turn on gzip" and start asking the questions that actually matter: where should compression happen, which representations benefit, what security caveats apply, what limits protect the server, and whether your payloads deserved to be that large in the first place.</p>
<p>That's what serious API payload compression looks like in modern .NET. It's not complicated, but it does require intent.</p>
<p><a href="https://learn.microsoft.com/en-us/aspnet/core/performance/response-compression?view=aspnetcore-10.0">https://learn.microsoft.com/en-us/aspnet/core/performance/response-compression?view=aspnetcore-10.0</a></p>
<p><a href="https://learn.microsoft.com/en-us/aspnet/core/fundamentals/middleware/request-decompression?view=aspnetcore-10.0">https://learn.microsoft.com/en-us/aspnet/core/fundamentals/middleware/request-decompression?view=aspnetcore-10.0</a></p>
]]></content:encoded></item><item><title><![CDATA[Patterns for Resilience and Integration at Scale]]></title><description><![CDATA[Modern distributed systems rarely fail because the core business logic is too hard. They fail because the edges are messy. One service is slow, another is flaky, a third is legacy, a fourth is owned b]]></description><link>https://fullstackcity.com/patterns-for-resilience-and-integration-at-scale</link><guid isPermaLink="true">https://fullstackcity.com/patterns-for-resilience-and-integration-at-scale</guid><category><![CDATA[software development]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[design patterns]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[C#]]></category><category><![CDATA[serverless]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Tue, 17 Mar 2026 18:19:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/0b5a8e18-c4a7-488f-bf3c-e8a28b2c9c58.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Modern distributed systems rarely fail because the core business logic is too hard. They fail because the edges are messy. One service is slow, another is flaky, a third is legacy, a fourth is owned by another team, and a fifth needs a human to click an approval link before anything can continue. The logic inside your own codebase might be clean and deterministic, but the moment a workflow starts crossing boundaries, certainty disappears. That is where resilience stops being a nice architectural word and starts becoming the difference between a system that degrades gracefully and one that creates operational chaos.</p>
<p>This is the part many developers underestimate. They can design a clean domain model, expose a tidy API, and even get the happy path flowing nicely in development. Then integration begins. Payments have retry semantics you do not control. Fraud services throttle under burst load. ERP platforms respond eventually, but only after translating your request into formats that feel like they were invented in another decade. Humans approve things late, suppliers call back twice, webhooks arrive out of order, and support teams need answers while the workflow is still in flight. None of those problems are unusual. They are the normal operating environment of serious enterprise systems.</p>
<p>That reality creates what is best described as an integration tax. Every dependency adds latency, risk, state mismatch, and behavioural quirks. Every new handoff expands the number of ways a process can stall or become inconsistent. This tax cannot be avoided. If your system has to interact with payment providers, CRM tools, ERP platforms, shipping carriers, external risk engines, old databases, partner APIs, or human approvers, then complexity is already part of the deal. The real question is whether that complexity is handled intentionally or left to leak through the architecture.</p>
<p>The good news is that the same failure shapes show up again and again. Systems struggle with overload, duplicate work, half-completed transactions, tight coupling, invisible state, and awkward coexistence between new platforms and old ones. Once you see those patterns clearly, the architecture becomes much easier to reason about. Durable orchestration platforms such as Azure Durable Functions are especially useful here because they provide a strong set of building blocks for stateful workflows, retries, timers, external events, and long-running coordination. But the bigger lesson is not tied to one platform. The patterns in this article apply whether you are orchestrating with Durable Functions, Temporal, Step Functions, Camunda, MassTransit sagas, or even a carefully designed internal workflow engine.</p>
<p>This article takes those recurring resilience and integration problems and turns them into a practical operating model. We will look at circuit breakers, idempotency, compensation, event-driven handoffs, workflow status, hybrid architecture, and resilience-first design. The goal is not to repeat a chapter from a book. The goal is to turn those ideas into a standalone guide for engineers and architects building systems that need to survive the real world.</p>
<h2>Why Integration Gets Harder at Scale</h2>
<p>A single integration in a low-volume system is often manageable with little more than an HTTP client, a timeout, and a retry policy. That is why many systems look fine during the first release. The real trouble appears later, once transaction volume grows, external dependencies multiply, and the business starts relying on workflows that stretch across multiple bounded contexts.</p>
<p>At that point, latency stops being an isolated technical concern and starts shaping business outcomes. A fraud service that takes three seconds instead of two might not sound catastrophic, but if that call sits in the middle of a checkout flow, the extra second now becomes customer friction. Multiply that by retries, duplicate callbacks, rate limiting, and a few downstream dependencies, and what looked like a simple workflow becomes a slow-motion queueing problem. Enterprise systems rarely collapse in one dramatic moment. More often, they drown gradually in coordination overhead.</p>
<p>Another issue is failure diversity. Internal services often fail in relatively predictable ways because the same teams own the deployment model, monitoring stack, and operational practices. External systems are different. One dependency might fail fast with clear error codes. Another might hang without responding. Another might accept the request but finish it later. Another might partially succeed and provide no clean rollback. Legacy platforms are especially problematic because they often expose interfaces that were never designed for modern reliability expectations, yet still sit on the critical path of important business processes.</p>
<p>Human interaction adds another layer of uncertainty. Approvals, escalations, document review, manual intervention, and exception handling all introduce variable time windows that cannot be compressed by throwing more CPU at the problem. A workflow might be technically healthy but still paused for six hours waiting on someone in a different department. If the system does not model that state explicitly, operators end up guessing whether it is broken or simply waiting.</p>
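<p>The fix for that guessing game is boringly simple: make the waiting state a first-class value instead of an absence of logs. A minimal sketch (all names invented) of what an explicit status record might carry:</p>
<pre><code class="language-csharp">using System;

var snapshot = new WorkflowSnapshot(
    "order-1042",
    WorkflowStatus.WaitingForApproval,
    EnteredStateUtc: DateTime.UtcNow,
    WaitingOn: "finance-approval");

// An operator can now see "paused on a human since 14:02" rather than
// inferring health from silence.
Console.WriteLine($"{snapshot.WorkflowId}: {snapshot.Status}, waiting on {snapshot.WaitingOn}");

public enum WorkflowStatus
{
    Running,
    WaitingForApproval,
    WaitingForSupplierCallback,
    Failed,
    Completed
}

public record WorkflowSnapshot(
    string WorkflowId,
    WorkflowStatus Status,
    DateTime EnteredStateUtc,
    string? WaitingOn);
</code></pre>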
<p>This is why mature integration architecture is less about making every dependency perfect and more about building a workflow that can absorb imperfect behaviour. You are not designing for a world where all systems are reliable. You are designing for a world where some systems are slow, some are inconsistent, some are overloaded, and some are still useful enough that the business cannot function without them.</p>
<h2>The Core Idea: Resilience Is a Workflow Concern</h2>
<p>Developers think about resilience at the level of individual service calls. They add retries to HTTP clients, configure exponential backoff, maybe wrap a few dependencies in a circuit breaker, and consider the job largely done. That helps, but it is not enough. In distributed systems, resilience is rarely just a call-level concern. It is a workflow concern.</p>
<p>A payment retry is not just a payment retry. It is part of a broader transaction that may also reserve inventory, create an order record, notify a customer, update a loyalty profile, and send data into finance systems. A supplier callback is not just an inbound event. It affects which timer should be cancelled, what status should be shown to support, and whether the workflow can proceed to the next stage. A human approval is not just a pause in processing. It changes how you monitor state, set expectations, and decide when intervention is needed.</p>
<p>This is where orchestration platforms earn their keep. They provide a durable memory of the workflow so that retries, waiting, state transitions, and external signals are modelled as first-class behaviour instead of being spread across controller methods, background jobs, and database flags. That durable state is not just useful for implementation. It also creates a place where resilience patterns can be applied consistently.</p>
<p>The rest of this article focuses on those patterns.</p>
<h2>Pattern 1: Circuit Breakers Prevent a Bad Dependency from Taking the Workflow Down with It</h2>
<p>One of the most common mistakes in integration-heavy systems is treating every failure as a reason to retry harder. That instinct is understandable. Retries solve a lot of transient faults, especially network blips, brief throttling, and short-lived platform issues. The problem is that retries are not free. When a downstream service is genuinely unhealthy, repeated retries can amplify the damage by increasing traffic against a struggling dependency and consuming resources in your own system while little useful work gets done.</p>
<p>That is why circuit breakers matter. A circuit breaker watches failure behaviour over time. If failures cross a threshold, the breaker opens and temporarily blocks new requests to the dependency. Rather than continuing to hammer a service that is already in trouble, the workflow fails fast or routes into a fallback path. After a cooldown period, the breaker can move into a half-open state and allow limited traffic to test whether the downstream system has recovered.</p>
<p>In a long-running workflow, this pattern is especially valuable because it prevents an unhealthy external service from dragging large volumes of orchestration instances into pointless retry loops. Imagine an order pipeline that calls an external fraud scoring API before taking payment. If that provider is returning 500 errors for ten minutes, the wrong response is to let every new order attempt the call repeatedly until the orchestration backlog expands and customer-facing latency spikes. A better response is to trip the breaker, fail new attempts quickly with a clear status, and alert operators that the fraud dependency is down.</p>
<p>A simplified view looks like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/e2a5f789-862d-4b26-ac02-10e6fe10a706.png" alt="" style="display:block;margin:0 auto" />

<p>In Durable Functions, one practical implementation is to use a Durable Entity to hold breaker state for a dependency. The entity can track consecutive failures, the time the breaker opened, and whether a call is allowed. Each orchestration or activity checks the entity before making the dependency call. That gives you a central, durable place to enforce the breaker rather than leaving each workflow instance to make its own isolated decision.</p>
<p>A stripped-back example might look like this in C#:</p>
<pre><code class="language-csharp">public record CircuitBreakerState(
    int ConsecutiveFailures,
    DateTime? OpenedAtUtc,
    bool IsOpen);

public class FraudServiceBreakerEntity
{
    public CircuitBreakerState State { get; set; } = new(0, null, false);

    public bool CanExecute(DateTime nowUtc)
    {
        if (!State.IsOpen)
            return true;

        var cooldown = TimeSpan.FromMinutes(2);

        // Cooldown elapsed: allow a trial call (the half-open probe).
        return State.OpenedAtUtc is { } openedAt &amp;&amp; nowUtc - openedAt &gt;= cooldown;
    }

    public void RecordSuccess()
    {
        State = new CircuitBreakerState(0, null, false);
    }

    public void RecordFailure(DateTime nowUtc)
    {
        var failures = State.ConsecutiveFailures + 1;
        if (failures &gt;= 5)
        {
            // Threshold crossed: open the breaker and record when.
            State = new CircuitBreakerState(failures, nowUtc, true);
            return;
        }

        State = new CircuitBreakerState(failures, State.OpenedAtUtc, false);
    }
}
</code></pre>
<p>The important part is not the code. It is the operational behaviour. Once the breaker opens, you stop turning a bad dependency into a system-wide slowdown. You make the failure explicit, measurable, and bounded.</p>
<p>That said, circuit breakers are not magic. They must be tuned carefully. Thresholds that are too aggressive can block useful traffic. Cooldowns that are too long can delay recovery. Breakers also need observability. If the team cannot see when they open, why they opened, and how often they are being exercised, they become another hidden state machine nobody trusts during an incident.</p>
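<p>To make that operational behaviour concrete, here is a caller-side sketch of the same state machine, reduced to a plain in-memory class so it can run anywhere. The class name, the five-failure threshold, and the two-minute cooldown are illustrative, not a prescribed API; in a real deployment a durable entity would hold this state.</p>

```csharp
using System;

// Illustrative in-memory breaker: five consecutive failures open it,
// and a two-minute cooldown must pass before a trial call is allowed.
public sealed class SimpleBreaker
{
    private const int Threshold = 5;
    private static readonly TimeSpan Cooldown = TimeSpan.FromMinutes(2);

    private int _consecutiveFailures;
    private DateTime? _openedAtUtc;

    public bool CanExecute(DateTime nowUtc)
    {
        if (_openedAtUtc is not { } openedAt)
            return true; // closed: calls allowed

        return nowUtc - openedAt >= Cooldown; // open: trial call only after cooldown
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAtUtc = null;
    }

    public void RecordFailure(DateTime nowUtc)
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= Threshold)
            _openedAtUtc = nowUtc; // (re)open the breaker and restart the cooldown
    }
}
```

A caller checks `CanExecute` before the dependency call, fails fast with an explicit status when it returns false, and records the outcome afterwards.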
<h2>Pattern 2: Idempotency Turns Retries from a Risk into a Safety Net</h2>
<p>If you work on distributed systems long enough, you stop asking whether duplicate requests will happen and start asking where they will happen first. Retries from clients, retries from orchestrators, webhook replays, queue redelivery, supplier callbacks, double clicks from users, and timeouts that hide already-completed work all create duplicate execution paths. If your system is not designed for that, it will eventually perform the same side effect twice.</p>
<p>That is where idempotency becomes non-negotiable. An idempotent operation can be executed multiple times with the same logical input and still produce the same final outcome. This does not mean every call is naturally idempotent. It means the system is built so that repeated attempts are recognised and handled safely.</p>
<p>Payment flows are the classic example. If a payment service receives the same charge request twice because the first response timed out, the customer must not be charged twice. The standard approach is to send an idempotency key, often the order ID or payment request ID, with the outbound call. The payment provider stores the first result for that key and returns the same outcome for later retries instead of executing a second charge.</p>
<p>But idempotency belongs far beyond payments. ERP submission endpoints should reject duplicate order registration for the same business reference. Customer reward updates should not apply points twice. Shipping requests should not create duplicate consignments. Inventory allocation should not reserve the same units repeatedly because a callback was delivered more than once.</p>
<p>Here is the shape of the idea:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/b03cac35-d1a2-4284-9b38-85afe34e63ed.png" alt="" style="display:block;margin:0 auto" />

<p>In your own services, the idempotency mechanism often comes down to a durable write model. You persist a unique business operation key before or alongside the side effect, and later requests with the same key return the stored outcome. Sometimes that means a dedicated idempotency table. Sometimes it means a natural domain guard such as a unique constraint on an external reference. Sometimes it means tracking processed event IDs in an entity or aggregate.</p>
<p>A simple service-side pattern in C# could look like this:</p>
<pre><code class="language-csharp">public sealed class ProcessedRequest
{
    public string RequestId { get; init; } = default!;
    public string ResultJson { get; init; } = default!;
    public DateTime ProcessedAtUtc { get; init; }
}

// db is an EF Core DbContext and gateway a payment client, both
// assumed to be injected into the surrounding service.
public async Task&lt;PaymentResult&gt; ChargeAsync(
    string requestId, decimal amount, CancellationToken stopToken)
{
    var existing = await db.ProcessedRequests
        .SingleOrDefaultAsync(x =&gt; x.RequestId == requestId, stopToken);

    if (existing is not null)
        return JsonSerializer.Deserialize&lt;PaymentResult&gt;(existing.ResultJson)!;

    var result = await gateway.ChargeAsync(amount, stopToken);

    db.ProcessedRequests.Add(new ProcessedRequest
    {
        RequestId = requestId,
        ResultJson = JsonSerializer.Serialize(result),
        ProcessedAtUtc = DateTime.UtcNow
    });

    // A unique index on RequestId is what actually closes the
    // check-then-insert race between concurrent duplicates.
    await db.SaveChangesAsync(stopToken);
    return result;
}
</code></pre>
<p>The hard part is deciding the correct scope of the idempotency key. If it is too broad, distinct operations can accidentally collapse into one. If it is too narrow, duplicates slip through. Good idempotency design requires a clear understanding of the business operation, not just the transport request.</p>
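<p>As a sketch of that scoping decision, consider keys derived from the business operation rather than the transport request. The names below are hypothetical:</p>

```csharp
// Hypothetical key derivation: one key per business operation.
// Retries of the same charge collide (good); a charge and a refund
// for the same order do not (also good).
public static class IdempotencyKeys
{
    public static string ForCharge(string orderId) => $"charge:{orderId}";
    public static string ForRefund(string orderId) => $"refund:{orderId}";
}
```

A key scoped to the raw HTTP request ID would be too narrow, because a client retry mints a new request ID; a key scoped only to the order ID would be too broad, because the charge and its later refund would collapse into one operation.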
<p>It is also worth being blunt about this: retries without idempotency are reckless. They create the appearance of resilience while quietly shifting the cost onto customers, finance teams, and support operations. Once you understand that, idempotency stops feeling like a technical detail and starts feeling like table stakes.</p>
<h2>Pattern 3: Compensation Is How You Survive Without Distributed Transactions</h2>
<p>Enterprise workflows almost always cross boundaries where a single atomic transaction is impossible. You might charge a card in one system, reserve inventory in another, create a shipment in a third, and register the order in an ERP platform that still thinks SOAP is modern. No transaction coordinator is going to make all of that commit or roll back as one neat unit. Even if it could, you probably would not want the coupling and latency that came with it.</p>
<p>So what happens when part of the workflow succeeds and a later step fails? That is where compensation comes in. Compensation is the deliberate reversal of already-completed actions so that the broader workflow returns to a consistent business state.</p>
<p>Suppose a checkout flow successfully charges the customer, then later fails to allocate stock. Without compensation, the system has taken money for an order it cannot fulfil. That is not a mere technical defect. It is a business failure. The workflow needs a compensating action, such as issuing a refund, releasing provisional customer benefits, and notifying operations if manual review is required.</p>
<p>The same applies in other domains. If a claims workflow opens a financial reserve and later discovers a validation failure, the reserve may need reversing. If an onboarding workflow provisions downstream access and then fails a compliance check, those accounts may need disabling. If a shipping request is accepted and the ERP later rejects the order, logistics and customer communication may both need corrective action.</p>
<p>A compensation flow often looks like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/de090a5f-df4b-4851-af82-77c70acb9e8e.png" alt="" style="display:block;margin:0 auto" />

<p>Compensation is frequently misunderstood as just calling an undo API. Sometimes that is possible, but often it is not. Real compensations can be asynchronous, partial, or manual. A refund might take time. A shipment might be cancellable only before handoff to the carrier. A legacy platform might support reversal only through an overnight batch. That means compensation needs its own design, status tracking, and operational visibility.</p>
<p>In orchestrated systems, a common pattern is to record which forward steps have completed, then execute compensations in reverse order if the workflow later fails. Durable Functions makes this practical because orchestration state can keep track of what has happened so far.</p>
<p>A simplified orchestration sketch might look like this:</p>
<pre><code class="language-csharp">[Function(nameof(ProcessOrderOrchestrator))]
public static async Task Run(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var order = context.GetInput&lt;OrderRequest&gt;();
    var completedSteps = new List&lt;string&gt;();

    try
    {
        await context.CallActivityAsync(nameof(ReserveInventoryActivity), order);
        completedSteps.Add("inventory");

        await context.CallActivityAsync(nameof(ChargePaymentActivity), order);
        completedSteps.Add("payment");

        await context.CallActivityAsync(nameof(RegisterOrderInErpActivity), order);
        completedSteps.Add("erp");
    }
    catch (Exception)
    {
        if (completedSteps.Contains("payment"))
        {
            await context.CallActivityAsync(nameof(RefundPaymentActivity), order);
        }

        if (completedSteps.Contains("inventory"))
        {
            await context.CallActivityAsync(nameof(ReleaseInventoryActivity), order);
        }

        await context.CallActivityAsync(nameof(RaiseOpsAlertActivity), order.OrderId);
        throw;
    }
}
</code></pre>
<p>This is deliberately simple, but it makes the main point. Compensation is not an optional extra you add later. It is part of the workflow contract. If a business process can leave the world half-changed, then it also needs a defined path to recover from that condition.</p>
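<p>The reverse-order bookkeeping can also be factored into a small helper. This is a sketch using in-process delegates; in a durable orchestration each entry would be an activity call, and all names here are illustrative:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch: register an undo action as each forward step succeeds,
// then unwind the stack in reverse order if a later step fails.
public sealed class CompensationScope
{
    private readonly Stack<Func<Task>> _compensations = new();

    public void Register(Func<Task> compensation) =>
        _compensations.Push(compensation);

    public async Task CompensateAsync()
    {
        while (_compensations.Count > 0)
            await _compensations.Pop()(); // most recent step is undone first
    }
}
```

The stack makes the ordering rule structural rather than something each catch block must remember: whatever succeeded last is compensated first.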
<p>There is another important truth here. Compensation is rarely perfect. You should not promise exact rollback semantics where the domain does not support them. Some workflows are better described as eventually corrected rather than fully undone. That is fine, provided the state transitions are explicit and visible. False certainty is more dangerous than honest eventual consistency.</p>
<h2>Pattern 4: Event-Driven Integration Reduces Coupling and Preserves Flow</h2>
<p>One of the easiest ways to make orchestration brittle is to let the central workflow call every downstream system directly. It feels simple at first because all the logic is in one place. The orchestrator confirms the order, then calls the ERP, then calls analytics, then calls CRM, then calls some downstream fulfilment component, then maybe calls a notification service. The problem is that each of those direct calls adds latency and dependency pressure to the core flow.</p>
<p>A better option in many cases is to separate the business milestone from the downstream reactions. Once the workflow reaches a meaningful state, such as order confirmed, claim submitted, policy approved, or onboarding completed, it can publish an event. Other systems subscribe independently and handle their own processing. That removes non-essential side effects from the critical path and reduces direct coupling between the orchestrator and every consumer.</p>
<p>Here is the contrast:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/ddb9c2ba-be57-460e-8b21-d166eb4c20e2.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/73830a2d-a58c-431c-90b4-836b8f2bddee.png" alt="" style="display:block;margin:0 auto" />

<p>This shift matters for several reasons. First, it shortens the synchronous path of the core workflow. Second, it allows new consumers to be added later without modifying the orchestrator. Third, it isolates failure. If analytics is down, that should not usually block order confirmation. If CRM processing is delayed, the business milestone may still be valid.</p>
<p>That does not mean direct calls disappear entirely. Some steps remain essential to the transaction outcome and must stay in the workflow. Payment authorisation is usually not optional. Inventory reservation is often not optional. But secondary reactions are usually better handled as event subscribers.</p>
<p>In Azure, this might mean an orchestration step publishes an <code>OrderConfirmed</code> event into Event Grid or a queue topic after core invariants are satisfied. Separate Functions then react to that event and perform ERP synchronisation, customer communications, and reporting updates. In other stacks, the same pattern could use Kafka, RabbitMQ, SNS/SQS, NATS, or any eventing platform with durable delivery.</p>
<p>A typical event contract should be boring and explicit. That is a good thing. It might include a business ID, event type, timestamp, correlation ID, schema version, and only the data consumers genuinely need. Resist the urge to publish an anemic dump of internal objects. Events are integration contracts, not convenient serialisation shortcuts.</p>
<p>A simple event model could look like this:</p>
<pre><code class="language-csharp">public sealed record OrderConfirmedEvent(
    string OrderId,
    string CustomerId,
    decimal Total,
    DateTime ConfirmedAtUtc,
    string CorrelationId,
    int SchemaVersion);
</code></pre>
<p>There is a trade-off, of course. Event-driven systems push you toward eventual consistency. Consumers may process at different times. Delivery may be at least once, not exactly once. That takes us right back to idempotency and observability. Event-driven integration works well when paired with those patterns, not when treated as a shortcut that somehow removes the need for them.</p>
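<p>For example, a subscriber can make at-least-once delivery safe by recording processed event IDs before applying side effects. The sketch below uses an in-memory set purely for illustration; a real consumer would persist the IDs durably:</p>

```csharp
using System;
using System.Collections.Generic;

// Sketch of consumer-side deduplication for at-least-once delivery.
// The HashSet stands in for a durable store of processed event IDs.
public sealed class DeduplicatingConsumer
{
    private readonly HashSet<string> _processedEventIds = new();

    // Returns true if the side effect ran, false for a duplicate delivery.
    public bool Handle(string eventId, Action sideEffect)
    {
        if (!_processedEventIds.Add(eventId))
            return false; // already seen: ignore the redelivery

        sideEffect();
        return true;
    }
}
```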
<h2>Pattern 5: Custom Status and Observability Keep Workflows from Becoming Black Boxes</h2>
<p>Many operational incidents are not caused by the workflow being broken. They are caused by nobody being able to tell what the workflow is doing. A long-running integration process can be perfectly healthy while waiting on a supplier response, a human approval, or an overnight ERP batch. Without good status signals, support teams often interpret waiting as failure and failure as waiting. That confusion creates noise, escalations, and manual work that should never have existed.</p>
<p>The fix is simple in principle and often neglected in practice. Long-running workflows need explicit, queryable status. Not vague technical state, but business-meaningful status. A fraud check should not just be running. It should be <code>FraudCheckPending</code> or <code>FraudCheckFailed</code>. An ERP handoff should be <code>ErpSubmissionPending</code>, <code>ErpRegistered</code>, or <code>ErpRejected</code>. A supplier callback stage should be <code>AwaitingSupplierApproval</code>. A manual review should be <code>PendingHumanDecision</code>.</p>
<p>Durable Functions supports custom orchestration status, which is a powerful way to surface this information directly from the workflow runtime. But the same idea applies on any platform. You need a state model that answers the basic operational question: where is this process now, and why is it there?</p>
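<p>One lightweight way to model those stages is an explicit, finite set of states rather than free-form strings. The enum below reuses the stage names mentioned in this article; the type itself is an illustrative sketch, not a prescribed model:</p>

```csharp
using System;

// Business-meaningful workflow stages (names follow the examples in
// the text; the enum itself is a sketch).
public enum OrderStage
{
    Received,
    FraudCheckPending,
    FraudCheckFailed,
    ErpSubmissionPending,
    ErpRegistered,
    ErpRejected,
    AwaitingSupplierApproval,
    PendingHumanDecision,
    CompensationInProgress,
    Completed,
    Failed
}
```

An explicit type keeps dashboards and support tooling honest, because a typo in a stage name becomes a compile error rather than a silent new state.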
<p>A practical lifecycle might look like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/48055546-64cf-4c6b-91e7-ba6c317f1c9c.png" alt="" style="display:block;margin:0 auto" />

<p>In code, that might be as straightforward as setting custom status at each meaningful stage:</p>
<pre><code class="language-csharp">context.SetCustomStatus(new
{
    orderId = order.OrderId,
    stage = "FraudCheckPending",
    updatedAtUtc = context.CurrentUtcDateTime
});
</code></pre>
<p>That single line is more valuable than many teams realise. Once status is queryable, you can power dashboards, operator portals, support tooling, and incident triage without reverse engineering workflow behaviour from logs.</p>
<p>Observability also needs more than status labels. Correlation IDs must flow through the entire chain, from inbound request to orchestration instance to activity calls to outbound dependency calls and published events. Logs need consistent structured fields. Metrics should cover latency, retries, breaker state, queue depth, timeout counts, compensation frequency, and downstream failure rates. Tracing should allow engineers to follow a transaction through multiple services without playing archaeology across disconnected log stores.</p>
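<p>A minimal sketch of that correlation discipline, assuming nothing more than a context object threaded through every call (all names here are illustrative):</p>

```csharp
using System;

// Illustrative: one correlation ID is minted at the inbound edge and
// carried, unchanged, through every log line and outbound call.
public sealed record CorrelationContext(string CorrelationId)
{
    public static CorrelationContext NewForInboundRequest() =>
        new(Guid.NewGuid().ToString("N"));
}

public static class StructuredLog
{
    // Always emit the correlation ID under the same structured field
    // so logs from different services can be joined on it.
    public static string Line(CorrelationContext ctx, string stage, string message) =>
        $"{{\"correlationId\":\"{ctx.CorrelationId}\",\"stage\":\"{stage}\",\"message\":\"{message}\"}}";
}
```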
<p>Here is the ugly truth. If your workflow depends on several systems and you do not have proper correlation and state visibility, you do not have an operable architecture. You have a hope-based architecture.</p>
<h2>Pattern 6: Hybrid Integration Accepts Reality Instead of Demanding a Rewrite</h2>
<p>A lot of technical content on serverless and orchestration quietly assumes the organisation has the freedom to build a clean greenfield system. That is rarely how enterprise work actually looks. Most teams are not replacing everything. They are inserting modern capability into an environment where a mixture of old and new already exists.</p>
<p>That is why hybrid integration matters. Serverless does not have to replace the ERP. It can orchestrate around it. Durable workflows do not have to own every business rule. They can coordinate specialised services that already exist. Modern data stores can support fast projections and reporting while a different platform remains the canonical system of record. New cloud-native capabilities can coexist with legacy systems provided the architectural boundaries are clear.</p>
<p>A realistic enterprise shape often looks something like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/75e0a41e-3352-4be8-8197-0c3274204f5e.png" alt="" style="display:block;margin:0 auto" />

<p>This hybrid model is often the most pragmatic route to value. The orchestration layer becomes the coordinator of the business process. Compliance-sensitive payment logic can remain in a dedicated service. A legacy ERP can continue as the source of truth for certain financial or operational records. Cloud-native projections can power responsive read models and dashboards without forcing the organisation to migrate everything at once.</p>
<p>That also means architects need discipline around ownership. The orchestration engine should coordinate process state, not become the dumping ground for every piece of business logic in the company. The ERP should retain the responsibilities it is still good at, not be called for every trivial lookup. Projection stores should serve read performance and user experience, not quietly evolve into shadow systems with ambiguous truth boundaries.</p>
<p>The big win in hybrid architecture is incremental progress. You do not need a grand rewrite to improve resilience, observability, and flow control. You can wrap brittle integrations with better orchestration. You can isolate long-running handoffs. You can publish cleaner events. You can add compensations and workflow visibility around systems that were never built with those ideas in mind.</p>
<p>That is usually how real transformation succeeds, not through replacement fantasies but through carefully chosen seams.</p>
<h2>Pattern 7: Resilience by Design Means Assuming the System Will Be Incomplete, Slow, and Wrong Sometimes</h2>
<p>The strongest systems are not the ones that assume everything will go right. They are the ones that assume at least some parts will go wrong and still define how the workflow should behave. That mindset is what resilience by design really means.</p>
<p>It means assuming partial failure is normal. A dependency might succeed after a retry, fail permanently, or accept work and complete later. A callback might arrive twice. A timer might expire before a human responds. An event consumer might process late. An external system might hold the canonical answer even though your local projection says otherwise. These are not edge cases. They are part of the design space.</p>
<p>Resilience by design also means being honest about consistency. Many distributed workflows are eventually consistent, and pretending otherwise helps nobody. The real architectural task is to define where temporary inconsistency is acceptable, how it is reconciled, and what the user or operator sees while it exists. Good systems make the transition states explicit instead of hiding them behind vague processing messages.</p>
<p>It also means measuring the behaviour that matters. You should know which dependencies are slowest, which steps retry most often, which compensations are frequent, how long workflows remain in waiting states, and which manual interventions are recurring. Teams that do not measure this tend to rediscover the same operational pain every quarter and act surprised each time.</p>
<p>Finally, resilience by design means accepting that supportability is part of architecture. A workflow is not finished when it compiles and passes tests. It is finished when operators can understand it, support teams can explain it, incidents can be triaged quickly, and business stakeholders can trust that failures are bounded and recoverable.</p>
<h2>A Concrete End-to-End Example</h2>
<p>Let us pull these patterns together in a single scenario. Imagine a large B2B order workflow. An order enters the system through an API. The orchestration starts and immediately assigns a correlation ID that follows the transaction everywhere. The workflow sets its status to <code>Received</code>. It then checks whether the fraud provider breaker is open. If it is, the workflow fails fast with a visible dependency-unavailable status rather than quietly piling into retries.</p>
<p>If the breaker allows execution, the workflow sends a fraud request with a request ID that can be used for deduplication if the provider supports it. Once fraud is approved, payment is attempted with an idempotency key derived from the order ID. That ensures retries cannot double-charge the customer. After payment succeeds, the workflow publishes an <code>OrderConfirmed</code> event so downstream analytics and CRM updates can proceed independently instead of extending the critical path.</p>
<p>Next, the workflow submits the order to a legacy ERP. The ERP is slow and sometimes responds asynchronously, so the orchestration switches status to <code>ErpSubmissionPending</code> and waits for either an external callback or a timeout. If the callback arrives with success, the workflow completes. If the ERP rejects the order, the orchestration enters <code>CompensationInProgress</code>, triggers a refund, releases any provisional inventory state, raises an operational alert, and finally moves the order into a failed terminal state with a reason that support can actually understand.</p>
<p>That end-to-end shape looks like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/91a411c9-8198-48eb-b360-7880110071d5.png" alt="" style="display:block;margin:0 auto" />

<p>Nothing in that flow is exotic. That is exactly the point. Most resilient architectures are not built from obscure theory. They are built from boring patterns applied consistently and early enough that the system does not rot under growth.</p>
<h2>What Developers Usually Get Wrong</h2>
<p>The first common mistake is over-centralising the orchestration. Developers discover a workflow engine and start putting every rule, integration, and transformation into the orchestrator itself. That turns the orchestrator into a giant god-process that becomes hard to change and impossible to reason about. The workflow should coordinate. It should not absorb every responsibility.</p>
<p>The second mistake is believing retries are a resilience strategy on their own. They are not. Retries without idempotency, compensation, status visibility, and bounded dependency behaviour are just a way of failing repeatedly.</p>
<p>The third mistake is underestimating operational visibility. Teams often spend far more time designing the happy path than designing the support path. Then the first real incident happens and nobody can answer the obvious questions. Which stage is this order at. Did payment happen already. Has ERP seen it. Is this waiting for a callback or stuck in a retry loop. Those questions should not require an engineer to grep logs across five systems.</p>
<p>The fourth mistake is assuming greenfield purity is required before improvement is possible. It is not. Some of the best resilience gains come from putting orchestration, status modelling, idempotency, and compensations around existing systems rather than replacing them.</p>
<p>The fifth mistake is treating eventual consistency as a flaw to be hidden instead of a reality to be designed for. Users and operators can cope with transition states if those states are honest and understandable. What they cannot cope with is silent ambiguity.</p>
<h2>How to Apply These Patterns in Practice</h2>
<p>If you are building or modernising an integration-heavy workflow, start by identifying the true business milestones rather than the raw API calls. Ask where side effects happen, which ones must be synchronous, which ones can be event-driven, and which ones need compensation if a later step fails. That alone will usually reveal whether your current workflow is too tightly coupled.</p>
<p>Then look at duplicate execution risk. Anywhere you have retries, redelivery, callbacks, or human re-submission, you need a defined idempotency strategy. Be precise about the operation key and where the result is recorded. Vague assurances that the provider should handle duplicates are not enough.</p>
<p>Next, inspect dependency behaviour. Which integrations deserve a circuit breaker. Which ones should fail fast. Which ones should shift into async wait mode with timers and external events. Which ones are important enough to stay on the critical path and which ones should react to events later.</p>
<p>After that, design your status model. Not your log messages, your status model. What are the meaningful states of the workflow from an operator and business perspective. How are those states exposed. Where do correlation IDs flow. What metrics would tell you this process is degrading before customers notice.</p>
<p>Finally, decide how the new workflow lives alongside existing systems. Be explicit about what remains the source of truth, what becomes a projection, and what the orchestrator does and does not own. Hybrid architecture becomes dangerous only when ownership is vague.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/5a099285-acef-4216-ab5d-6e5fe508412d.png" alt="" style="display:block;margin:0 auto" />

<p>Resilience at scale is not about making distributed systems behave like a single local transaction. That fantasy does not survive contact with real dependencies, real organisations, or real time. The job is to build workflows that remain understandable and recoverable when the surrounding systems behave imperfectly.</p>
<p>That is why these patterns are useful. Circuit breakers keep one bad dependency from turning into systemic slowdown. Idempotency makes retries safe. Compensation gives workflows a path back from partial success. Event-driven integration reduces unnecessary coupling. Custom status and observability make the process operable. Hybrid architecture accepts the systems you actually have. Resilience by design ties all of it together into a mindset rather than a patchwork of technical tricks.</p>
<p>Once you start thinking this way, integration architecture changes. You stop asking how to make the happy path pass one more test and start asking how the workflow behaves when the world around it is late, duplicated, unavailable, or inconsistent. That is the right question. It is also the one that separates systems that merely work from systems that keep working.</p>
]]></content:encoded></item><item><title><![CDATA[Communicating Between Modules in a Modular Monolith]]></title><description><![CDATA[Why Developers Get This Wrong
Most engineers learning modular monoliths fall into two traps. The first group collapses boundaries by sharing DbContexts, repositories, and entities across modules. The ]]></description><link>https://fullstackcity.com/communicating-between-modules-in-a-modular-monolith</link><guid isPermaLink="true">https://fullstackcity.com/communicating-between-modules-in-a-modular-monolith</guid><category><![CDATA[Modular Monolith]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[C#]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[.NET]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 12 Mar 2026 21:19:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/d439dff6-4c67-4e67-aeec-e75a35230890.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Why Developers Get This Wrong</h2>
<p>Most engineers learning modular monoliths fall into two traps. The first group collapses boundaries by sharing DbContexts, repositories, and entities across modules. The second group overcompensates by enforcing microservice-style communication within the monolith, introducing HTTP calls or message buses between modules. Both patterns undermine the purpose of a modular monolith. The correct approach maintains module isolation while enabling fast in-process communication through contracts and events. Vertical Slice architecture alters how we structure these modules.</p>
<h2>The Architecture We Are Targeting</h2>
<p>We are building a modular monolith with:</p>
<ul>
<li><p>Vertical Slice architecture</p>
</li>
<li><p>CQRS</p>
</li>
<li><p>Minimal APIs</p>
</li>
<li><p>Separate module databases</p>
</li>
<li><p>No HTTP between modules</p>
</li>
<li><p>No message bus inside the process</p>
</li>
</ul>
<p>Example modules:</p>
<ul>
<li><p>Users</p>
</li>
<li><p>Claims</p>
</li>
</ul>
<p>Each module owns its data and exposes capabilities, not services.</p>
<p>Instead of layers like Application, Domain, Infrastructure, the module is organised by features.</p>
<pre><code class="language-plaintext">src
 ├ Users
 │   ├ Contracts
 │   │   └ UserQueries.cs
 │   │
 │   ├ CreateUser
 │   │   ├ Endpoint.cs
 │   │   ├ Command.cs
 │   │   ├ Handler.cs
 │   │   └ Validator.cs
 │   │
 │   ├ GetUser
 │   │   ├ Query.cs
 │   │   └ Handler.cs
 │   │
 │   └ DeleteUser
 │       ├ Command.cs
 │       └ Handler.cs
 │
 └ Claims
     ├ Contracts
     │   └ ClaimQueries.cs
     │
     ├ CreateClaim
     │   ├ Endpoint.cs
     │   ├ Command.cs
     │   └ Handler.cs
     │
     └ ApproveClaim
         ├ Command.cs
         └ Handler.cs
</code></pre>
<p>Each folder is a slice.</p>
<p>The slice contains everything needed for that use case.</p>
<h2>Dependency Direction Between Modules</h2>
<p>Modules reference contracts only, never implementation slices.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/9627390f-a9d4-4c86-8841-ac040fdf142e.png" alt="" style="display:block;margin:0 auto" />

<p>This rule is the cornerstone that keeps a modular monolith genuinely modular instead of slowly degrading into a tangled codebase. The Claims module is allowed to reference Users.Contracts because contracts represent the public capability surface of the Users module. They define what the Users module is willing to expose to the rest of the system in a controlled, stable way. Contracts typically contain simple request objects, response DTOs, and interfaces that describe operations such as queries or commands. Importantly, they contain no business logic, persistence concerns, or internal implementation details. By depending only on this contract layer, the Claims module interacts with the Users module in the same way an external client would, through clearly defined capabilities rather than through direct knowledge of how the module works internally.</p>
<p>What the Claims module must never do is reference internal slices like Users.CreateUser, Users.GetUser, or Users.DeleteUser, because those folders represent implementation details of individual vertical slices inside the Users module. Allowing other modules to depend on those slices would immediately couple them to internal design decisions, meaning a simple refactor of a handler or slice could ripple across the entire system. The same rule applies even more strongly to infrastructure concerns such as UsersDbContext. Once another module starts using a different module’s DbContext, it is effectively reaching directly into that module’s database and bypassing its business rules, validations, and invariants. That instantly destroys the boundary between modules and turns the architecture into a shared persistence layer disguised as modules.</p>
<p>By restricting dependencies strictly to Users.Contracts, the Users module remains free to reorganise its slices, change its database schema, refactor its handlers, or even split into a separate service later without breaking other modules. The Claims module only knows what operations are available, not how they are implemented, which is exactly the level of coupling a modular monolith is designed to enforce.</p>
<h2>The Three Communication Mechanisms</h2>
<p>Vertical slice modular monoliths usually communicate through:</p>
<ol>
<li><p>Query contracts</p>
</li>
<li><p>Command contracts</p>
</li>
<li><p>Domain events</p>
</li>
</ol>
<p>These are not HTTP calls and not service bus messages. They are simple in-process calls.</p>
<h2>Pattern 1 - Query Contracts</h2>
<p>Suppose the CreateClaim slice needs to verify that the user associated with the claim actually exists and is allowed to submit a claim. At first glance it may seem natural for the Claims module to simply query the Users table directly, especially since everything runs inside the same application and the Users database is technically accessible. However, doing so would immediately violate the boundary between modules because the Claims module would now be coupled to the Users module’s persistence model and database schema. Any change to the Users table structure, indexes, or entity model could silently break the Claims module, and worse, the Claims module would be bypassing any business rules or invariants that the Users module is responsible for enforcing. In a modular monolith, each module owns its data and must be the only component allowed to access that data directly. Instead of reading the Users database, the Claims module should request the information it needs through a query contract exposed by the Users module. This contract defines a simple, explicit capability such as "retrieve a summary of a user by ID." The Claims module then calls that query through an interface defined in Users.Contracts, allowing the Users module to remain the sole authority over how user data is stored, retrieved, and validated. The Claims module gets exactly the information it needs to perform its operation, while the internal implementation of the Users module remains completely hidden behind the contract boundary.</p>
<h3>Users.Contracts</h3>
<pre><code class="language-csharp">public record GetUserSummaryQuery(Guid UserId);

public record UserSummaryDto(
    Guid Id,
    string Email,
    bool IsActive);

public interface IUserQueries
{
    Task&lt;UserSummaryDto?&gt; GetUserSummary(
        GetUserSummaryQuery query,
        CancellationToken stopToken);
}
</code></pre>
<p>The contract lives inside Users.Contracts.</p>
<p>No EF. No implementation.</p>
<h2>Implementing the Query Slice</h2>
<p>Inside the Users module.</p>
<pre><code class="language-plaintext">Users/GetUserSummary
</code></pre>
<pre><code class="language-csharp">internal sealed class Handler : IUserQueries
{
    private readonly UsersDbContext db;

    public Handler(UsersDbContext db)
    {
        this.db = db;
    }

    public async Task&lt;UserSummaryDto?&gt; GetUserSummary(
        GetUserSummaryQuery query,
        CancellationToken stopToken)
    {
        return await db.Users
            .Where(x =&gt; x.Id == query.UserId)
            .Select(x =&gt; new UserSummaryDto(
                x.Id,
                x.Email,
                x.IsActive))
            .FirstOrDefaultAsync(stopToken);
    }
}
</code></pre>
<p>Register the slice handler.</p>
<pre><code class="language-csharp">services.AddScoped&lt;IUserQueries, Handler&gt;();
</code></pre>
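<p>To keep this wiring inside the module rather than scattering registrations across the host, one common approach is a per-module registration extension method. The sketch below assumes each module exposes a single entry point; the method name <code>AddUsersModule</code> and the connection string name are illustrative, and EF Core's SQL Server provider is assumed.</p>
<pre><code class="language-csharp">public static class UsersModuleRegistration
{
    public static IServiceCollection AddUsersModule(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Persistence wiring stays private to the Users module.
        services.AddDbContext&lt;UsersDbContext&gt;(options =&gt;
            options.UseSqlServer(configuration.GetConnectionString("Users")));

        // Only the contract interface is exposed to other modules.
        services.AddScoped&lt;IUserQueries, Handler&gt;();

        return services;
    }
}
</code></pre>
<p>The host composition root then calls <code>services.AddUsersModule(configuration)</code> without ever referencing the module's internal types.</p>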
<hr />
<h2>Using the Query in a Claims Slice</h2>
<p>Inside the CreateClaim slice.</p>
<pre><code class="language-csharp">public sealed class Handler
{
    private readonly IUserQueries users;
    private readonly ClaimsDbContext db;

    public Handler(
        IUserQueries users,
        ClaimsDbContext db)
    {
        this.users = users;
        this.db = db;
    }

    public async Task&lt;Guid&gt; Handle(
        Command cmd,
        CancellationToken stopToken)
    {
        var user = await users.GetUserSummary(
            new GetUserSummaryQuery(cmd.UserId),
            stopToken);

        if (user is null)
            throw new InvalidOperationException("User not found");

        var claim = Claim.Create(cmd.UserId);

        db.Claims.Add(claim);
        await db.SaveChangesAsync(stopToken);

        return claim.Id;
    }
}
</code></pre>
<p>The call remains fully in-process.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/6959c079-7736-4711-81be-3afa92b251a9.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Pattern 2 - Command Contracts</h2>
<p>Sometimes a module does not just need to read information from another module, it needs that module to actually perform some work on its behalf. A good example is when a claim is approved in the Claims module and the user should be notified that their claim has been accepted. It might be tempting for the Claims module to send an email directly or call some notification service itself, but that would be a mistake because notification behaviour belongs to the Users module’s responsibility. The Claims module should not need to know whether notifications are sent via email, SMS, push notification, or some future system that has not even been introduced yet. If Claims implements that logic, it becomes tightly coupled to infrastructure details that are outside its domain. Instead, the Claims module should simply express the intent of the action by issuing a command to the Users module through a contract. That command represents a capability such as "notify this user with this message." The Users module then decides how that notification is handled internally. By structuring the interaction this way, the Claims module remains focused purely on claims-related business logic while the Users module retains full ownership of notification behaviour and the infrastructure required to deliver it. This keeps responsibilities clearly separated and prevents implementation details from leaking across module boundaries.</p>
<h3>Users.Contracts</h3>
<pre><code class="language-csharp">public record NotifyUserCommand(
    Guid UserId,
    string Message);

public interface IUserCommands
{
    Task NotifyUser(
        NotifyUserCommand command,
        CancellationToken stopToken);
}
</code></pre>
<hr />
<h2>Users Slice Implementation</h2>
<pre><code class="language-plaintext">Users/NotifyUser
</code></pre>
<pre><code class="language-csharp">internal sealed class Handler : IUserCommands
{
    public Task NotifyUser(
        NotifyUserCommand command,
        CancellationToken stopToken)
    {
        // send email, SMS etc
        return Task.CompletedTask;
    }
}
</code></pre>
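<p>As with the query slice, the command implementation is registered against its contract so that consumers only ever see the interface:</p>
<pre><code class="language-csharp">services.AddScoped&lt;IUserCommands, Handler&gt;();
</code></pre>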
<hr />
<h2>Claims Slice Triggering the Command</h2>
<pre><code class="language-plaintext">Claims/ApproveClaim
</code></pre>
<pre><code class="language-csharp">public sealed class Handler
{
    private readonly IUserCommands users;

    public Handler(IUserCommands users)
    {
        this.users = users;
    }

    public async Task Handle(
        Command cmd,
        CancellationToken stopToken)
    {
        await users.NotifyUser(
            new NotifyUserCommand(
                cmd.UserId,
                "Claim approved"),
            stopToken);
    }
}
</code></pre>
<p>Again, no HTTP, no message broker.</p>
<hr />
<h2>Pattern 3 - Domain Events</h2>
<p>Commands are appropriate when one module explicitly knows that another module must perform a specific action. In those cases the calling module intentionally invokes a capability exposed by the other module through a contract. Events serve a different purpose. Events are used when a module should not know which other modules might care about something that has happened. Instead of directing another module to do something, the module simply announces that a significant domain event occurred. A good example is user deletion. When the Users module deletes a user, it should not contain logic that checks whether the Claims module exists or whether it needs to clean up claims data. That would tightly couple the Users module to the rest of the system and force it to understand responsibilities that belong to other domains. Instead, the Users module publishes a UserDeleted event indicating that the user has been removed. Other modules that care about that event can react independently. The Claims module might close open claims for that user, an auditing module might archive historical data, and a reporting module might update statistics. None of those reactions are the Users module’s responsibility. By publishing an event rather than issuing direct commands, the Users module remains completely unaware of which modules subscribe to that event, preserving loose coupling and allowing new behavior to be added later without modifying the Users module itself.</p>
<pre><code class="language-csharp">public record UserDeletedEvent(Guid UserId);
</code></pre>
<hr />
<h2>Event Flow</h2>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/8cc028e9-33e1-4ee4-9b69-eaa963d1ebe0.png" alt="" style="display:block;margin:0 auto" />

<p>Users publishes:</p>
<pre><code class="language-csharp">await dispatcher.Publish(
    new UserDeletedEvent(userId),
    stopToken);
</code></pre>
<p>Claims reacts:</p>
<pre><code class="language-csharp">public sealed class Handler
{
    private readonly ClaimsDbContext db;

    public async Task Handle(
        UserDeletedEvent evt,
        CancellationToken stopToken)
    {
        var claims = await db.Claims
            .Where(x =&gt; x.UserId == evt.UserId)
            .ToListAsync(stopToken);

        foreach (var claim in claims)
        {
            claim.MarkUserDeleted();
        }

        await db.SaveChangesAsync(stopToken);
    }
}
</code></pre>
<p>Users never references Claims.</p>
<h2>In-Process Event Dispatcher</h2>
<p>Because everything runs in one process, the dispatcher is trivial.</p>
<pre><code class="language-csharp">public class EventDispatcher
{
    private readonly IServiceProvider services;

    public EventDispatcher(IServiceProvider services)
    {
        this.services = services;
    }

    public async Task Publish&lt;T&gt;(
        T domainEvent,
        CancellationToken stopToken)
    {
        var handlers =
            services.GetServices&lt;IEventHandler&lt;T&gt;&gt;();

        foreach (var handler in handlers)
        {
            await handler.Handle(domainEvent, stopToken);
        }
    }
}
</code></pre>
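<p>The dispatcher resolves handlers through an <code>IEventHandler&lt;T&gt;</code> abstraction that is not shown above. A minimal definition, matching the shape the dispatcher and the Claims handler expect, might look like this (the interface itself and the registration lines are assumptions, not part of the original module code):</p>
<pre><code class="language-csharp">// Minimal event handler abstraction resolved by the dispatcher.
// The generic parameter is the event type a handler subscribes to.
public interface IEventHandler&lt;in T&gt;
{
    Task Handle(T domainEvent, CancellationToken stopToken);
}
</code></pre>
<p>Each subscribing module then registers its handlers alongside the dispatcher, for example <code>services.AddScoped&lt;IEventHandler&lt;UserDeletedEvent&gt;, Handler&gt;();</code> inside the Claims module and <code>services.AddScoped&lt;EventDispatcher&gt;();</code> in the host.</p>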
<hr />
<h2>Minimal API Integration</h2>
<p>Endpoints live inside the slice.</p>
<p>Example:</p>
<pre><code class="language-plaintext">Claims/CreateClaim/Endpoint.cs
</code></pre>
<pre><code class="language-csharp">app.MapPost("/claims",
    async (
        Command cmd,
        Handler handler,
        CancellationToken stopToken) =&gt;
{
    var id = await handler.Handle(cmd, stopToken);
    return Results.Ok(id);
});
</code></pre>
<p>The endpoint talks only to its slice handler.</p>
<h2>Why This Works</h2>
<p>This architecture preserves:</p>
<ul>
<li><p>strict module boundaries</p>
</li>
<li><p>independent databases</p>
</li>
<li><p>vertical slice isolation</p>
</li>
<li><p>extremely fast in-process calls</p>
</li>
</ul>
<p>The system remains loosely coupled because modules depend only on contracts.</p>
<p>Yet communication remains extremely simple.</p>
<h2>The Performance Advantage</h2>
<p>In-process contract calls are dramatically faster than external communication.</p>
<table>
<thead>
<tr>
<th>Communication</th>
<th>Typical latency</th>
</tr>
</thead>
<tbody><tr>
<td>HTTP</td>
<td>3–15 ms</td>
</tr>
<tr>
<td>Message bus</td>
<td>10–100 ms</td>
</tr>
<tr>
<td>In-process contract</td>
<td>&lt;0.1 ms</td>
</tr>
</tbody></table>
<p>For high-throughput systems, that difference matters.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67c36038c69a4b7143c5fc49/82637f69-6083-49c0-a46d-bbef48182deb.png" alt="" style="display:block;margin:0 auto" />

<p>A modular monolith using Vertical Slice + CQRS + Minimal APIs should not resemble either a layered monolith or a microservice system.</p>
<p>Slices contain the behaviour. Modules own the data. Contracts define the boundaries.</p>
<p>Queries read across modules. Commands trigger behaviour. Domain events propagate changes.</p>
<p>The result is an architecture that is simple, fast, and strongly modular without introducing the complexity of distributed systems.</p>
]]></content:encoded></item><item><title><![CDATA[Enforcing Architecture in .NET]]></title><description><![CDATA[Most software systems do not fail because developers cannot write code. They fail because you cannot control how that code evolves over time. Architecture starts clear, clean, and well-structured. Six]]></description><link>https://fullstackcity.com/enforcing-architecture-in-net</link><guid isPermaLink="true">https://fullstackcity.com/enforcing-architecture-in-net</guid><category><![CDATA[software architecture]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sun, 22 Feb 2026 10:44:58 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/67c36038c69a4b7143c5fc49/8cf278c5-2d43-4b7e-8516-f0100a0d5679.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most software systems do not fail because developers cannot write code. They fail because you cannot control how that code evolves over time. Architecture starts clear, clean, and well-structured. Six months later, boundaries blur. Dependencies creep inward. Domain logic leaks into infrastructure. Services begin calling each other directly. The carefully designed structure becomes a tangled web.</p>
<p>This happens because architecture is usually treated as documentation rather than enforcement. You define layering rules, module boundaries, and dependency directions, but those rules exist only as diagrams and wiki pages. They rely on discipline, code reviews, and team knowledge to maintain consistency. As you grow and deadlines tighten, these human safeguards become insufficient.</p>
<p>This is where architectural testing changes everything.</p>
<p><a href="https://github.com/BenMorris/NetArchTest">NetArchTest</a> is a lightweight yet powerful library that allows you to express architectural rules as executable tests. Instead of hoping developers respect boundaries, you verify them automatically. Instead of discovering violations during late reviews, you detect them immediately during CI builds. Instead of architecture drifting silently, you make it impossible for violations to enter the codebase unnoticed.</p>
<p>I briefly covered using this tool back in my <a href="https://fullstackcity.com/part-6-testing-strategy-for-modular-monoliths-beyond-unit-tests">Modular Monolith walkthrough</a>, but decided it deserved its own post. This article explores how NetArchTest works, how to integrate it into a real .NET solution, and how to use it to enforce architectural discipline in modular monoliths, layered systems, and large enterprise applications.</p>
<h2>The Problem with Traditional Architectural Governance</h2>
<p>Before understanding NetArchTest, it is important to understand why architectural governance often fails.</p>
<p>Teams define architecture using documentation. They draw diagrams showing, for example, layers such as Presentation, Application, Domain, and Infrastructure. The diagrams specify that dependencies must flow inward toward the domain, and state that certain modules must not reference each other.</p>
<p>The problem is that documentation does not enforce anything.</p>
<p>Developers under time pressure might introduce shortcuts. A controller may reference a repository directly. An infrastructure class might begin calling domain logic. A shared project might gradually accumulate unrelated responsibilities. These violations often seem small at first, but they compound over time.</p>
<p>Code reviews attempt to catch these issues, but reviewers focus primarily on correctness and functionality. Architectural violations are subtle and easily missed. Furthermore, as systems grow, no single reviewer understands the entire architecture well enough to detect every violation.</p>
<p>Static analysis tools provide some assistance, but they typically focus on style, complexity, or language-level issues. They do not understand business-level architectural rules such as module boundaries or dependency directions.</p>
<p>NetArchTest fills this gap by allowing you to define architecture rules directly in code.</p>
<h2>What NetArchTest Actually Does</h2>
<p>NetArchTest enables developers to inspect compiled assemblies and verify structural properties. It operates at the level of types, namespaces, dependencies, and inheritance relationships.</p>
<p>Instead of testing behaviour, it tests structure.</p>
<p>You can ask questions such as:</p>
<ul>
<li><p>Does any class in the Domain layer reference Infrastructure?</p>
</li>
<li><p>Are all service classes named correctly?</p>
</li>
<li><p>Are certain types sealed or abstract?</p>
</li>
<li><p>Do controllers only depend on application layer interfaces?</p>
</li>
<li><p>Are modules independent from each other?</p>
</li>
</ul>
<p>NetArchTest answers these questions by analysing metadata from compiled assemblies. Because it runs against compiled code, it is extremely fast and integrates seamlessly into standard unit test pipelines.</p>
<p>This approach transforms architecture from a set of conventions into a set of enforceable guarantees.</p>
<h2>Installing NetArchTest</h2>
<p>Adding NetArchTest to a solution is straightforward. It is typically installed in a dedicated test project responsible for architectural verification.</p>
<p>You add the package via NuGet:</p>
<pre><code class="language-plaintext">dotnet add package NetArchTest.Rules
</code></pre>
<p>Most developers create a test project named something like:</p>
<p>ArchitectureTests</p>
<p>This project references the assemblies you want to analyse. It does not contain business logic tests. Its sole responsibility is enforcing architectural rules.</p>
<p>This separation keeps architecture validation distinct from functional testing.</p>
<h2>Understanding the Core Concepts</h2>
<p>NetArchTest revolves around a few core abstractions.</p>
<p>The first is Types. This represents a collection of types extracted from an assembly. You start every test by selecting which assembly or namespace you want to analyse.</p>
<p>The second is Rules. These define conditions that types must satisfy. Rules can check naming conventions, inheritance relationships, dependency restrictions, and more.</p>
<p>The third is Conditions. Conditions specify whether rules must pass or fail, such as ensuring a type should or should not have certain dependencies.</p>
<p>Finally, there are Results. After applying rules, NetArchTest returns a result indicating whether the architecture rule passed and which types violated it.</p>
<p>This model makes architectural testing expressive and readable.</p>
<h2>A Simple Example</h2>
<p>Take a standard layered architecture with separate projects for Domain, Application, and Infrastructure.</p>
<p>A fundamental rule is that the Domain layer must not depend on Infrastructure.</p>
<p>You can express this rule in NetArchTest as follows:</p>
<pre><code class="language-csharp">using NetArchTest.Rules;

public class DomainDependencyTests
{
    [Fact]
    public void Domain_Should_Not_Depend_On_Infrastructure()
    {
        var result = Types
            .InAssembly(typeof(DomainAssemblyMarker).Assembly)
            .ShouldNot()
            .HaveDependencyOn("MyApp.Infrastructure")
            .GetResult();

        Assert.True(result.IsSuccessful);
    }
}
</code></pre>
<p>This test inspects all types in the Domain assembly and ensures none reference the Infrastructure namespace.</p>
<p>If a developer accidentally introduces a dependency, this test fails immediately during CI.</p>
<p>Architecture is now enforced automatically.</p>
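<p>When a rule does fail, it helps to surface the offending types directly in the assertion message. The variant below is a sketch using the library's <code>FailingTypeNames</code> property on the result object; check the property name against the NetArchTest version you are using.</p>
<pre><code class="language-csharp">[Fact]
public void Domain_Should_Not_Depend_On_Infrastructure_With_Diagnostics()
{
    var result = Types
        .InAssembly(typeof(DomainAssemblyMarker).Assembly)
        .ShouldNot()
        .HaveDependencyOn("MyApp.Infrastructure")
        .GetResult();

    // List every violating type so the CI log points straight at the problem.
    var offenders = string.Join(", ", result.FailingTypeNames ?? Array.Empty&lt;string&gt;());

    Assert.True(result.IsSuccessful, $"Types violating the rule: {offenders}");
}
</code></pre>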
<h2>Enforcing Naming Conventions</h2>
<p>Naming conventions are often overlooked, but they play a crucial role in maintainability. NetArchTest can enforce these rules consistently.</p>
<p>Suppose all command handlers must end with the suffix "Handler".</p>
<p>You can enforce this with:</p>
<pre><code class="language-csharp">var result = Types
    .InNamespace("MyApp.Application.Commands")
    .Should()
    .HaveNameEndingWith("Handler")
    .GetResult();
</code></pre>
<p>This ensures consistent naming across the application layer.</p>
<p>Such rules prevent gradual erosion of structure as new developers join the project.</p>
<h2>Enforcing Layered Dependency Direction</h2>
<p>One of the most powerful uses of NetArchTest is enforcing dependency direction in layered architectures.</p>
<p>For example, in Clean Architecture, dependencies must flow inward toward the Domain layer. Infrastructure can depend on Application and Domain, but not the other way around.</p>
<p>You can enforce this rule like this:</p>
<pre><code class="language-csharp">var result = Types
    .InAssembly(typeof(ApplicationAssemblyMarker).Assembly)
    .ShouldNot()
    .HaveDependencyOn("MyApp.Infrastructure")
    .GetResult();
</code></pre>
<p>This prevents infrastructure concerns from leaking into application logic.</p>
<p>Over time, this rule preserves architectural purity.</p>
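<p>The inward-flow rule is usually paired with a stricter check on the Domain layer itself, which should depend on neither outer layer. This is a sketch using the library's <code>HaveDependencyOnAny</code> overload; the namespaces are illustrative.</p>
<pre><code class="language-csharp">var result = Types
    .InAssembly(typeof(DomainAssemblyMarker).Assembly)
    .ShouldNot()
    .HaveDependencyOnAny("MyApp.Application", "MyApp.Infrastructure")
    .GetResult();
</code></pre>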
<h2>Enforcing Module Boundaries in Modular Monoliths</h2>
<p>For modular monoliths, architectural enforcement becomes even more critical.</p>
<p>Each module should be independent. Modules should communicate through well-defined interfaces rather than direct references.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/67c36038c69a4b7143c5fc49/6d4f8e5b-5465-421d-a7bb-3421e2c99f87.png" alt="" style="display:block;margin:0 auto" />

<p>NetArchTest allows you to enforce module isolation.</p>
<p>For example, suppose you have modules named Users and Billing.</p>
<p>You can ensure the Users module does not directly depend on Billing:</p>
<pre><code class="language-csharp">var result = Types
    .InNamespace("MyApp.Modules.Users")
    .ShouldNot()
    .HaveDependencyOn("MyApp.Modules.Billing")
    .GetResult();
</code></pre>
<p>This protects module boundaries and prevents tight coupling.</p>
<p>Without such enforcement, modular monoliths often degrade into distributed spaghetti.</p>
<h2>Testing for Layer Violations via Inheritance</h2>
<p>Another important use case is ensuring that certain classes inherit from specific base types.</p>
<p>For example, all domain entities might be required to inherit from a base Entity class.</p>
<p>You can enforce this rule:</p>
<pre><code class="language-csharp">var result = Types
    .InNamespace("MyApp.Domain.Entities")
    .Should()
    .Inherit(typeof(Entity))
    .GetResult();
</code></pre>
<p>This ensures consistency in domain modelling practices.</p>
<h2>Enforcing Interface Usage</h2>
<p>A common architectural rule is that higher layers should depend only on interfaces, not concrete implementations.</p>
<p>For example, controllers should depend only on application interfaces.</p>
<p>NetArchTest can verify this:</p>
<pre><code class="language-csharp">var result = Types
    .InNamespace("MyApp.Api.Controllers")
    .ShouldNot()
    .HaveDependencyOn("MyApp.Infrastructure")
    .GetResult();
</code></pre>
<p>This guarantees proper abstraction boundaries.</p>
<h2>Organising Architecture Tests</h2>
<p>As systems grow, architecture tests should be organised clearly.</p>
<p>Most teams structure them by architectural concern.</p>
<p>You might have test classes such as:</p>
<ul>
<li><p>LayerDependencyTests</p>
</li>
<li><p>NamingConventionTests</p>
</li>
<li><p>ModuleIsolationTests</p>
</li>
<li><p>DomainIntegrityTests</p>
</li>
</ul>
<p>This organisation makes the intent of each rule clear and ensures the test suite remains maintainable.</p>
<h2>Integrating NetArchTest into CI/CD</h2>
<p>Architecture tests should run automatically during CI builds. NetArchTest, being lightweight, adds minimal overhead to build times. When a rule is violated, the build fails immediately, providing developers with clear feedback on which types caused the failure. This creates a strong feedback loop that prevents architectural drift. Over time, this automated enforcement becomes one of the most valuable safeguards in a mature codebase.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/67c36038c69a4b7143c5fc49/e359f479-3dcc-4d7c-b896-1f03daf9ee15.png" alt="" style="display:block;margin:0 auto" />

<h2>Combining NetArchTest with Other Tools</h2>
<p>These tests work best when combined with other architectural enforcement strategies. You can pair them with Roslyn analysers for compile time checks, solution reference restrictions to prevent project dependencies, code review guidelines for contextual validation, and documentation to explain architectural intent. Together, these tools create a multi-layered defence against architectural decay.</p>
<h2>Practical Tips for Real World Use</h2>
<p>When introducing NetArchTest into an existing system, start small by enforcing only the most critical rules, such as preventing domain dependencies on infrastructure. Gradually expand coverage to include module boundaries, naming conventions, and inheritance requirements. Avoid creating overly rigid rules that hinder development flexibility; architecture enforcement should support productivity rather than obstruct it. Keep rules readable and well-documented so future developers understand their purpose.</p>
<h2>Why Architectural Testing Changes Team Behaviour</h2>
<p>When developers know architectural rules are enforced automatically, they design solutions more carefully and avoid shortcuts, because violations will be detected immediately. Architecture becomes a living, enforceable contract rather than an aspirational guideline, which significantly improves long-term system maintainability.</p>
<h2>The Long Term Impact</h2>
<p>Over months and years, architectural testing prevents the slow decay that affects most large systems.</p>
<p>Boundaries remain intact. Dependencies remain predictable. Modules stay independent.</p>
<p>Teams can refactor with confidence because structural guarantees are always validated.</p>
<p>NetArchTest transforms architecture from a fragile concept into a durable, enforceable reality.</p>
<p>Architecture is easy to design but difficult to maintain. Without enforcement, even the best designs gradually erode under the pressure of deadlines and evolving requirements.</p>
<p>This provides a practical solution by allowing developers to encode architectural rules as executable tests. These tests run automatically, detect violations instantly, and prevent structural decay.</p>
<p>By integrating NetArchTest into your .NET solution, you transform architecture from documentation into enforcement. You create a system where boundaries remain intact, dependencies remain controlled, and long-term maintainability becomes achievable.</p>
<p>For teams building large enterprise systems, modular monoliths, or long lived platforms, this capability is essential.</p>
]]></content:encoded></item><item><title><![CDATA[Designing Local LLMs on Azure for Security, Reliability, and Control]]></title><description><![CDATA[In a previous post, I looked at what it really means to run LLMs locally from the perspective of a .NET developer. We explored why teams still care about local models despite the raw capability gap with GPT-5, how privacy, cost, latency, and complian...]]></description><link>https://fullstackcity.com/designing-local-llms-on-azure-for-security-reliability-and-control</link><guid isPermaLink="true">https://fullstackcity.com/designing-local-llms-on-azure-for-security-reliability-and-control</guid><category><![CDATA[llm]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sat, 31 Jan 2026 16:10:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769875708580/7978a6f6-00ce-44aa-99ab-ab3309b18ac2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a <a target="_blank" href="https://fullstackcity.com/running-llms-locally-in-net-with-microsoftextensionsai">previous post</a>, I looked at what it really means to run LLMs locally from the perspective of a .NET developer. We explored why teams still care about local models despite the raw capability gap with GPT-5, how privacy, cost, latency, and compliance drive that decision, and how <a target="_blank" href="http://Microsoft.Extensions.AI"><strong>Microsoft.Extensions.AI</strong></a> makes it possible to build hybrid systems that switch cleanly between cloud models and a locally hosted runtime like Ollama. That discussion was deliberately developer-centric, focused on code, abstractions, and the trade-offs you experience when an application decides whether to call GPT-5 or keep data on the machine.</p>
<p>This article looks at a different problem entirely. Instead of asking how to run a model locally on a workstation, we are going to look at what it takes to design and operate a local-feeling LLM platform on Azure itself. Not a public API call wrapped in configuration, and not a developer experiment, but a properly isolated, private, identity-secured LLM that lives inside your virtual network and behaves like an internal service. The focus here is not provider abstraction, but architecture, private endpoints, managed identity, governance, and operational discipline. In other words, how you move from local models in a .NET app to LLMs as internal infrastructure.</p>
<h2 id="heading-building-a-truly-local-llm-platform-on-azure">Building a Truly Local LLM Platform on Azure</h2>
<p>You might start experimenting with LLMs by calling a public API and wiring the response into an application. That phase is useful, but it ends quickly. As soon as the model touches real data, supports real users, or sits behind a regulated workflow, the questions change. Where does the data go? Who can access the model? How do we control behaviour over time? How do we prevent one bad prompt from blowing the budget?</p>
<p>This is where the idea of a local LLM becomes important.</p>
<p>Local does not mean on your laptop. It means the model is treated as infrastructure. It lives inside your security boundary, participates in your identity model, respects your network topology, and is observable and governable like any other internal service. Azure gives you the primitives to do this, but it does not assemble them for you. That is your job.</p>
<p>This article walks through that assembly in detail.</p>
<h2 id="heading-why-azure-is-a-good-fit-for-local-style-llms">Why Azure Is a Good Fit for Local Style LLMs</h2>
<p>Azure is unusually strong in this space because its AI offerings inherit the same enterprise primitives as storage, databases, and messaging. Azure OpenAI Service is not just a hosted model endpoint. It is a first-class Azure resource that supports private networking, Entra ID authentication, regional isolation, and resource level RBAC. That combination is rare. Many managed LLM platforms stop at API keys and IP allow lists. Azure lets you go further and treat inference as a zero trust internal dependency.</p>
<p>The consequence is architectural. You can design your system so that no developer, no CI pipeline, and no external system can talk to the model unless they are explicitly granted permission and are physically inside your network boundary.</p>
<p>That is what local really means in the cloud.</p>
<h2 id="heading-the-network-is-the-product-boundary">The Network Is the Product Boundary</h2>
<p><img src="https://learn.microsoft.com/en-us/azure/private-link/media/private-endpoint-dns/hub-and-spoke-azure-dns.png" alt="Hub and spoke topology resolving private endpoint DNS through Azure DNS" /></p>
<p><img src="https://learn.microsoft.com/en-us/azure/api-management/media/api-management-using-with-internal-vnet/api-management-vnet-internal.png" alt="API Management deployed in internal virtual network mode" /></p>
<p>The most important decision you will make is to disable public access to the model endpoint and rely exclusively on a private endpoint. Everything else builds on this.</p>
<p>When you enable a private endpoint for Azure OpenAI, Azure assigns a private IP address inside your virtual network and publishes a private DNS record that overrides the public hostname. From that moment on, name resolution itself enforces isolation. Calls from outside your VNet do not resolve.</p>
<p>This is stronger than firewall rules. There is no public surface to attack.</p>
<p>In practice, this means your application workloads must also live inside the VNet, or be integrated into it. Azure Container Apps with VNet injection, AKS, and App Service with regional VNet integration all work. What matters is that the call path from application to model never leaves the private network.</p>
<p>If you get this wrong, everything else is compromised.</p>
<h2 id="heading-identity-is-not-optional">Identity Is Not Optional</h2>
<p>Using API keys for LLM access is the AI equivalent of storing database passwords in config files. It works, but it is not safe.</p>
<p>Azure OpenAI supports Entra ID authentication, and you should consider this mandatory. With managed identity, your application authenticates using its own identity, not a shared secret. Access can be granted, audited, and revoked using the same mechanisms you already trust for storage accounts and message brokers. This has a subtle but powerful effect on system design. Instead of thinking in terms of “who has the key”, you start thinking in terms of “which workload is allowed to perform inference”. That aligns naturally with least privilege design. It also enables something important later. You can run multiple internal LLM facing services, each with different permissions, quotas, or even different model deployments, without duplicating secrets or configuration.</p>
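<p>As a minimal sketch, assuming the Azure.AI.OpenAI and Azure.Identity NuGet packages, a hypothetical endpoint hostname and a hypothetical deployment name, keyless access with managed identity looks roughly like this:</p>

```csharp
using System;
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;

// Hypothetical endpoint and deployment names.
// DefaultAzureCredential resolves to the workload's managed identity
// when running inside Azure, so no API key ever appears in config.
var client = new AzureOpenAIClient(
    new Uri("https://contoso-openai.openai.azure.com"),
    new DefaultAzureCredential());

ChatClient chat = client.GetChatClient("gpt-4o-mini");
ChatCompletion completion = await chat.CompleteChatAsync("Summarise the attached notes.");
Console.WriteLine(completion.Content[0].Text);
```

<p>Granting the workload's identity a role such as Cognitive Services OpenAI User on the resource is then an RBAC operation, auditable and revocable like any other.</p>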
<h2 id="heading-choosing-models">Choosing Models</h2>
<p>A common early mistake is to pick the largest, most capable model and build everything around it. This creates two problems. Cost becomes unpredictable, and latency becomes variable. In a local platform setup, you should think in terms of tiers. A fast, smaller model for classification, routing, extraction, and guardrail checks. A stronger model for reasoning heavy tasks. Possibly a specialised model for summarisation or transformation. Azure OpenAI allows you to deploy multiple models side by side. The key is to treat model selection as an architectural decision, not a prompt detail hidden in application code.</p>
<p>When you later introduce routing or fallback logic, this separation becomes invaluable.</p>
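<p>A sketch of what that separation can look like in code, with hypothetical task names and deployment names standing in for your own tiers:</p>

```csharp
using System;

// Deployment-tier routing as an explicit architectural decision,
// not a prompt detail buried in feature code. Names are illustrative.
static string SelectDeployment(string task) => task switch
{
    // Cheap, fast tier for high-volume structured work.
    "classification" or "extraction" or "routing" => "gpt-4o-mini",
    // Stronger tier reserved for reasoning-heavy operations.
    "reasoning" or "drafting" => "gpt-4o",
    _ => throw new ArgumentException($"Unknown task: {task}")
};

Console.WriteLine(SelectDeployment("reasoning")); // prints "gpt-4o"
```

<p>Because callers ask for a capability rather than a model, swapping the deployment behind a tier later becomes a one-line change.</p>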
<h2 id="heading-the-llm-should-never-be-called-directly">The LLM Should Never Be Called Directly</h2>
<p>If every feature team writes their own prompt and calls the model directly, you lose control almost immediately. Prompts drift. Behaviour changes subtly. Costs explode in strange places. Nobody owns the system anymore.</p>
<p>Instead, you should introduce an internal LLM gateway service. This is not a proxy in the networking sense. It is a domain aware service that exposes operations like “summarise underwriting notes”, “extract risk factors”, or “draft internal correspondence”.</p>
<p>Internally, this service owns prompt templates, system instructions, token limits, retry policies, and output validation. Externally, it exposes stable, versioned contracts. This mirrors how developers treat email, payments, or document rendering. The LLM is powerful, but it is not free form.</p>
<p>Once you centralise LLM access, prompt engineering stops being an ad hoc activity and becomes part of your delivery lifecycle. Prompts should be versioned. Changes should be reviewed. Behavioural differences should be tested against known inputs. In regulated environments, you may even need an approval process for prompt updates.</p>
<p>One effective pattern is to store prompts as structured templates with explicit inputs and outputs, rather than raw text blobs. This makes it easier to reason about what is allowed to change and what is not. Over time, this discipline is what keeps your system stable as models evolve.</p>
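<p>A minimal sketch of that pattern, with an illustrative template and a renderer that fails closed when a declared input is missing:</p>

```csharp
using System;
using System.Collections.Generic;

// Prompt stored as structured data: identifier, version, template text,
// and the inputs it is allowed to take. All names are illustrative.
var template = (
    Id: "summarise-underwriting-notes",
    Version: 3,
    Text: "Summarise the notes below in at most {maxWords} words.\nNotes: {notes}",
    RequiredInputs: new[] { "maxWords", "notes" });

static string Render(string text, string[] required, Dictionary<string, string> inputs)
{
    // Fail closed: a missing declared input is a bug, not a blank.
    foreach (var name in required)
        if (!inputs.ContainsKey(name))
            throw new ArgumentException($"Missing input: {name}");

    foreach (var (name, value) in inputs)
        text = text.Replace("{" + name + "}", value);
    return text;
}

var rendered = Render(template.Text, template.RequiredInputs,
    new Dictionary<string, string> { ["maxWords"] = "120", ["notes"] = "..." });
Console.WriteLine(rendered);
```

<p>Because the template carries an explicit version, behavioural changes can be reviewed and diffed like any other code change.</p>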
<h2 id="heading-retrieval-augmented-generation-done-properly">Retrieval Augmented Generation Done Properly</h2>
<p>Most discussions of RAG stop at embeddings and vector search. That is only half the story. In a production system, the retrieval pipeline is just as important as the model. You need to decide what content is eligible for retrieval, how it is chunked, how it is updated, and how relevance is enforced.</p>
<p>Azure gives you several options here, but Azure AI Search integrates particularly well with Azure OpenAI. It supports vector search, hybrid scoring, private endpoints, and managed identity. The critical point is that retrieval should happen before the model sees anything. The model should never be trusted to “remember” or “decide” what context it needs. You provide it with a constrained, curated slice of data and ask it to operate within that boundary. This is how you avoid hallucinations becoming system behaviour.</p>
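<p>The shape of that flow can be sketched as follows. Retrieval here is mocked with an in-memory filter standing in for a vector or hybrid query; in practice it would be an Azure AI Search call over a private endpoint:</p>

```csharp
using System;
using System.Linq;

// Retrieval happens before inference: the model only ever sees a
// curated, bounded slice of content. The scoring here is a stand-in.
string[] Retrieve(string query, int topK) =>
    new[] { "chunk about premiums", "chunk about exclusions", "unrelated chunk" }
        .Where(c => c.Contains("about"))   // placeholder for vector/hybrid scoring
        .Take(topK)
        .ToArray();

var context = string.Join("\n---\n", Retrieve("premium rules", topK: 2));
var prompt = "Answer using only the context below. If the answer is not " +
             $"in the context, say so.\nContext:\n{context}\nQuestion: ...";
Console.WriteLine(prompt);
```
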
<h2 id="heading-failure-modes-you-must-design-for">Failure Modes You Must Design For</h2>
<p>LLMs do not fail like databases, message queues or HTTP APIs. If you treat them as just another REST dependency, you will build something that looks stable in testing and becomes unpredictable in production.</p>
<p>The first class of failure is load-related degradation, not hard outages. LLM services tend to fail <em>slowly</em>. As concurrency increases, token queues grow, response times stretch, and eventually requests start timing out upstream. From the caller’s perspective this does not look like a clean failure. It looks like sporadic latency spikes, partial responses, and workflows that suddenly take seconds or minutes longer than expected. If you do not impose strict per-request timeouts and concurrency limits, a small surge in usage can cascade through your system and block unrelated work.</p>
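<p>A minimal sketch of those two controls, with an illustrative concurrency cap and timeout, so a load surge degrades individual requests rather than the whole system:</p>

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(8);                       // max in-flight inference calls
var perRequestTimeout = TimeSpan.FromSeconds(10);      // hard deadline per call

async Task<string> CallModelAsync(Func<CancellationToken, Task<string>> inference)
{
    // Refuse quickly instead of queueing forever behind a slow model.
    if (!await gate.WaitAsync(TimeSpan.FromSeconds(2)))
        throw new TimeoutException("Too many concurrent inference calls.");
    try
    {
        using var cts = new CancellationTokenSource(perRequestTimeout);
        return await inference(cts.Token);             // cancelled hard at the deadline
    }
    finally
    {
        gate.Release();
    }
}

var result = await CallModelAsync(async ct => { await Task.Delay(50, ct); return "ok"; });
Console.WriteLine(result); // prints "ok"
```
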
<p>The second failure mode is cost-amplified failure. Traditional services usually fail cheap. An LLM can fail expensively. A subtle prompt change, a retrieval bug, or an unbounded user input can multiply token usage overnight. Nothing crashes, nothing throws an exception, but your bill explodes. This is one of the most dangerous characteristics of LLM systems because it bypasses most operational alarms. Architecturally, this means you must treat token usage as a first-class resource. Hard caps, per-operation budgets, and enforced truncation are not optimisations, they are safety rails.</p>
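<p>A sketch of what a per-operation budget with enforced truncation can look like. The budgets are illustrative, and a crude character-count heuristic stands in for a real tokenizer:</p>

```csharp
using System;
using System.Collections.Generic;

var budgets = new Dictionary<string, int>   // max input tokens per operation
{
    ["summarise"] = 4_000,
    ["classify"] = 1_000
};

// Rough heuristic standing in for a real tokenizer.
static int EstimateTokens(string text) => text.Length / 4;

string EnforceBudget(string operation, string input)
{
    var budget = budgets[operation];
    if (EstimateTokens(input) <= budget) return input;

    // Enforced truncation: never silently send an unbounded prompt.
    return input[..(budget * 4)];
}

var trimmed = EnforceBudget("classify", new string('x', 10_000));
Console.WriteLine(EstimateTokens(trimmed)); // prints 1000
```
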
<p>The third class of failure is semantic drift. LLMs can be up, healthy, and responding successfully while still being wrong in a way that breaks your system. A model version update, a backend optimisation, or even a temperature tweak can subtly change behaviour. Summaries become less precise. Classifications start misfiring. Edge cases creep in. Unlike traditional regressions, these failures do not show up as errors. They show up as business logic slowly becoming unreliable. This is why prompt versioning, golden test cases, and behavioural monitoring matter. You are not just testing availability, you are testing meaning.</p>
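<p>Golden test cases make this concrete. The classifier below is a trivial stand-in for a real LLM call; the point is the shape of the check, run against known inputs whenever a prompt or model version changes:</p>

```csharp
using System;

// Stand-in for an LLM-backed classifier.
static string Classify(string text) =>
    text.Contains("refund", StringComparison.OrdinalIgnoreCase) ? "billing" : "general";

// Known inputs with expected behaviour; drift shows up as a failure here
// long before it shows up as an error rate in production.
var goldenCases = new (string Input, string Expected)[]
{
    ("I want a refund for last month", "billing"),
    ("How do I reset my password?", "general")
};

foreach (var (input, expected) in goldenCases)
{
    var actual = Classify(input);
    if (actual != expected)
        throw new Exception($"Drift detected: '{input}' -> '{actual}', expected '{expected}'");
}
Console.WriteLine("All golden cases passed.");
```
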
<p>Another critical failure mode is partial unavailability. LLMs often fail unevenly. Streaming might work while full completions time out. Small prompts succeed while larger ones fail. One deployment remains responsive while another degrades. Your architecture needs to handle this without collapsing. That usually means isolating LLM calls behind a boundary that can apply routing, fallback models, or degraded modes without forcing every caller to understand those details.</p>
<p>Then there is dependency amplification. Many LLM calls depend on other systems before the model is even invoked. Retrieval pipelines, embedding generation, vector search, prompt construction, and policy checks all sit upstream. A failure in any one of these can make the LLM appear unreliable, even though inference itself is healthy. This is why treating the LLM as a single black box is a mistake. You need visibility and control over each stage of the pipeline, and the ability to short-circuit or degrade when part of that pipeline fails.</p>
<p>All of this leads to a hard architectural question, <em>what happens when the LLM is not available, not fast enough, or not trustworthy enough?</em> There is no universal answer. In some systems, you return cached or last-known-good output. In others, you fall back to a simpler rules based path. In some workflows, you block progress and surface a clear error because proceeding would be worse. The key point is that this decision must be made explicitly, per capability, not implicitly by letting timeouts bubble up. Circuit breakers are essential, but they are not enough on their own. A breaker that simply stops calls does not solve the user experience problem. You need to define degraded behaviour that still makes sense in your domain. That is why these decisions belong at the architectural level. They define system behaviour under stress, not just how code handles exceptions.</p>
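<p>To make the breaker-plus-degraded-mode idea concrete, here is a minimal hand-rolled sketch with an illustrative failure threshold and a last-known-good fallback. In production you would typically reach for a resilience library rather than rolling your own:</p>

```csharp
using System;
using System.Threading.Tasks;

var consecutiveFailures = 0;
const int threshold = 3;   // illustrative breaker threshold

async Task<string> SummariseAsync(Func<Task<string>> llmCall, string lastKnownGood)
{
    if (consecutiveFailures >= threshold)
        return lastKnownGood;          // breaker open: serve degraded output

    try
    {
        var result = await llmCall();
        consecutiveFailures = 0;
        return result;
    }
    catch (Exception)
    {
        consecutiveFailures++;
        return lastKnownGood;          // fail soft for this capability
    }
}

var output = await SummariseAsync(() => Task.FromResult("fresh summary"), "cached summary");
Console.WriteLine(output); // prints "fresh summary"
```

<p>The important part is that the degraded behaviour (returning last-known-good output) is an explicit, per-capability decision rather than a timeout bubbling up to the caller.</p>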
<p>The final mistake is assuming that retries are always helpful. With LLMs, retries often amplify the problem. If the model is slow due to load, retrying increases load. If the failure is semantic, retrying produces the same wrong answer. Retries should be rare, bounded, and context-aware. Blind retries are a liability.</p>
<p>The takeaway here is simple but uncomfortable. An LLM is not an optional enhancement once it sits on a critical path. It is a core dependency with unique failure characteristics. You must design for those characteristics deliberately, or the system will design itself under pressure.</p>
<p>Treat the LLM like any other critical dependency, because operationally and financially that is exactly what it is.</p>
<h2 id="heading-observability-beyond-token-counts">Observability Beyond Token Counts</h2>
<p>The first thing you need is latency distribution, not average latency. LLM performance degrades unevenly. Median latency may look fine while tail latency quietly doubles. This usually happens under load, when token queues back up or streaming responses stall. If you only watch averages, you will miss the early warning signs. P95 and P99 latency per operation tell you when the system is becoming unreliable long before it outright fails. This is especially important if LLM calls sit on synchronous user-facing paths.</p>
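<p>A sketch of the underlying mechanic: record each call's duration per operation and read off the tail percentiles rather than the mean. The recording store and sample values are illustrative; a real system would feed a metrics backend instead:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var latencies = new Dictionary<string, List<double>>();

void Record(string operation, double ms)
{
    if (!latencies.TryGetValue(operation, out var list))
        latencies[operation] = list = new List<double>();
    list.Add(ms);
}

// Nearest-rank percentile over recorded samples.
double Percentile(string operation, double p)
{
    var sorted = latencies[operation].OrderBy(x => x).ToList();
    var index = (int)Math.Ceiling(p / 100.0 * sorted.Count) - 1;
    return sorted[Math.Max(index, 0)];
}

// Two slow outliers barely move the median but dominate the tail.
foreach (var ms in new double[] { 120, 130, 125, 140, 135, 128, 132, 138, 900, 950 })
    Record("summarise", ms);

Console.WriteLine($"P50: {Percentile("summarise", 50)}ms, P95: {Percentile("summarise", 95)}ms");
// prints "P50: 132ms, P95: 950ms"
```
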
<p>Token usage must be tracked per capability, not per service. “Total tokens per day” tells you very little. What you actually need to know is which operation consumed them. Summarisation, classification, extraction, drafting, routing, each of these has a very different expected token profile. When one of them spikes, it is almost always a design regression, a prompt change, or a retrieval issue. Without per-operation attribution, cost overruns appear mysterious and uncontrollable.</p>
<p>Another critical dimension is prompt version awareness. LLM failures are often self-inflicted. A well-intentioned prompt tweak increases verbosity, weakens constraints, or causes the model to echo input. Nothing crashes. Token usage rises. Latency increases. Behaviour subtly shifts. If you cannot correlate requests and outputs to a specific prompt version, you cannot diagnose this class of failure. Prompt versions are effectively code. They must be observable as such.</p>
<p>Error rates also need reinterpretation. LLM systems do not fail with clean 500s. Many failures surface as timeouts, partial responses, truncated output, or invalid structured responses. You need to classify errors semantically. Was the response incomplete? Did it violate the schema? Did it exceed its budget? Did it arrive too late to be useful? These distinctions matter more than HTTP status codes.</p>
<p>One of the most overlooked signals is downstream impact. An LLM can appear healthy while degrading everything around it. Slower responses increase queue depth. Larger outputs stress storage or messaging systems. Poor classifications send workflows down the wrong path. Observability needs to extend beyond the model call itself and into what happens next. If LLM output feeds automated decisions, you should monitor reversal rates, human corrections, or escalation frequency. These are behavioural metrics, not technical ones, and they are often the first indication that something has gone wrong.</p>
<p>Logging deserves special care. Logging full prompts and responses is tempting, and in early experiments it can be useful. In production systems, it is often a liability. Prompts may contain sensitive data. Outputs may contain inferred or derived information that creates compliance risk. A safer pattern is to log metadata instead of content. Token counts, input and output sizes, hashes of prompts, prompt identifiers, model version, latency, and validation outcomes usually provide enough signal to diagnose issues without storing raw text. When deeper inspection is required, targeted sampling with strict access controls is safer than blanket logging.</p>
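<p>A sketch of the metadata-only pattern: the log line carries token-adjacent sizes, a prompt hash, and a version identifier, but never the raw text. Field names are illustrative:</p>

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Short stable hash so identical prompts correlate across requests
// without the prompt text itself ever reaching the logs.
static string Hash(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)))[..16];

void LogInference(string operation, string promptVersion, string prompt,
                  string response, double latencyMs)
{
    Console.WriteLine(
        $"op={operation} promptVersion={promptVersion} " +
        $"promptHash={Hash(prompt)} inChars={prompt.Length} " +
        $"outChars={response.Length} latencyMs={latencyMs}");
}

LogInference("summarise", "v3", "Summarise these notes: ...", "The notes say...", 412.5);
```
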
<p>All of this becomes dramatically easier when LLM access is centralised. If every team calls the model directly with their own prompts, observability fragments immediately. Metrics lose consistency. Prompt versions are unknown. Cost attribution becomes political. Centralisation does not mean slowing teams down. It means giving them a stable, observable platform instead of a shared bill. The uncomfortable truth is that LLM observability is closer to monitoring business logic than infrastructure. You are not just watching whether the system is up. You are watching whether it is behaving as intended. Token counts are a starting point. Serious systems go much further.</p>
<h2 id="heading-when-you-really-need-to-self-host-models">When You Really Need to Self Host Models</h2>
<p>There are legitimate cases where managed inference is not enough. Offline environments, extreme data sovereignty requirements, or deep fine-tuning at the weight level may push you toward self hosted models on AKS with GPU nodes. The important thing is that this is an intentional step, not an accident. The same principles still apply. Private networking. Managed identity where possible. Centralised access. Clear ownership. The mistake is thinking that self hosting is more local by default. Without discipline, it is often less secure and less predictable than a managed service.</p>
<h2 id="heading-the-end-state-you-should-aim-for">The End State You Should Aim For</h2>
<p>A successful local LLM on Azure does not feel like AI.</p>
<p>It has an endpoint name that looks like an internal service. It has dashboards. It has quotas. It has owners. Changes are deliberate. Behaviour is predictable.</p>
<p>When you reach that point, the model itself becomes interchangeable. You can swap versions, introduce new deployments, or even change inference backends without rewriting your application.</p>
<p>That is the real value of doing this properly.</p>
<p>If you build it this way from the start, you are not experimenting with LLMs. You are operating them.</p>
]]></content:encoded></item><item><title><![CDATA[Defending Against Confused Deputy Attacks in Azure]]></title><description><![CDATA[Most .NET developers working in Azure feel confident about identity. You use Entra ID, validate JWTs, and lean on Managed Identity instead of secrets. On paper, that is the right direction. Authentication is in place, access keys are gone, and the pl...]]></description><link>https://fullstackcity.com/defending-against-confused-deputy-attacks-in-azure</link><guid isPermaLink="true">https://fullstackcity.com/defending-against-confused-deputy-attacks-in-azure</guid><category><![CDATA[azure-security]]></category><category><![CDATA[cybersecurity]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[Application Security]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sun, 25 Jan 2026 15:48:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769521814356/e67be078-ba2d-43e7-85cc-40505d9be01f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most .NET developers working in Azure feel confident about identity. You use Entra ID, validate JWTs, and lean on Managed Identity instead of secrets. On paper, that is the right direction. Authentication is in place, access keys are gone, and the platform is doing the heavy lifting. Yet there is a class of vulnerability that slips through systems that look correct. It does not rely on token theft or broken cryptography. It exploits something more subtle - implicit trust between services. This is the confused deputy problem.</p>
<p>If your system has multiple Azure workloads calling each other with Managed Identity, Event Grid, Service Bus, or Durable Functions, you can introduce confused deputy behaviour without realising it. The system stays secure by conventional checks, while still letting the wrong caller trigger the wrong privileged action.</p>
<p>A confused deputy attack happens when a highly privileged service performs an action on behalf of a less privileged caller without properly validating whether the request should be honoured. The key detail is that the deputy is not compromised. It is authenticated, authorised, and behaving as designed. The failure is that it is acting with its own authority on a request it should not have accepted. Azure makes this easy to miss because Managed Identity answers only one question: who am I. It does not answer who asked me to do this, why they asked, whether they are the right workload for this operation, or whether the operation is legitimate within your business rules. If you treat “valid token for my API” as sufficient proof of legitimacy, you have created a deputy that will do anything it is capable of doing for anyone who can obtain the right kind of token.</p>
<p>Take a concrete internal API example. You have an Invoices API that can write invoices to Azure SQL. A Billing worker calls it during a nightly workflow. Both use Managed Identity. Everything feels internal, so you add an endpoint like <code>POST /internal/invoices/generate</code> and protect it with normal JWT validation. You have now created a target where any Azure workload that can obtain a token for that API can trigger invoice generation. The call is authenticated, but the intent is unauthorised. That is a confused deputy.</p>
<p>Here is the failure mode in one sequence. The attacker does not steal a token. They simply obtain one legitimately from Entra ID for the right audience, then use your privileged service as the execution engine.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769354565857/4b2d63cd-fe39-4df9-a19a-25709b841065.png" alt class="image--center mx-auto" /></p>
<p>Managed Identity represents authority, not permission. It says this workload can do powerful things. It does not mean the caller is allowed to ask for those powerful things to be done. If you treat Managed Identity as permission, you are implicitly saying that any caller who can reach your API is allowed to ask it to do anything the API can do. That assumption collapses the moment you have multiple services, multiple teams, multiple workflows, or any pathway where untrusted input can influence a privileged operation. Event-driven systems amplify the problem. Event Grid, Service Bus, and Durable Functions often run without user context and execute with full Managed Identity authority. If your handler processes messages simply because they arrived, it becomes a deputy by default. The attacker no longer needs to reach your HTTP boundary. They only need to get the right event into the right pipeline.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769354776361/aa23d3f1-a8a0-4e43-ad33-e5db54a3ec88.png" alt class="image--center mx-auto" /></p>
<p>The core defence is intent binding. Every privileged operation needs to be bound to an explicit caller, an explicit target, and an explicit reason. In practical .NET terms, this means you cannot rely solely on ClaimsPrincipal and you cannot rely solely on Managed Identity. You must carry the context that explains why the operation is legitimate, and you must validate that context at the point where the privileged action occurs. One of the simplest and most effective hardening steps is to enforce strict audience validation and explicitly validate the calling workload identity. Many internal APIs accept any token with a valid issuer and broadly matching audience. That is not enough. You want exact audience matching and you want to reject calls from workloads that are not allowed to invoke this API, even if they can technically obtain a token.</p>
<p>Here is a practical example for <a target="_blank" href="http://ASP.NET">ASP.NET</a> using <code>AddJwtBearer</code>. This does not solve intent binding by itself, but it closes a major hole where any internal workload can call any internal API.</p>
<pre><code class="lang-csharp">using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;

// Declared before use so the token-validation callback can capture it.
var allowedCallers = new HashSet&lt;string&gt;(StringComparer.OrdinalIgnoreCase)
{
    "00000000-0000-0000-0000-000000000001"
};

builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =&gt;
    {
        options.Authority = $"https://login.microsoftonline.com/{tenantId}/v2.0";
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidAudience = "api://invoices",
            ValidateAudience = true,
            ValidateIssuer = true,
            ValidateLifetime = true,
            ValidateIssuerSigningKey = true,
            ClockSkew = TimeSpan.FromMinutes(1)
        };

        options.Events = new JwtBearerEvents
        {
            OnTokenValidated = ctx =&gt;
            {
                var callerAppId = ctx.Principal?.FindFirst("azp")?.Value
                    ?? ctx.Principal?.FindFirst("appid")?.Value;

                if (callerAppId is null || !allowedCallers.Contains(callerAppId))
                {
                    ctx.Fail("Caller application is not allowed.");
                }

                return Task.CompletedTask;
            }
        };
    });
</code></pre>
<p>Strict caller validation prevents random internal workloads from calling the API, but it still does not guarantee the call is legitimate in business terms. That is where intent binding comes in. The service must refuse to perform privileged work unless the request carries verifiable provenance and the operation is authorised against a rule you control.</p>
<p>A clean way to do this in <a target="_blank" href="http://ASP.NET">ASP.NET</a> is resource-based authorisation. Instead of authorising the principal in isolation, you authorise a principal acting on a specific resource context that includes caller intent.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">using</span> Microsoft.AspNetCore.Authorization;

app.MapPost(<span class="hljs-string">"/internal/invoices/generate"</span>,
    [<span class="hljs-meta">Authorize</span>] <span class="hljs-keyword">async</span> (
        <span class="hljs-keyword">string</span> callerService,
        <span class="hljs-keyword">string</span> correlationId,
        GenerateInvoiceRequest request,
        IAuthorizationService authorization,
        ClaimsPrincipal principal,
        CancellationToken stopToken) =&gt;
    {
        <span class="hljs-keyword">var</span> resource = <span class="hljs-keyword">new</span> InvoiceGenerationResource(
            callerService,
            correlationId,
            request.AccountId);

        <span class="hljs-keyword">var</span> result = <span class="hljs-keyword">await</span> authorization.AuthorizeAsync(principal, resource, <span class="hljs-string">"InvoiceGeneration"</span>);
        <span class="hljs-keyword">if</span> (!result.Succeeded)
            <span class="hljs-keyword">return</span> Results.Forbid();

        <span class="hljs-keyword">await</span> GenerateInvoicesAsync(request, stopToken);
        <span class="hljs-keyword">return</span> Results.Accepted();
    });
</code></pre>
<p>At this point the deputy is no longer “just checking a token”. It is checking whether this caller is allowed to request this operation, with this declared purpose, against this target. That logic lives in an authorisation handler, where you can combine workload identity from the token with declared intent from the request.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">using</span> Microsoft.AspNetCore.Authorization;

<span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">InvoiceGenerationRequirement</span> : <span class="hljs-title">IAuthorizationRequirement</span>;

<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">InvoiceGenerationResource</span>(<span class="hljs-params">
    <span class="hljs-keyword">string</span> CallerService,
    <span class="hljs-keyword">string</span> CorrelationId,
    Guid AccountId</span>)</span>;

<span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">InvoiceGenerationHandler</span>
    : <span class="hljs-title">AuthorizationHandler</span>&lt;<span class="hljs-title">InvoiceGenerationRequirement</span>, <span class="hljs-title">InvoiceGenerationResource</span>&gt;
{
    <span class="hljs-function"><span class="hljs-keyword">protected</span> <span class="hljs-keyword">override</span> Task <span class="hljs-title">HandleRequirementAsync</span>(<span class="hljs-params">
        AuthorizationHandlerContext context,
        InvoiceGenerationRequirement requirement,
        InvoiceGenerationResource resource</span>)</span>
    {
        <span class="hljs-keyword">var</span> callerAppId = context.User.FindFirst(<span class="hljs-string">"azp"</span>)?.Value
            ?? context.User.FindFirst(<span class="hljs-string">"appid"</span>)?.Value;

        <span class="hljs-keyword">if</span> (callerAppId != <span class="hljs-string">"00000000-0000-0000-0000-000000000001"</span>)
            <span class="hljs-keyword">return</span> Task.CompletedTask;

        <span class="hljs-keyword">if</span> (!<span class="hljs-keyword">string</span>.Equals(resource.CallerService, <span class="hljs-string">"Billing.Worker"</span>, StringComparison.Ordinal))
            <span class="hljs-keyword">return</span> Task.CompletedTask;

        <span class="hljs-keyword">if</span> (resource.AccountId == Guid.Empty)
            <span class="hljs-keyword">return</span> Task.CompletedTask;

        context.Succeed(requirement);
        <span class="hljs-keyword">return</span> Task.CompletedTask;
    }
}
</code></pre>
<p>The same principle applies in event-driven paths, except the intent must be carried in the message itself. If you treat messages as implicitly trusted because they arrive on your subscription, you are delegating authority to any actor that can inject a message. A good handler validates provenance and schema before it performs privileged work, and it fails closed when the metadata is missing.</p>
<p>Here is an Azure Function example using Service Bus where intent is enforced before invoice generation is executed.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">using</span> Azure.Messaging.ServiceBus;
<span class="hljs-keyword">using</span> Microsoft.Azure.Functions.Worker;
<span class="hljs-keyword">using</span> System.Text.Json;

<span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">InvoiceMessages</span>
{
    [<span class="hljs-meta">Function(nameof(HandleInvoiceRequested))</span>]
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task <span class="hljs-title">HandleInvoiceRequested</span>(<span class="hljs-params">
        [ServiceBusTrigger(<span class="hljs-string">"invoice-requests"</span>, Connection = <span class="hljs-string">"ServiceBusConnection"</span></span>)]
        ServiceBusReceivedMessage msg,
        CancellationToken stopToken)</span>
    {
        <span class="hljs-keyword">if</span> (!msg.ApplicationProperties.TryGetValue(<span class="hljs-string">"producer"</span>, <span class="hljs-keyword">out</span> <span class="hljs-keyword">var</span> producer)
            || !Equals(producer, <span class="hljs-string">"Billing.Worker"</span>))
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> UnauthorizedAccessException(<span class="hljs-string">"Untrusted producer."</span>);

        <span class="hljs-keyword">if</span> (!msg.ApplicationProperties.TryGetValue(<span class="hljs-string">"schema"</span>, <span class="hljs-keyword">out</span> <span class="hljs-keyword">var</span> schema)
            || !Equals(schema, <span class="hljs-string">"invoice-requested:v1"</span>))
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> UnauthorizedAccessException(<span class="hljs-string">"Unexpected schema."</span>);

        <span class="hljs-keyword">var</span> payload = JsonSerializer.Deserialize&lt;InvoiceRequested&gt;(msg.Body);
        <span class="hljs-keyword">if</span> (payload <span class="hljs-keyword">is</span> <span class="hljs-literal">null</span> || payload.AccountId == Guid.Empty)
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> InvalidOperationException(<span class="hljs-string">"Invalid payload."</span>);

        <span class="hljs-keyword">await</span> GenerateInvoicesAsync(payload, stopToken);
    }
}

<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">InvoiceRequested</span>(<span class="hljs-params">
    Guid AccountId,
    <span class="hljs-keyword">string</span> CorrelationId,
    DateOnly PeriodStart,
    DateOnly PeriodEnd</span>)</span>;
</code></pre>
<p>This is the heart of it. Confused deputy attacks do not happen because your tokens are invalid. They happen because your system accepts valid identity as sufficient proof of legitimate intent. The fix is not a new Azure feature. The fix is boundaries, explicit intent, and validation at the moment privileged work is executed.</p>
<p>Audit logs matter here as well, because they are how you catch deputies in the real world. When something goes wrong, you want logs that tell you who requested the action, which service executed it, and why it was allowed. If your logs only say “operation succeeded”, confused deputy behaviour disappears into the noise.</p>
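<p>To make that concrete, here is a minimal sketch of the kind of structured audit entry that answers those questions. The field names, the <code>_logger</code> instance, and the reuse of the <code>producer</code> and <code>payload</code> variables from the handler above are illustrative choices, not a fixed convention:</p>
<pre><code class="lang-csharp">// Hypothetical audit entry: who asked, which service executed, why it was allowed.
_logger.LogInformation(
    "Privileged action {Action} executed by {Executor} for producer {Producer}. " +
    "Allowed because: {Reason}. CorrelationId: {CorrelationId}",
    "GenerateInvoices",
    "Invoicing.Handler",
    producer,
    "producer and schema checks passed",
    payload.CorrelationId);
</code></pre>
<p>The logging API is incidental. What matters is that every privileged execution records requester, executor, and justification in one queryable place.</p>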
<p>Before shipping a service, it is worth asking one uncomfortable question. If another internal workload can obtain a token for this API, can it trick this service into doing something dangerous? If the answer is not clearly “no” with a reason you can point at in code, you probably have a deputy waiting to be confused.</p>
<p>Confused deputy attacks already exist in many Azure systems, hidden behind valid tokens and successful authentication. The good news is that as a .NET developer you have the tools to shut them down. You control the authorisation pipeline, middleware, message handlers, and execution context. Use them deliberately, and your services stop being deputies that trust too much.</p>
]]></content:encoded></item><item><title><![CDATA[Part 10. Migrating a Legacy Layered .NET Application to a Modular Monolith]]></title><description><![CDATA[Most Development Teams do not wake up one morning with the freedom to design a modular monolith from scratch. They inherit something. A system that grew organically. A codebase that made sense at the time. Controllers, services, repositories, a share...]]></description><link>https://fullstackcity.com/part-10-migrating-a-legacy-layered-net-application-to-a-modular-monolith</link><guid isPermaLink="true">https://fullstackcity.com/part-10-migrating-a-legacy-layered-net-application-to-a-modular-monolith</guid><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[Modular Monolith]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 22:04:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118706150/cb58d416-dccb-440f-ac51-448afdc6ef02.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most Development Teams do not wake up one morning with the freedom to design a modular monolith from scratch. They inherit something. A system that grew organically. A codebase that made sense at the time. Controllers, services, repositories, a shared DbContext, and a hundred small decisions that all seemed reasonable in isolation.</p>
<p>And yet, every change feels heavier than it should.</p>
<p>The mistake many people make at this point is assuming that architecture has to be fixed in one decisive moment. A rewrite. A migration project. A “phase two”. That thinking kills momentum, because the system doesn’t stop needing features just because you want to improve it.</p>
<p>A successful migration does not start with structure. It starts with pressure.</p>
<p>The pressure usually shows up in one place first. A part of the system that changes often. A domain that attracts bugs. A feature area that nobody enjoys touching because it breaks unrelated things.</p>
<p>That area is not your problem child. It’s your opportunity.</p>
<p>You don’t modularise the whole system. You carve out one real boundary and make it honest.</p>
<p>At the start, a typical layered system looks something like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118769099/88efd149-828b-4e7d-9b98-8df54cc7cb23.png" alt class="image--center mx-auto" /></p>
<p>In a typical layered system, everything flows through the same pipes. Ownership is implied rather than enforced, and data is shared by default. The architecture relies on discipline and convention instead of hard boundaries.</p>
<p>Trying to “add modules” on top of that usually just adds folders. The underlying model does not change. The coupling is still there, just hidden a little better. That is not the move. The first real step is to identify a business capability that can stand on its own. Not technically, but conceptually. Something you can describe clearly without mentioning how the rest of the system works. Users. Billing. Reporting. Claims. Pick one.</p>
<p>Then you do something that feels small but is actually profound. You stop letting other parts of the system reach into its data. You do not refactor everything at once. You draw a line, and from that point on, the rules change.</p>
<p>That line usually starts as a namespace and ends as an assembly. You extract the code related to that capability into a new project. Not because projects are magical, but because they give you enforcement. Suddenly, references are explicit. Internals stay internal. The compiler starts helping you.</p>
<p>At first, it will feel awkward. The rest of the system still expects to call services and repositories directly. You don’t fight that immediately. You introduce a thin contract layer and adapt.</p>
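<p>That thin layer can be as small as one interface plus an adapter that delegates to the existing service. The type names here are hypothetical:</p>
<pre><code class="lang-csharp">// The only surface the rest of the system is allowed to see.
public interface IBillingApi
{
    Task&lt;InvoiceSummary&gt; GetLatestInvoiceAsync(Guid accountId, CancellationToken ct);
}

// Adapter: legacy callers keep working while the module controls the boundary.
internal sealed class BillingApiAdapter : IBillingApi
{
    private readonly LegacyInvoiceService _legacy; // unchanged for now

    public BillingApiAdapter(LegacyInvoiceService legacy) =&gt; _legacy = legacy;

    public async Task&lt;InvoiceSummary&gt; GetLatestInvoiceAsync(Guid accountId, CancellationToken ct)
    {
        var invoice = await _legacy.GetLatestAsync(accountId, ct);
        return new InvoiceSummary(invoice.Id, invoice.Total); // translate, never leak
    }
}
</code></pre>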
<p>Conceptually, the system now looks like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118799164/678571c0-5d6e-4c35-9dae-51c379047b98.png" alt class="image--center mx-auto" /></p>
<p>The world hasn’t changed. But one boundary is now visible.</p>
<p>Inside that new module, you resist the urge to replicate the layered structure. This is where vertical slices matter most. You organise by behaviour, not by technical concern. CreateUser lives in one place. GetUser lives in one place. Each slice owns its own logic end to end.</p>
<p>This is the moment where migration and design intersect. You’re not just moving code. You’re changing how it’s shaped.</p>
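<p>A hypothetical CreateUser slice makes that shape concrete, with request, handler, and result living together rather than spread across layers:</p>
<pre><code class="lang-csharp">// Users/CreateUser.cs — the whole behaviour, end to end.
public static class CreateUser
{
    public sealed record Command(string Email, string DisplayName);
    public sealed record Result(Guid UserId);

    public sealed class Handler
    {
        private readonly UsersDbContext _db;
        public Handler(UsersDbContext db) =&gt; _db = db;

        public async Task&lt;Result&gt; HandleAsync(Command cmd, CancellationToken ct)
        {
            var user = new User(Guid.NewGuid(), cmd.Email, cmd.DisplayName);
            _db.Users.Add(user);
            await _db.SaveChangesAsync(ct);
            return new Result(user.Id);
        }
    }
}
</code></pre>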
<p>The shared DbContext is usually the next sticking point. This is where many migrations stall, because people try to solve everything at once. You don’t.</p>
<p>You let the module introduce its own DbContext while the rest of the system continues using the shared one. For a while, both exist.</p>
<p>That coexistence is not a failure. It’s a bridge.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118815677/3bc8c5ef-c644-4e99-ae59-62f6dd539c21.png" alt class="image--center mx-auto" /></p>
<p>During this phase, the goal is not purity. It’s containment. New behaviour goes through the module’s DbContext. Old behaviour stays where it is. Over time, the shared DbContext shrinks instead of growing.</p>
<p>That direction matters more than speed.</p>
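<p>The coexistence itself is mostly a registration detail. Both contexts can point at the same physical database while owning different tables; the type names here are illustrative:</p>
<pre><code class="lang-csharp">// Program.cs — old and new contexts side by side during the bridge phase.
builder.Services.AddDbContext&lt;LegacyDbContext&gt;(options =&gt;
    options.UseSqlServer(connectionString));

builder.Services.AddDbContext&lt;UsersDbContext&gt;(options =&gt;
    options.UseSqlServer(connectionString)); // same database, separate tables and migrations
</code></pre>
<p>New slices depend on <code>UsersDbContext</code>. Old code keeps its <code>LegacyDbContext</code> until it is migrated or deleted.</p>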
<p>Inter-module communication follows the same pattern. At first, the legacy system may still call into the module synchronously. You accept that. But inside the module, you design as if those calls were remote. Contracts are explicit. Events represent facts. Failures are handled locally. You are laying tracks in the direction you want the system to move.</p>
<p>One of the hardest parts of this migration is psychological, not technical. You have to be comfortable with the system being temporarily uneven. Some parts clean. Some parts messy. Some parts modern. Some parts legacy. Trying to make everything consistent too early is how migrations die.</p>
<p>Old code can stay ugly for a while. That’s not debt. That’s triage.</p>
<p>Testing often improves naturally during this process. The first module you extract becomes the first place where slice tests, integration tests, and architecture tests feel obvious instead of forced. That contrast is useful. It gives the team a reference point. People don’t need to be convinced with diagrams. They feel the difference when changes stop breaking unrelated things.</p>
<p>Over time, something interesting happens. The module stops feeling like “the new thing” and starts feeling normal. Meanwhile, the legacy area starts to feel increasingly uncomfortable by comparison. That discomfort is not a problem. It’s information. It tells you where to go next.</p>
<p>Eventually, the system’s shape changes from this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118836991/ff6e03c7-563a-4aae-bfe9-f776d6e163a4.png" alt class="image--center mx-auto" /></p>
<p>To something more like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118858714/b89bc991-9927-481b-bcb1-819a5a2f6c9e.png" alt class="image--center mx-auto" /></p>
<p>And then, slowly, Legacy gets smaller.</p>
<p>Not because you planned a rewrite, but because you stopped feeding it.</p>
<p>The most important thing to understand about this kind of migration is that it is not linear. You will pause. You will adapt. You will make compromises. That’s not failure. That’s working in a live system. The success criterion is simple. Each month, is the system easier to change than it was the month before?</p>
<p>If the answer is yes, you’re doing it right.</p>
<p>This is where the series comes full circle. A modular monolith is not a destination you reach. It’s a direction you commit to. A way of making trade-offs explicit instead of accidental.</p>
<p>You don’t need permission to start. You need one boundary, one module, and the discipline to protect it.</p>
<p>Everything else follows.</p>
<hr />
<p>If someone reads this series end to end and only takes one thing away, I hope it’s this: architecture is not about being clever early. It’s about staying honest over time.</p>
<p>That’s what actually scales.</p>
]]></content:encoded></item><item><title><![CDATA[Part 9. Operational Concerns in Modular Monoliths]]></title><description><![CDATA[There’s a point in every system’s life where architecture stops being something you reason about and starts being something you experience. It usually arrives during an incident, when logs are noisy, alerts are firing, and someone asks a deceptively ...]]></description><link>https://fullstackcity.com/part-9-operational-concerns-in-modular-monoliths</link><guid isPermaLink="true">https://fullstackcity.com/part-9-operational-concerns-in-modular-monoliths</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[.NET]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Modular Monolith]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 21:44:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769118213861/9d725db9-0902-446b-8bdd-d2599f9384b4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There’s a point in every system’s life where architecture stops being something you reason about and starts being something you <em>experience</em>. It usually arrives during an incident, when logs are noisy, alerts are firing, and someone asks a deceptively simple question.</p>
<p>“Which part of the system is actually broken?”</p>
<p>If your modular monolith can’t answer that clearly, then at runtime it isn’t really modular at all.</p>
<p>In development, modules are boundaries in code. In production, modules must be boundaries in signal. Logs, metrics, health, failures, and alerts must respect the same lines your architecture does. If they don’t, all the care you took earlier collapses into a single operational blob.</p>
<p>The runtime shape of a healthy modular monolith looks something like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117587290/71e94883-a632-459d-9dbc-9992b506728f.png" alt class="image--center mx-auto" /></p>
<p>One process. One deployment. Multiple, clearly attributable sources of behaviour.</p>
<p>If your logs don’t already look like this conceptually, diagnosing problems will always be slower than it needs to be.</p>
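<p>In .NET, the cheapest way to make logs module-attributable is a logging scope that stamps every entry. A small sketch, assuming a team convention of a <code>Module</code> property:</p>
<pre><code class="lang-csharp">// Inside a Billing handler: every log line written in this scope
// carries Module = "Billing" as a structured property.
using (_logger.BeginScope(new Dictionary&lt;string, object&gt; { ["Module"] = "Billing" }))
{
    _logger.LogInformation("Generating invoice for {AccountId}", accountId);
}
</code></pre>
<p>Once that property exists everywhere, narrowing an incident to one module is a filter, not an archaeology exercise.</p>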
<p>The same idea applies to metrics. A single latency number for the entire application tells you very little once the system grows beyond trivial size. What you actually need to know is whether a slowdown is systemic or local.</p>
<p>At runtime, the system should feel more like this than a flat line.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117658379/f8d21575-cf10-471b-80f0-39b61ee7f65d.png" alt class="image--center mx-auto" /></p>
<p>This doesn’t mean every module needs bespoke dashboards for everything. It means the <em>possibility</em> exists. When an alert fires, you can immediately tell whether one module is misbehaving or whether the entire application is under stress.</p>
<p>That distinction is the difference between a calm response and a panicked one.</p>
<p>Health checks are where many modular monoliths accidentally lie to their operators. A single “healthy or unhealthy” signal flattens the system into something it no longer is. It forces binary decisions in a world that is not binary.</p>
<p>A more honest mental model looks like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117683447/42875e34-c0bc-474b-8008-855c1d86bb0b.png" alt class="image--center mx-auto" /></p>
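<p>ASP.NET Core expresses this model directly through named health checks and tags, rather than one global probe. A minimal sketch, with illustrative check types:</p>
<pre><code class="lang-csharp">builder.Services.AddHealthChecks()
    .AddCheck&lt;UsersHealthCheck&gt;("users", tags: new[] { "module" })
    .AddCheck&lt;BillingHealthCheck&gt;("billing", tags: new[] { "module" })
    .AddCheck&lt;ReportingHealthCheck&gt;("reporting", tags: new[] { "module" });

// Module-level detail for operators; liveness probes can stay binary elsewhere.
app.MapHealthChecks("/health/modules", new HealthCheckOptions
{
    Predicate = check =&gt; check.Tags.Contains("module")
});
</code></pre>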
<p>In this model, the system can express partial degradation. Users might be healthy. Reporting might be lagging. Billing might be failing to reach an external dependency. The application is still running, but not everything is equal.</p>
<p>Failure handling is where the difference between architectural intent and operational reality becomes obvious. If an exception in one module can crash unrelated behaviour, then isolation exists only in your head.</p>
<p>At runtime, you want failure to look like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117706736/1cfce7b5-bd87-4a4a-98a7-bc32f7df1680.png" alt class="image--center mx-auto" /></p>
<p>Billing failing to react to an event should not destabilise Users. If it does, you’ve accidentally recreated a distributed transaction, just without the tooling to see it.</p>
<p>This is one of the reasons earlier parts of the series pushed so hard on honest communication and local failure. Operations is where dishonesty is punished.</p>
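<p>The cheapest place to enforce local failure is the point where events are dispatched to handlers. A sketch of an in-process dispatcher that refuses to let one module's exception unwind another module's work (the dispatcher and its <code>_handlers</code> registry are hypothetical):</p>
<pre><code class="lang-csharp">public async Task PublishAsync(object @event, CancellationToken ct)
{
    foreach (var handler in _handlers.For(@event.GetType()))
    {
        try
        {
            await handler.HandleAsync(@event, ct);
        }
        catch (Exception ex)
        {
            // Billing failing to react must not destabilise Users.
            // Log it, count it, alert on it, but keep the failure local.
            _logger.LogError(ex, "Handler {Handler} failed for {Event}",
                handler.GetType().Name, @event.GetType().Name);
        }
    }
}
</code></pre>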
<p>Database operations are another place where modularity often collapses under pressure. When all schema changes are treated as one global concern, deployments become coupled even if the code is not.</p>
<p>The operationally healthy shape looks like this.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117730252/fe3f2e18-c6e5-4735-bb8b-6f3d9332ccb7.png" alt class="image--center mx-auto" /></p>
<p>Even if all schemas live in the same physical database, ownership is clear. Migration failures are attributable. Rollbacks are targeted. People stop fearing deployments because not every change threatens everything else.</p>
<p>One of the most telling operational signals is how incidents are described. In unhealthy systems, everything is “the system”. In healthy modular monoliths, incidents are scoped naturally.</p>
<p>“We’re seeing elevated latency in Billing.”<br />“Users is healthy, but Reporting is behind.”</p>
<p>Those sentences only make sense if the runtime architecture supports them.</p>
<p>There’s also a human dimension here that’s easy to underestimate. Systems that are hard to observe become systems people are afraid to touch. Over time, that fear turns into conservatism, then stagnation. Not because change is dangerous, but because the feedback loop is too vague to trust.</p>
<p>Good operational boundaries don’t just help machines. They help people stay confident.</p>
<p>One of the quiet benefits of doing this work inside a modular monolith is that it prepares you for the future without forcing it. If one module eventually needs to be extracted, the operational model already exists.</p>
<p>The diagrams don’t change much. The lines just move.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769117755515/15b879cd-00dd-4a7e-80ec-2d1bc970b993.png" alt class="image--center mx-auto" /></p>
<p>If this looks familiar, that’s intentional. Extraction should feel like relocation, not reinvention.</p>
<p>Operational clarity is not a nice-to-have. It’s the difference between a system that survives contact with reality and one that slowly erodes trust.</p>
<p>If Parts 1 through 8 were about making the code honest, Part 9 is about making the system honest while it’s running.</p>
<hr />
<p>That leaves one final problem to address. Not how to design from scratch, and not how to operate once things are clean, but how to get from where most teams actually are to where this series has been pointing all along.</p>
<p><strong>Part 10</strong> is about migrating a legacy layered .NET application to a modular monolith, without stopping delivery or pretending you have infinite time.</p>
]]></content:encoded></item><item><title><![CDATA[Part 8. Versioning Modules Independently Inside a Single Deployment]]></title><description><![CDATA[One of the quiet assumptions people carry into modular monoliths is that versioning only becomes a problem once you go distributed. While everything lives in one deployment, the thinking goes, you can just change things together and move on.
That ass...]]></description><link>https://fullstackcity.com/part-8-versioning-modules-independently-inside-a-single-deployment</link><guid isPermaLink="true">https://fullstackcity.com/part-8-versioning-modules-independently-inside-a-single-deployment</guid><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 20:12:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769112648057/32079042-6df9-46b7-8707-a0a8652735ce.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the quiet assumptions people carry into modular monoliths is that versioning only becomes a problem once you go distributed. While everything lives in one deployment, the thinking goes, you can just change things together and move on.</p>
<p>That assumption holds only while change is cheap and coordinated. The moment different parts of the system start evolving at different speeds, the absence of a versioning strategy turns every change into a negotiation.</p>
<p>This is not a microservices problem. It’s a time problem.</p>
<p>In a real modular monolith, modules do not mature evenly. Some stabilise quickly and barely change. Others sit close to the business edge and churn constantly. Treating them as if they all move together is how you end up with artificial constraints and unnecessary coupling.</p>
<p>The first thing to be clear about is what we mean by “versioning” in this context. We are not talking about NuGet packages or semantic version numbers published to the world. We are talking about the ability for one module to change its behaviour or contracts without forcing immediate changes in every consumer.</p>
<p>That sounds abstract until you see the alternative.</p>
<p>Imagine a Users module that exposes a query returning user details. At first, it looks like this.</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">UserSummary</span>(<span class="hljs-params">Guid Id, <span class="hljs-keyword">string</span> Email</span>)</span>;
</code></pre>
<p>Billing consumes it. Reporting consumes it. Everything is fine.</p>
<p>Then the Users module evolves. The business now distinguishes between primary and secondary email addresses. The model changes.</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">UserSummary</span>(<span class="hljs-params">
    Guid Id,
    <span class="hljs-keyword">string</span> PrimaryEmail,
    IReadOnlyList&lt;<span class="hljs-keyword">string</span>&gt; SecondaryEmails
</span>)</span>;
</code></pre>
<p>If this change ripples through the entire system immediately, you don’t have independent evolution. You have coordinated refactoring. In a small codebase that might be acceptable. In a system that’s growing, it becomes a tax you pay over and over again.</p>
<p>The simplest and most reliable way to version inside a monolith is not through numbers, but through coexistence. Old behaviour stays available while new behaviour is introduced alongside it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769112016285/975619d0-7732-4db8-a3bd-d87c31af05d0.png" alt class="image--center mx-auto" /></p>
<p>Nothing forces every consumer to move at once. Migration happens incrementally, with intent.</p>
<p>In practice, this usually means versioning contracts, not modules.</p>
<p>The Users module still deploys as one unit. It still owns its data. But it exposes more than one contract shape for a period of time.</p>
<p>That might look like this in code.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IUserQueriesV1</span>
{
    <span class="hljs-function">Task&lt;UserSummaryV1&gt; <span class="hljs-title">GetAsync</span>(<span class="hljs-params">Guid userId</span>)</span>;
}

<span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IUserQueriesV2</span>
{
    <span class="hljs-function">Task&lt;UserSummaryV2&gt; <span class="hljs-title">GetAsync</span>(<span class="hljs-params">Guid userId</span>)</span>;
}
</code></pre>
<p>Both are implemented internally by the Users module. Both are valid. One is newer. This feels slightly uncomfortable at first, because it introduces duplication. That discomfort is the cost of time. You’re paying it explicitly instead of smearing it across the system.</p>
<p>What matters is that versioning lives at the boundary, not in the core. Internally, the module should have one clear model and one clear understanding of the business. The translation to older or newer shapes happens at the edge. That keeps the complexity contained.</p>
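<p>At the edge, that translation is just mapping from the single internal model to each published shape. A sketch, assuming a hypothetical internal <code>User</code> entity:</p>
<pre><code class="lang-csharp">internal static class UserSummaryMapper
{
    // V1 keeps its original promise: one email string.
    public static UserSummaryV1 ToV1(User user) =&gt;
        new(user.Id, user.PrimaryEmail);

    // V2 exposes the model the business actually uses now.
    public static UserSummaryV2 ToV2(User user) =&gt;
        new(user.Id, user.PrimaryEmail, user.SecondaryEmails);
}
</code></pre>
<p>The internal model evolves freely. Only the mappers know about time.</p>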
<p>Feature flags are often suggested as an alternative to versioning, and they do have a place here, but only if you’re honest about what they are doing. A feature flag that permanently changes behaviour for different consumers is not a flag. It’s a version switch disguised as configuration.</p>
<p>Used carefully, flags help you stage a change. Used carelessly, they become a second, hidden versioning system that nobody fully understands.</p>
<p>The rule I follow is simple. Flags are temporary. Versions are explicit. If a conditional is going to live longer than a release cycle or two, it should probably be a versioned contract instead.</p>
<hr />
<p>While we’re talking about feature flags, it’s probably worth being transparent about something. A lot of the opinions in this section come from having been burned by flag sprawl more than once, which is why I’ve been building a small feature flag platform called <a target="_blank" href="http://www.flagmesh.com"><strong>Flagmesh</strong></a> on the side.</p>
<p>The goal isn’t to turn flags into a second versioning system or bury business logic behind configuration, but to make it obvious what flags exist, why they exist, and when they should be removed. If you ever find yourself needing feature flags and want something deliberately opinionated about scope and cleanup, it’s there. If not, the principles still stand regardless of tooling.</p>
<hr />
<p>Another place versioning shows up is in events. This is where people often get caught out, because events feel immutable until they aren’t. An event is a promise. Once it’s published, consumers will build assumptions around it. Changing it silently is one of the fastest ways to break trust between modules.</p>
<p>When an event needs to evolve, the safest approach is the most boring one: publish a new event type.</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">UserCreatedV1</span>(<span class="hljs-params">Guid UserId, <span class="hljs-keyword">string</span> Email</span>)</span>;
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">UserCreatedV2</span>(<span class="hljs-params">
    Guid UserId,
    <span class="hljs-keyword">string</span> PrimaryEmail,
    IReadOnlyList&lt;<span class="hljs-keyword">string</span>&gt; SecondaryEmails
</span>)</span>;
</code></pre>
<p>Both can coexist. Producers can emit both for a while if needed. Consumers can migrate on their own schedule. This is not wasteful. It is respectful of time and autonomy. Inside a single deployment, this approach has an important benefit. You can see who is still using the old contract. You can log it. You can measure it. You can remove it deliberately when the time is right. Nothing is implicit. Nothing breaks “by accident”.</p>
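<p>Seeing who still uses the old contract can be as blunt as a counter in the V1 path. A hedged sketch using <code>System.Diagnostics.Metrics</code>, with a hypothetical <code>_repository</code>:</p>
<pre><code class="lang-csharp">private static readonly Meter Meter = new("Users.Contracts");
private static readonly Counter&lt;long&gt; V1Requests =
    Meter.CreateCounter&lt;long&gt;("user_summary_v1_requests");

public async Task&lt;UserSummaryV1&gt; GetAsync(Guid userId)
{
    V1Requests.Add(1); // when this flatlines, V1 can be removed deliberately
    var user = await _repository.GetAsync(userId);
    return new UserSummaryV1(user.Id, user.PrimaryEmail);
}
</code></pre>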
<p>One subtle but important point is that independent versioning only works if your modules are already decoupled in the ways described earlier in the series. If consumers are reaching into internals, or sharing DbContexts, or coordinating synchronously, versioning becomes performative. You can put a V2 suffix on things, but the coupling is still there. This is why versioning belongs after boundaries, slices, data ownership, and communication patterns. It’s not a starting point. It’s a capability you earn.</p>
<p>A modular monolith that supports independent versioning is a system that understands it will live longer than its first set of assumptions. It doesn’t cling to the idea that everything must always move together. It accepts that change is uneven and builds around that reality.</p>
<p>That acceptance is what keeps the system supple instead of brittle.</p>
<hr />
<p>In the next part of the series, the focus shifts from evolution to operation. Once modules evolve independently, you need to be able to see, diagnose, and manage them independently as well. Logging, metrics, health, and failure handling become the difference between confidence and guesswork.</p>
]]></content:encoded></item><item><title><![CDATA[Part 7. Knowing When Your Modular Monolith Is Ready to Split]]></title><description><![CDATA[By the time you ask whether a modular monolith should be split, you already know the system well enough that the question feels uncomfortable. If it feels academic, you’re not ready. The moment it becomes emotionally charged, when people disagree str...]]></description><link>https://fullstackcity.com/part-7-knowing-when-your-modular-monolith-is-ready-to-split</link><guid isPermaLink="true">https://fullstackcity.com/part-7-knowing-when-your-modular-monolith-is-ready-to-split</guid><category><![CDATA[Modular Monolith]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 19:54:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769111558116/fab612f3-a691-430b-9348-930dfcece269.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By the time you ask whether a modular monolith should be split, you already know the system well enough that the question feels uncomfortable. If it feels academic, you’re not ready. The moment it becomes emotionally charged, when people disagree strongly and for different reasons, that’s usually when the system is starting to tell you something.</p>
<p>What matters is learning to distinguish between structural readiness and organisational impatience.</p>
<p>A modular monolith that is genuinely ready to split already behaves like a distributed system in all the places that matter. The boundary you are considering extracting is already autonomous in practice. It owns its data. It commits independently. It communicates through contracts rather than shared state. Failures inside it are visible and tolerated rather than catastrophic.</p>
<p>When that is true, extraction does not introduce new concepts. It changes where things live.</p>
<p>You can often see this clearly by looking at the execution flow.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769111258698/cdc6d054-2de6-4a3e-8aff-bd7ce729066e.png" alt class="image--center mx-auto" /></p>
<p>If this diagram already represents reality inside your monolith, then moving Billing out of process does not change the mental model. It changes transport, deployment, and observability, but not behaviour.</p>
<p>That’s the first and most important signal.</p>
<p>Contrast that with a system where the flow actually looks like this, even if nobody admits it out loud.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769111290389/5f81facc-3766-4c8c-ab87-3b95b39618c3.png" alt class="image--center mx-auto" /></p>
<p>This is not a candidate for extraction. This is a warning. What you really have here is a single consistency boundary pretending to be modular. Pulling it apart will not create independence, it will just expose coupling that was previously hidden by process boundaries.</p>
<p>Another strong signal lives in your codebase, not your diagrams. If you look at a module and its public surface area is small, stable, and boring, that’s a good sign. A module that is ready to be split does not need a rich internal API exposed to the rest of the system. It needs a narrow set of contracts that have already proven themselves under change.</p>
<p>That usually looks something like this.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IBillingEvents</span>
{
    <span class="hljs-function">Task <span class="hljs-title">Handle</span>(<span class="hljs-params">UserCreated @<span class="hljs-keyword">event</span>, CancellationToken stopToken</span>)</span>;
}
</code></pre>
<p>Notice what isn’t there. There is no shared DbContext. There is no orchestration logic. There is no dependency on Users’ internals. Billing reacts to a fact, not a request to coordinate behaviour.</p>
<p>When a module’s public API starts looking like this naturally, without effort or policing, it’s a sign that the boundary is real.</p>
<p>Operational behaviour is another place where readiness becomes obvious. In a healthy modular monolith, a failure inside one module is already treated as local. Logs are scoped. Metrics are scoped. Alerts are scoped. People don’t panic when Billing misbehaves, because Users continues to function.</p>
<p>If your on-call response already distinguishes between “the system is down” and “that module is down”, then extraction won’t change how incidents are reasoned about. It will only change how they are mitigated.</p>
<p>If, on the other hand, every failure is treated as a system-wide emergency because everything is still tightly coupled, splitting will multiply that pain, not reduce it.</p>
<p>There is also a very practical, almost boring test that I’ve learned to trust. Ask yourself how hard it would be to introduce a network boundary tomorrow. Not actually do it, just introduce it. If replacing an in-process event bus with a real message broker feels like a mostly mechanical change, you are close. If it feels like a redesign that touches business logic, persistence, and error handling all at once, you are not.</p>
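<p>That test usually hinges on whether publishing already sits behind a contract. If swapping transports is a registration change, you are close; the type names here are illustrative:</p>
<pre><code class="lang-csharp">public interface IEventBus
{
    Task PublishAsync&lt;TEvent&gt;(TEvent @event, CancellationToken ct);
}

// Today: in-process dispatch inside the monolith.
services.AddSingleton&lt;IEventBus, InProcessEventBus&gt;();

// Tomorrow: same contract, real broker. Business code does not move.
// services.AddSingleton&lt;IEventBus, ServiceBusEventBus&gt;();
</code></pre>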
<p>That distinction matters more than any architectural diagram.</p>
<p>It’s also worth being explicit about what <em>doesn’t</em> justify splitting. Performance alone rarely does. Scale alone rarely does. The phrase “we’ll need microservices eventually” almost never does. Those are projections, not pressures. The pressures that matter are present tense. Teams blocking each other. Deployment risk that forces coordination meetings. Modules with incompatible uptime or scaling needs that are being artificially flattened by a shared runtime. When those pressures exist and the boundaries are already clean, extraction becomes an act of alignment, not rescue.</p>
<p>One of the most overlooked aspects of this decision is cognitive load. A modular monolith done well reduces it. A distributed system increases it. If your current system already feels heavy to reason about, adding network boundaries will not lighten that load. It will distribute it across logs, dashboards, retries, and failure modes.</p>
<p>If, however, your modular monolith feels calm, predictable, and honest, then splitting a module can actually preserve that calm by preventing future coupling from creeping back in.</p>
<p>From experience, the best extractions I’ve seen were almost boring from a code perspective. The module was already designed as if it were remote. The code barely changed. Most of the work happened in CI pipelines, infrastructure, and observability. That’s exactly how it should be. The worst extractions were dramatic. They required heroics. They broke things in surprising ways. In hindsight, they were not premature because microservices are bad. They were premature because modularity was incomplete.</p>
<p>The point of a modular monolith is not to delay microservices. It’s to make them optional. To give you the ability to say yes or no based on reality, not fashion or fear. If you reach the point where a module can leave without the rest of the system noticing much beyond a configuration change, then you’ve succeeded, regardless of whether you actually do it.</p>
]]></content:encoded></item><item><title><![CDATA[Part 6. Testing Strategy for Modular Monoliths Beyond Unit Tests]]></title><description><![CDATA[By the time you reach this point in the series, you have made a lot of architectural promises. You have said that modules are isolated, that behaviour is local, that data is owned, that communication is explicit, and that authorisation does not leak ...]]></description><link>https://fullstackcity.com/part-6-testing-strategy-for-modular-monoliths-beyond-unit-tests</link><guid isPermaLink="true">https://fullstackcity.com/part-6-testing-strategy-for-modular-monoliths-beyond-unit-tests</guid><category><![CDATA[monolithic architecture]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 19:44:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769110941537/e985efc1-f038-493a-8b3f-d13a8a908210.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By the time you reach this point in the series, you have made a lot of architectural promises. You have said that modules are isolated, that behaviour is local, that data is owned, that communication is explicit, and that authorisation does not leak across boundaries.</p>
<p>Testing is where those claims get audited. Not by diagrams, and not by good intentions, but by code that either passes cleanly or fails in ways you did not expect. Tests have a way of surfacing the truth about a design, because they force the architecture to be exercised rather than explained. This is the part many developers quietly dread. It exposes the gap between architecture that sounds good in conversation and architecture that actually holds up under pressure. When tests are aligned with real boundaries, they reinforce the design. When they are not, they reveal exactly where those promises start to break down.</p>
<h2 id="heading-why-traditional-testing-advice-breaks-down-here">Why Traditional Testing Advice Breaks Down Here</h2>
<p>Most traditional testing advice assumes one of two worlds. Either you are building a small application where unit tests are sufficient, or you are working in a distributed system where end-to-end tests are unavoidable. A modular monolith sits in an awkward middle ground that neither model fits particularly well.</p>
<p>If you only write unit tests, you miss important classes of failure. Broken wiring goes unnoticed. Boundary violations slip through. Configuration errors hide until runtime. The system looks healthy in isolation, but the pieces do not actually fit together the way you think they do.</p>
<p>At the other extreme, relying heavily on end-to-end tests creates a different set of problems. They are slow to run, brittle to maintain, and hard to diagnose when something fails. Over time, people stop trusting them, and once trust is gone, the tests stop doing their job.</p>
<p>The mistake is treating testing as a ladder, moving neatly from unit tests to integration tests to end-to-end tests. In a modular monolith, testing works better as layers of confidence, with each layer answering a different question about the system.</p>
<h2 id="heading-the-question-your-tests-should-answer">The Question Your Tests Should Answer</h2>
<p>Before writing any test, I ask one thing:</p>
<blockquote>
<p>What architectural promise am I trying to protect?</p>
</blockquote>
<p>If you can’t answer that, the test probably doesn’t matter.</p>
<h2 id="heading-level-1-slice-tests-your-primary-workhorse">Level 1: Slice Tests (Your Primary Workhorse)</h2>
<p>Vertical slices change what “unit testing” really means. The unit is no longer a method, a class, or a service. The unit is the use case itself, the thing the system actually does.</p>
<p>A slice test exercises the full behaviour of that use case. It runs the handler, applies validation, touches persistence, and enforces business rules. What it deliberately avoids is anything outside the slice’s responsibility. There is no HTTP layer involved. There is no serialisation. There is no real infrastructure beyond what the module owns. The key shift is where mocking happens. Mocks belong at the module boundary, not inside the slice. Inside a module, you want reality. You want real code paths and real interactions, because that is how you gain confidence that the module actually works.</p>
<p>A CreateUser slice test, for example, should use a real UsersDbContext backed by an in-memory or test database. It should not mock repositories, and it should not mock EF. If the slice cannot work with its own real persistence model in a test, that is a signal worth paying attention to.</p>
<p>You want to know:</p>
<blockquote>
<p>Does this behaviour actually work?</p>
</blockquote>
<p>Not:</p>
<blockquote>
<p>Can I make this method return the value I expect?</p>
</blockquote>
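<p>A sketch of what such a slice test can look like, assuming an xUnit project and the EF Core in-memory provider. <code>UsersDbContext</code> is the module’s real context; <code>CreateUserHandler</code>, <code>CreateUserCommand</code> and the <code>Result</code> shape are illustrative stand-ins for the module’s real types:</p>
<pre><code class="lang-csharp">public sealed class CreateUserSliceTests
{
    [Fact]
    public async Task CreateUser_PersistsTheUser()
    {
        // Real DbContext, swapped onto an in-memory database. No repository
        // mocks, no mocked EF: the slice runs its real code paths.
        var options = new DbContextOptionsBuilder&lt;UsersDbContext&gt;()
            .UseInMemoryDatabase(Guid.NewGuid().ToString())
            .Options;

        await using var db = new UsersDbContext(options);
        var handler = new CreateUserHandler(db);

        var result = await handler.Handle(
            new CreateUserCommand("alice@example.com"),
            CancellationToken.None);

        Assert.True(result.IsSuccess);
        Assert.Single(db.Users);
    }
}
</code></pre>
<p>Nothing inside the slice is mocked. Validation, mapping, and persistence all run for real, which is exactly the confidence the test is meant to buy.</p>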
<h2 id="heading-why-mocking-internals-destroys-confidence">Why Mocking Internals Destroys Confidence</h2>
<p>Mocking internals buys you speed, but it does so at the cost of truth. A test that mocks repositories, DbContexts, or domain behaviour is not really testing the system. It is testing your expectations about how the system should behave.</p>
<p>In a modular monolith, most bugs do not live in pure logic. They live in mapping, configuration, wiring, and persistence assumptions. Those are exactly the areas that heavy mocking conveniently hides, which is why everything looks fine in tests. When a slice test fails, you want it to fail because the system is wrong. You do not want it to fail because the test was overly clever or made assumptions that no longer hold. Tests that lean toward reality may be a little slower, but they give you something far more valuable: confidence that the behaviour you see in production is the behaviour you exercised in tests.</p>
<h2 id="heading-level-2-module-integration-tests-boundaries-under-load">Level 2: Module Integration Tests (Boundaries Under Load)</h2>
<p>Slice tests tell you that a feature works in isolation. They do not tell you whether modules cooperate correctly once they start talking to each other. That gap is where module integration tests earn their keep.</p>
<p>A module integration test boots a module in a way that is close to reality. It uses real persistence. It uses real in-process messaging. And it exercises multiple slices together, allowing behaviour to flow across a boundary rather than stopping at it.</p>
<p>For example, creating a user publishes an event. The Billing module reacts to that event. A billing profile is created. There is no HTTP involved and no UI in the way, just behaviour moving from one module to another exactly as it would at runtime.</p>
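<p>In code, the shape is small even though everything runs for real. All of the names here are hypothetical wiring; the structure is the point:</p>
<pre><code class="lang-csharp">[Fact]
public async Task CreatingUser_CreatesBillingProfile()
{
    // Hypothetical test host: real persistence, real in-process messaging.
    var bus = new InProcessEventBus();
    var users = UsersModule.Start(bus);      // publishes UserCreated
    var billing = BillingModule.Start(bus);  // subscribes to UserCreated

    await users.Handle(new CreateUserCommand("alice@example.com"));

    var profile = await billing.FindProfileByEmail("alice@example.com");
    Assert.NotNull(profile); // Billing reacted to the fact, nothing more
}
</code></pre>
<p>No HTTP, no UI, and no mocks at the boundary under test.</p>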
<p>This level of testing answers a different question: do my modules interact the way I believe they do? When these tests fail, the cause is usually not a bug in business logic. It is an architectural failure. An event contract changed. A handler was not registered. A transaction boundary moved. A dependency leaked across a boundary.</p>
<p>That distinction matters, because architectural failures require architectural fixes. Treating them like logic bugs only papers over the real problem.</p>
<h2 id="heading-level-3-contract-tests-protecting-module-agreements">Level 3: Contract Tests (Protecting Module Agreements)</h2>
<p>If modules communicate through contracts, those contracts deserve tests of their own. A contract test does not care how something is implemented. It cares about shape and meaning, and whether both sides still agree on what is being exchanged.</p>
<p>For example, an event is expected to contain specific fields. A query should return data in a particular form. A command should accept a defined set of inputs. None of this is about behaviour inside a module. It is about the agreement between modules.</p>
<p>These tests protect you from one of the most dangerous changes in a modular system: “I just renamed this property, nothing else uses it.” Something always uses it. Without contract tests, you do not find out until much later, and usually in a place far removed from the change. That is why contracts deserve to be treated as first-class citizens: a contract test exists for one purpose only, to protect the agreement between two independently evolving pieces of the system.</p>
<p>Good contract tests live close to the contract. They run fast. They fail loudly. And when they fail, they tell you exactly which agreement was broken. They are cheap insurance against silent, slow-burn failures that otherwise erode confidence in the architecture over time.</p>
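<p>A minimal contract test can be as blunt as reflection over the event type. <code>UserCreated</code> and the required field names are illustrative:</p>
<pre><code class="lang-csharp">using System;

// Hypothetical contract test: fails loudly if a rename or removal
// changes the shape other modules rely on.
public sealed record UserCreated(Guid UserId, string Email);

public static class UserCreatedContract
{
    private static readonly string[] RequiredFields = { "UserId", "Email" };

    public static void Verify()
    {
        foreach (var name in RequiredFields)
            if (typeof(UserCreated).GetProperty(name) is null)
                throw new InvalidOperationException(
                    $"Contract broken: '{name}' is missing from {nameof(UserCreated)}");
    }
}
</code></pre>
<p>Renaming <code>Email</code> to <code>EmailAddress</code> now fails where the agreement lives, not three modules away at runtime.</p>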
<h2 id="heading-level-4-architecture-tests-your-early-warning-system">Level 4: Architecture Tests (Your Early Warning System)</h2>
<p>Architecture tests sit at the top of the stack, and they do not test behaviour at all. They test structure. That is precisely why they are so effective, and often so uncomfortable. These tests ask blunt questions. Does Billing depend on Users.Infrastructure? Does any module reference another module’s DbContext? Are internals leaking across assembly boundaries? None of these questions are about whether the application appears to work today.</p>
<p>They do not care if the app runs. They care if the architecture is honest. When an architecture test fails, it is usually because someone made a small, reasonable-looking change that would have turned into a serious problem six months later.</p>
<h3 id="heading-architecture-tests-need-tooling-support">Architecture Tests Need Tooling Support</h3>
<p>Architecture tests don’t work on intent alone. They need enforcement at build time.</p>
<p>In .NET, that usually means using a structural testing framework that can inspect assemblies and dependencies directly. The two I see used most often in real systems are:</p>
<ul>
<li><p><strong>NetArchTest</strong> – simple, focused, and very effective for dependency rules</p>
</li>
<li><p><strong>ArchUnitNET</strong> – more expressive, closer to formal architectural modelling</p>
</li>
</ul>
<p>The specific tool matters less than what you do with it.</p>
<p>A good architecture test doesn’t try to prove the system works. It tries to prove the system <em>can’t cheat</em>.</p>
<p>For example, this single test prevents one of the most damaging boundary violations you can make:</p>
<pre><code class="lang-csharp">Types.InAssembly(<span class="hljs-keyword">typeof</span>(BillingRoot).Assembly)
    .ShouldNot()
    .HaveDependencyOn(<span class="hljs-string">"Users.Infrastructure"</span>)
    .GetResult()
    .IsSuccessful
    .Should()
    .BeTrue();
</code></pre>
<p>This test doesn’t care about behaviour. It doesn’t care about data. It cares about honesty.</p>
<p>When it fails, it fails because someone crossed a boundary they weren’t meant to cross. That’s exactly when you want the build to break.</p>
<p>If you only ever write one kind of architecture test, write this kind.</p>
<h2 id="heading-where-end-to-end-tests-actually-fit">Where End-to-End Tests Actually Fit</h2>
<p>End-to-end tests are not useless. They are just overused. In a modular monolith, their role is much narrower than many people expect.</p>
<p>They should be few in number and focused only on critical paths. Their job is not to validate business rules or edge cases. It is to confirm that the system is wired together correctly at the highest level. Authentication works. Routing is correct. Major workflows do not explode when exercised end to end.</p>
<p>If you find yourself relying on end-to-end tests to validate business logic, that is usually a sign that something is missing at a lower level. You are compensating for gaps in slice tests, module integration tests, or contract tests. That is a smell. End-to-end tests are expensive, blunt instruments. Used sparingly, they provide confidence. Used as a safety net for everything else, they slow teams down and hide the real problems.</p>
<h2 id="heading-the-testing-pyramid-rewritten">The Testing Pyramid Rewritten</h2>
<p>In practice, the balance looks very different from the traditional testing pyramid. In a modular monolith, the weight shifts toward the places where behaviour and boundaries actually live.</p>
<p>You want many slice tests, because they validate real use cases in isolation. You want a healthy number of module integration tests, because they confirm that modules cooperate the way you think they do. On top of that, you add a thin but deliberate layer of contract and architecture tests to protect agreements and structural integrity. End-to-end tests should exist, but only in very small numbers.</p>
<p>If your test suite is inverted, with lots of end-to-end tests propping up a weak foundation, something upstream is wrong. Tests are compensating for missing confidence elsewhere, and that is usually a sign that the architecture itself needs attention.</p>
<h2 id="heading-tests-as-architectural-pressure">Tests as Architectural Pressure</h2>
<p>Here’s something that doesn’t get said often enough:</p>
<blockquote>
<p>If something is hard to test, it’s probably badly designed.</p>
</blockquote>
<p>Modular monoliths surface this fast.</p>
<p>If you find yourself struggling to test:</p>
<ul>
<li><p>A slice</p>
</li>
<li><p>A module boundary</p>
</li>
<li><p>A permission rule</p>
</li>
</ul>
<p>That struggle is feedback.</p>
<p>Ignore it and you’ll pay later.<br />Listen to it and the design improves.</p>
<h2 id="heading-where-we-are-now">Where We Are Now</h2>
<p>At this point in the series, you have:</p>
<ul>
<li><p>Real module boundaries</p>
</li>
<li><p>Feature-centric design</p>
</li>
<li><p>Isolated data</p>
</li>
<li><p>Honest communication</p>
</li>
<li><p>Localised authorisation</p>
</li>
<li><p>A testing strategy that reinforces all of it</p>
</li>
</ul>
<p>What’s left is the question everyone eventually asks.</p>
<blockquote>
<p>How do you know when a modular monolith is ready to split?</p>
</blockquote>
<p>That’s not a technical question. It’s a judgment call.</p>
]]></content:encoded></item><item><title><![CDATA[Part 5. Authorisation as a Cross-Cutting Concern Without Leaking Modules]]></title><description><![CDATA[If you’ve followed this series so far, you’ve done the hard structural work.
You’ve enforced real module boundaries.You’ve organised behaviour into vertical slices.You’ve isolated data with separate DbContexts.You’ve stopped modules from chatting lik...]]></description><link>https://fullstackcity.com/part-5-authorisation-as-a-cross-cutting-concern-without-leaking-modules</link><guid isPermaLink="true">https://fullstackcity.com/part-5-authorisation-as-a-cross-cutting-concern-without-leaking-modules</guid><category><![CDATA[Modular Monolith]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 08:13:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769063402071/edbfb107-2ce3-4fd9-90fe-8d9b1e7f5202.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’ve followed this series so far, you’ve done the hard structural work.</p>
<p>You’ve enforced real module boundaries.<br />You’ve organised behaviour into vertical slices.<br />You’ve isolated data with separate DbContexts.<br />You’ve stopped modules from chatting like overfriendly neighbours.</p>
<p>And then you hit authorisation.</p>
<p>Suddenly, every nice boundary you worked so hard to protect feels under threat.</p>
<p>Permissions need to be checked everywhere.<br />Identity is shared.<br />Policies feel global.<br />This is where most modular monoliths start leaking.</p>
<h2 id="heading-why-authorisation-is-so-dangerous-architecturally">Why Authorisation Is So Dangerous Architecturally</h2>
<p>Authorisation is one of the most deceptive parts of a system from an architectural point of view. It looks like a technical concern, something you can solve once and move on from, but in reality it is deeply business-driven. Every permission encodes a rule about who is allowed to do what, when, and why, and those rules evolve as the business evolves.</p>
<p>Because of that, authorisation cuts across modules. It touches almost every request. It changes over time. And it has a habit of attracting shortcuts, especially under pressure. Those shortcuts often sound reasonable in the moment. Let’s just inject a permission service everywhere. We will centralise all checks in one place. It is fine if modules know about roles.</p>
<p>That is how you end up with a system where every module depends on a shared authorisation core. Permission logic gets smeared across handlers and services. Changing a single rule means touching half the codebase. The modules may still compile separately, but architecturally they are glued together, and the boundaries you thought you had are no longer doing any real work.</p>
<h2 id="heading-the-first-principle-authorisation-is-not-ownership">The First Principle: Authorisation Is Not Ownership</h2>
<p>The biggest mental shift that makes this work is understanding that a module does not own authorisation. A module owns its rules. That distinction is subtle, but it is foundational.</p>
<p>Identity, tokens, and enforcement are cross-cutting concerns. They are about how a decision is applied and verified. Meaning is not cross-cutting. Meaning lives with the business capability that understands it. A Billing module knows what it means to issue an invoice. A Users module knows what it means to deactivate a user. A Claims module knows what it means to approve a claim.</p>
<p>If you centralise the meaning of permissions, you lose modularity instantly. The moment a single place decides what actions really mean across the system, every module starts to depend on that shared interpretation. Boundaries blur, ownership weakens, and what looked like a clean separation turns into another hidden point of coupling.</p>
<h2 id="heading-the-wrong-way-and-why-its-so-tempting">The Wrong Way (And Why It’s So Tempting)</h2>
<p>Here’s the classic mistake.</p>
<p>You create something like:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">class</span> <span class="hljs-title">Permissions</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">string</span> CreateUser = <span class="hljs-string">"users:create"</span>;
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">string</span> DeleteUser = <span class="hljs-string">"users:delete"</span>;
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">string</span> IssueInvoice = <span class="hljs-string">"billing:issue"</span>;
}
</code></pre>
<p>This pattern shows up quickly in otherwise well-structured systems. Someone introduces a shared permissions class with a set of constants, and every module references it. At first glance, it feels clean, centralised, and consistent. Everyone uses the same strings. There is one obvious place to look. It looks like order. In reality, every module now depends on a shared permissions model. The meaning of those permissions is no longer owned by the modules that enforce them, but by a central construct that everyone must agree on. When the semantics of a permission change, that change ripples outward across the entire system, whether the affected modules care about it or not.</p>
<p>What you have really created is a hidden coupling hub. The dependency may be subtle, but it is powerful. Modules that should be independent are now tied together through shared meaning and shared evolution. Architecturally, this is no better than a shared DbContext. It is just harder to see, and therefore easier to justify until the damage is already done.</p>
<h2 id="heading-cross-cutting">Cross-Cutting</h2>
<p>Cross-cutting does not mean globally defined. That is where a lot of designs quietly go wrong. Treating something as cross-cutting does not give it permission to own meaning for the entire system. In this context, cross-cutting means that enforcement is consistent. The mechanics of checking permissions, validating tokens, and rejecting unauthorised requests can and should be shared. Infrastructure can be reused. The plumbing does not need to be reinvented in every module.</p>
<p>Meaning, however, must stay local. Each module decides what a permission actually represents in terms of business behaviour. That distinction matters. When enforcement is shared but meaning is local, you get consistency without coupling, and that is the balance modular systems depend on.</p>
<h2 id="heading-the-shape-of-a-boundary-respecting-authorization-model">The Shape of a Boundary-Respecting Authorisation Model</h2>
<p>Here’s the structure that holds up long-term:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769068778962/8f78c365-80d5-435c-8018-92684861b70d.png" alt class="image--center mx-auto" /></p>
<p>In a modular system, each module declares its own permissions. It knows what those permissions mean, how they map to business rules, and when they should apply. Those permissions are used internally, close to the behaviour they protect, where their meaning is clear and unlikely to be misinterpreted.</p>
<p>The platform plays a different role. It authenticates users, resolves claims, and enforces checks in a consistent way. It provides the mechanism, not the meaning. No single module owns the whole picture, and that’s by design. Meaning stays local to the module that understands it, while enforcement remains shared and predictable. That separation is what keeps authorisation powerful without turning it into another hidden source of coupling.</p>
<h2 id="heading-permissions-are-part-of-the-use-case">Permissions Are Part of the Use Case</h2>
<p>A permission is not a generic rule. It’s part of a specific behaviour.</p>
<p>That means it belongs with the slice.</p>
<p>For example, in a Users module:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">class</span> <span class="hljs-title">UserPermissions</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">string</span> Create = <span class="hljs-string">"users:create"</span>;
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">const</span> <span class="hljs-keyword">string</span> Deactivate = <span class="hljs-string">"users:deactivate"</span>;
}
</code></pre>
<p>Used directly in the slice that cares about it.</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">Authorize(Policy = UserPermissions.Create)</span>]
<span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">CreateUserEndpoint</span> : <span class="hljs-title">IEndpoint</span>
{
    <span class="hljs-comment">// endpoint mapping</span>
}
</code></pre>
<p>This permission:</p>
<ul>
<li><p>Is declared by Users</p>
</li>
<li><p>Is used by Users</p>
</li>
<li><p>Is meaningful only in Users</p>
</li>
</ul>
<p>Billing doesn’t care. It shouldn’t even know it exists.</p>
<h2 id="heading-policies-are-infrastructure-not-business-logic">Policies Are Infrastructure, Not Business Logic</h2>
<p>This is another common source of leakage.</p>
<p>Policies often become a dumping ground for logic:</p>
<pre><code class="lang-csharp">services.AddAuthorization(options =&gt;
{
    options.AddPolicy(<span class="hljs-string">"CanCreateUser"</span>, policy =&gt;
        policy.RequireClaim(<span class="hljs-string">"role"</span>, <span class="hljs-string">"Admin"</span>));
});
</code></pre>
<p>This approach looks innocent at first. Then roles change. Rules evolve. Context starts to matter. Before long, business logic is no longer living in the modules where it belongs, it is living in startup configuration and policy wiring, far away from the actual use cases that give it meaning. The better approach is a clean separation of responsibility. Policies should be mechanical. They exist to enforce checks, not to encode business rules. Meaning should live inside the modules, close to the behaviour it governs. Claims should represent capabilities, not roles or organisational structures.</p>
<p>A policy should answer a simple question: “Does this principal have permission X?” It should not be trying to answer something like “Is this user an admin in department Y during business hours?” That kind of logic depends on context, intent, and business rules, and it belongs near the use case where that context actually exists, not buried inside infrastructure configuration.</p>
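<p>Kept mechanical, the wiring stays small. The policy checks for a capability claim and nothing else; the <code>"permission"</code> claim type here is an assumption, not a framework requirement:</p>
<pre><code class="lang-csharp">services.AddAuthorization(options =&gt;
{
    // Mechanical: one capability claim, zero business rules.
    options.AddPolicy(UserPermissions.Create, policy =&gt;
        policy.RequireClaim("permission", UserPermissions.Create));
});
</code></pre>
<p>Whether this particular user may be created, in this particular context, is decided inside the slice, where the context exists.</p>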
<h2 id="heading-claims-as-capabilities-not-roles">Claims as Capabilities, Not Roles</h2>
<p>Roles are blunt instruments. They don’t scale well once systems grow.</p>
<p>Capabilities do.</p>
<p>Instead of:</p>
<pre><code class="lang-csharp">Role: Admin
</code></pre>
<p>Think:</p>
<pre><code class="lang-csharp">Permission: users:create
Permission: billing:issue
Permission: claims:approve
</code></pre>
<p>Your identity system emits capabilities.<br />Your modules decide what those capabilities mean.</p>
<p>This keeps the identity layer dumb and the domain layer expressive.</p>
<h2 id="heading-avoiding-the-authorisation-service-anti-pattern">Avoiding the “Authorisation Service” Anti-Pattern</h2>
<p>Another trap is the so-called “Authorisation Service”.</p>
<p>A shared service with methods like:</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">bool</span> <span class="hljs-title">CanCreateUser</span>(<span class="hljs-params">User user</span>)</span>;
<span class="hljs-function"><span class="hljs-keyword">bool</span> <span class="hljs-title">CanIssueInvoice</span>(<span class="hljs-params">User user</span>)</span>;
</code></pre>
<p>This kind of design often looks reusable at first glance. In reality, it is catastrophic for modularity. The problem is not the intent, it is the effect.</p>
<p>It centralises business rules that should belong to individual modules. It forces every module to depend on that central logic. And over time, it becomes a change bottleneck, because even small rule adjustments ripple through the entire system.</p>
<p>If a rule is about Users, it belongs in the Users module. If a rule is about Billing, it belongs in the Billing module. That alignment keeps ownership clear and change local.</p>
<p>Cross-cutting enforcement does not mean cross-cutting logic. Confusing the two is how systems quietly lose their modularity while still appearing well-structured on the surface.</p>
<h2 id="heading-where-authorisation-checks-actually-belong">Where Authorisation Checks Actually Belong</h2>
<p>In a vertical-slice system, authorisation checks belong at the boundary of the slice. That is the point where intent is clearest, where you can say, “this operation is about to happen”, and make a deliberate decision about whether it should be allowed.</p>
<p>They should not be buried in repositories, where they become implicit and easy to bypass. They should not be hidden inside services, where they turn into invisible rules that only exist if you remember to call the right method. And they should not be scattered across helper utilities, where consistency depends on developer discipline. Put the check where the request enters the slice, at the point where the behaviour is named and explicit. That is where the system can tell the truth about what it is doing, and where authorisation can be enforced predictably without leaking into the wrong places.</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;Result&gt; <span class="hljs-title">Handle</span>(<span class="hljs-params">
    CreateUserCommand command,
    ClaimsPrincipal principal,
    CancellationToken stopToken</span>)</span>
{
    <span class="hljs-keyword">if</span> (!principal.HasPermission(UserPermissions.Create))
        <span class="hljs-keyword">return</span> Result.Forbidden();

    <span class="hljs-comment">// behaviour</span>
}
</code></pre>
<p>That’s explicit. Honest. Local.</p>
<p>If the rule changes, you know exactly where to look.</p>
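<p>The <code>HasPermission</code> call above is not built into ASP.NET Core. A minimal sketch, assuming capabilities travel as <code>"permission"</code> claims:</p>
<pre><code class="lang-csharp">using System.Security.Claims;

public static class ClaimsPrincipalExtensions
{
    // A capability is present when a matching "permission" claim exists.
    public static bool HasPermission(this ClaimsPrincipal principal, string permission) =&gt;
        principal.HasClaim("permission", permission);
}
</code></pre>
<p>The extension knows nothing about what <code>users:create</code> means. That knowledge stays inside the Users module.</p>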
<h2 id="heading-handling-shared-identity-without-shared-coupling">Handling Shared Identity Without Shared Coupling</h2>
<p>Identity is shared, and that part is not controversial. There is a single logged-in user and a single authentication story. But shared identity does not require shared interpretation. Each module maps that identity to meaning on its own terms, based on the business rules it owns. This is the difference between asking “who is this user?” and asking “what are they allowed to do here?” The first question is global and infrastructural. The second is local and contextual. Confusing the two is how modules start to leak meaning into places where it does not belong.</p>
<p>Authorisation mistakes rarely show up as obvious bugs. Instead, they surface as fear of change, overly defensive coding, centralised gatekeepers, and endless regressions. Teams slow down, not because the system is broken, but because nobody is confident that a change will not have unintended side effects. Systems that age well avoid that trap. They keep permission logic obvious, local, and boring. When authorisation is easy to understand and easy to change in one place, the system stays flexible, and the team keeps its momentum.</p>
<hr />
<h2 id="heading-up-next-in-the-series">Up Next in the Series</h2>
<p>The next inevitable question is:</p>
<blockquote>
<p><strong>How do you test all of this without writing brittle, end-to-end monsters or meaningless unit tests?</strong></p>
</blockquote>
<p>Testing modular monoliths requires a different mental model.</p>
<h3 id="heading-parts-6-10-herehttpsfullstackcitycomseriesbuilding-modular-monoliths-that-actually-scale2"><a target="_blank" href="https://fullstackcity.com/series/building-modular-monoliths-that-actually-scale2">PARTS 6 - 10 HERE</a></h3>
]]></content:encoded></item><item><title><![CDATA[Part 4. Inter-Module Communication Without Creating a Distributed Monolith]]></title><description><![CDATA[Once you’ve enforced boundaries, adopted vertical slices, and given each module its own DbContext, a new tension appears almost immediately.
Modules are now properly isolated. They own their data. They commit independently. They even fail independent...]]></description><link>https://fullstackcity.com/part-4-inter-module-communication-without-creating-a-distributed-monolith</link><guid isPermaLink="true">https://fullstackcity.com/part-4-inter-module-communication-without-creating-a-distributed-monolith</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[.NET]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Thu, 22 Jan 2026 00:06:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769038667518/c86169ae-d397-4bb0-b9fc-acff1a8ee070.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Once you’ve enforced boundaries, adopted vertical slices, and given each module its own DbContext, a new tension appears almost immediately.</p>
<p>Modules are now properly isolated. They own their data. They commit independently. They even fail independently.</p>
<p>And then you hit the next question:</p>
<blockquote>
<p>How do these things actually talk to each other?</p>
</blockquote>
<p>This is the point where a lot of otherwise solid modular monoliths quietly fall apart. Not all at once. Slowly. Politely. One innocent decision at a time.</p>
<p>A direct method call here.<br />A shared DTO there.<br />A synchronous dependency that “should be fine”.</p>
<p>Before long, you haven’t broken any rules explicitly, but you’ve recreated the very thing you were trying to escape.</p>
<h2 id="heading-the-trap-treating-modules-like-classes">The Trap: Treating Modules Like Classes</h2>
<p>The most common mistake I see is treating modules like they are just big classes. The reasoning usually sounds harmless enough. They are in the same process, so why not just call across and get the answer directly? From a technical point of view, that is correct. From an architectural point of view, it is disastrous. The moment one module calls another synchronously and expects an immediate answer, a chain of coupling is introduced, whether you intended it or not. Execution order becomes fixed. Failure modes bleed across boundaries. Time suddenly matters in ways you did not plan for, and assumptions about internal behaviour start to leak out. What looked like a clean boundary turns into a thin veil. The module may still exist on the filesystem, but its autonomy is gone.</p>
<h2 id="heading-what-youre-actually-trying-to-preserve">What You’re Actually Trying to Preserve</h2>
<p>Before choosing any communication mechanism, it helps to be clear about what you are actually trying to preserve. The goal is not elegance or convenience in the short term. It is about protecting a set of properties that make the system resilient over time. You are trying to preserve autonomy, so each module can make its own decisions without being dragged into someone else’s execution flow. You are trying to preserve replaceability, so modules can evolve, change shape, or even be rewritten without forcing a cascade of changes elsewhere. You are trying to preserve honest failure, where problems surface clearly instead of being buried inside a larger operation. And you are trying to preserve extractability, so future you still has real options when the system needs to change.</p>
<p>If a communication style undermines any of those, it is the wrong choice, no matter how convenient or familiar it feels today. Convenience fades quickly. The consequences of broken boundaries tend to stick around much longer.</p>
<h2 id="heading-three-ways-modules-can-communicate">Three Ways Modules Can Communicate</h2>
<p>In a modular monolith, there are really only three legitimate ways modules should talk to each other.</p>
<p>Everything else is a variation or a mistake.</p>
<h3 id="heading-1-asking-a-question-synchronous-contract-only">1. Asking a Question (Synchronous, Contract-Only)</h3>
<p>Sometimes a module genuinely needs to ask another module a question. Not to delegate work, and not to coordinate behaviour, but simply to retrieve information that the other module owns. This is the narrowest and safest form of synchronous communication you can allow. In cases like this, the intent is clear. Billing might need to know whether a user exists. An authorisation module might need to check permissions. A reporting feature might need a snapshot of reference data. In each case, the caller is asking for information, not trying to drive the other module’s behaviour.</p>
<p>The constraints around this kind of interaction are non-negotiable. You depend on a contract, never an implementation. You accept that the call can fail and design accordingly. And you do not embed business flow or orchestration logic into the response. When those rules are respected, synchronous queries can exist without eroding the autonomy of the modules involved.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769039182056/25854d3e-16ff-47f6-b8ea-468283a233ee.png" alt class="image--center mx-auto" /></p>
<p>This is not orchestration. It’s lookup.</p>
<p>If the answer disappears tomorrow and you have to replace it with a cache or a projection, nothing fundamental breaks.</p>
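<p>In code, a contract-only lookup might look like this. It is a sketch under the constraints above; the interface, snapshot, and in-memory implementation are illustrative names, not from the series.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// The contract the caller depends on. It lives in a contracts assembly,
// never in the Users module's internals.
public interface IUserLookup
{
    // The answer may be missing and the call may fail; callers design for both.
    Task<UserSnapshot?> FindAsync(Guid userId, CancellationToken ct);
}

// An immutable snapshot: no navigation properties, no behaviour,
// nothing the caller can use to drive the other module.
public sealed record UserSnapshot(Guid Id, string Email, bool IsActive);

// A stand-in implementation for illustration; the real one would query
// the Users module's own storage.
public sealed class InMemoryUserLookup : IUserLookup
{
    private readonly Dictionary<Guid, UserSnapshot> _users;

    public InMemoryUserLookup(Dictionary<Guid, UserSnapshot> users) => _users = users;

    public Task<UserSnapshot?> FindAsync(Guid userId, CancellationToken ct) =>
        Task.FromResult<UserSnapshot?>(_users.TryGetValue(userId, out var u) ? u : null);
}
```

<p>Because the caller only sees the interface and the snapshot, the answer can later come from a cache or a projection without the caller noticing.</p>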
<h3 id="heading-2-announcing-something-happened-asynchronous-event-driven">2. Announcing Something Happened (Asynchronous, Event-Driven)</h3>
<p>This is the most important pattern in the entire series.</p>
<p>When a module completes work it owns, it announces a fact, not an instruction.</p>
<blockquote>
<p>“A user was created.”<br />“An invoice was issued.”<br />“A policy was cancelled.”</p>
</blockquote>
<p>It does not care who reacts. It does not wait. It does not coordinate.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769039236360/fce98105-3565-405f-b590-e99c77f4e42f.png" alt class="image--center mx-auto" /></p>
<p>This preserves autonomy better than anything else you can do.</p>
<p>Each module:</p>
<ul>
<li><p>Decides if it cares</p>
</li>
<li><p>Handles the event in its own time</p>
</li>
<li><p>Fails independently</p>
</li>
</ul>
<p>If Billing is down, Users still works.<br />If Notifications breaks, Billing doesn’t care.</p>
<p>That’s not a compromise. That’s the design doing its job.</p>
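<p>A minimal shape for this, assuming a simple in-process dispatcher; the event record, handler interface, and handler name below are illustrative stand-ins, not a specific library:</p>

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A fact, named in the past tense. It announces; it does not instruct.
public sealed record UserCreated(Guid UserId, DateTimeOffset OccurredAt);

// A minimal handler contract; in practice this role is played by whatever
// in-process dispatcher you use.
public interface IEventHandler<TEvent>
{
    Task HandleAsync(TEvent evt, CancellationToken ct);
}

// Billing decides for itself that it cares, and reacts in its own time.
public sealed class CreateBillingProfileOnUserCreated : IEventHandler<UserCreated>
{
    public Task HandleAsync(UserCreated evt, CancellationToken ct)
    {
        // open Billing's own transaction, write Billing's own data;
        // if this handler fails, Users has still succeeded
        return Task.CompletedTask;
    }
}
```

<p>Notice that the publisher knows nothing about this handler. Adding or removing subscribers never touches the Users module.</p>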
<h3 id="heading-3-issuing-a-command-rare-explicit-dangerous">3. Issuing a Command (Rare, Explicit, Dangerous)</h3>
<p>This one should make you uncomfortable, and that discomfort is intentional. A command is not a notification and it is not a question. It is one module telling another module to do something. Not “something happened”, but “you must act”.</p>
<p>There are situations where this is genuinely necessary, but they should be rare. A command carries weight because it explicitly coordinates behaviour across boundaries. When you issue one, you are saying that your operation is not complete unless another module performs work on your behalf. That is a strong form of coupling, even when it is wrapped in a clean interface.</p>
<p>If you find yourself doing this often, your boundaries are lying to you. Either the modules are not as independent as you think, or the responsibility is split in the wrong place.</p>
<p>When commands are unavoidable, treat them with care. Make them explicit so their intent is obvious. Make them intentional so they are not introduced casually. And above all, make them rare, because every command chips away at the autonomy you are trying to preserve.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769039362153/6f973e16-839a-40a1-be58-080b20f6e380.png" alt class="image--center mx-auto" /></p>
<p>If this starts to feel like a workflow engine, that’s because it is. At that point, you should acknowledge it and design accordingly, not pretend it’s “just a call”.</p>
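<p>When you do issue a command, making the coupling visible in the types helps. A sketch, with illustrative names only:</p>

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A command names an obligation: "you must act". Unlike an event it has
// exactly one intended receiver and an outcome the issuer must handle.
public sealed record SuspendBillingCommand(Guid UserId, string Reason);

public interface ICommandHandler<TCommand>
{
    // A command can be refused; coupling to that outcome is the price
    // the issuing module knowingly pays.
    Task<bool> HandleAsync(TCommand command, CancellationToken ct);
}
```

<p>The boolean (or richer result type) is the point: an issuer that must inspect the outcome is visibly coordinating behaviour, which is exactly what you want to see on code review.</p>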
<h2 id="heading-the-illusion-of-safety-in-synchronous-chains">The Illusion of Safety in Synchronous Chains</h2>
<p>Here’s the pattern that causes the most damage:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769039438265/08517997-9aa5-4b03-99e5-5b1cf2c08627.png" alt class="image--center mx-auto" /></p>
<p>On a diagram, this kind of flow looks neat. It is linear, predictable, and easy to explain. One box calls the next, work flows left to right, and everything appears nicely ordered.</p>
<p>In reality, a very different set of behaviours emerges. Latency stacks as each call waits on the next. Failures cascade across boundaries. Retries multiply in unexpected ways. Partial success becomes invisible because everything is hidden behind synchronous calls and assumptions of immediacy.</p>
<p>What you have really created is a distributed transaction in disguise, but without any of the tooling, visibility, or honesty that distributed systems demand. And worse, you have done it inside a monolith, where nobody is watching for those failure modes because the architecture claims they do not exist.</p>
<h2 id="heading-why-events-feel-uncomfortable-at-first">Why Events Feel Uncomfortable at First</h2>
<p>Events tend to feel uncomfortable at first because most developers are trained to think in terms of control flow. You call this, then you call that, and if something goes wrong you roll everything back. The path is explicit, linear, and easy to follow in your head.</p>
<p>Events break that mental model. When you publish an event, you do not know who will react to it, when they will react, or in what order. That loss of immediate control can feel dangerous, especially if you are used to relying on transactions to keep everything tidy.</p>
<p>What actually changes is where correctness lives. Instead of being enforced by a single transactional boundary, correctness is enforced through idempotency, retries, observability, and explicit state management. These mechanisms are harder to fake and harder to ignore. They force you to deal with reality rather than hiding it behind rollback semantics, and that honesty is exactly what makes event-driven designs robust over time.</p>
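<p>Idempotency is the least familiar of those mechanisms, so here is a minimal sketch. The in-memory set stands in for a durable "processed messages" table; the class and method names are assumptions for illustration.</p>

```csharp
using System;
using System.Collections.Generic;

// Correctness without a shared transaction: remember processed event ids
// so a retried or redelivered event becomes a harmless no-op.
public sealed class IdempotentConsumer
{
    private readonly HashSet<Guid> _processed = new();

    public bool Handle(Guid eventId, Action work)
    {
        if (!_processed.Add(eventId))
            return false; // already seen: the retry is absorbed
        work();
        return true;
    }
}
```

<p>With this in place, the dispatcher is free to redeliver events whenever it is unsure, and correctness does not depend on exactly-once delivery, which no infrastructure truly provides.</p>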
<h2 id="heading-in-process-messaging-is-not-a-shortcut">In-Process Messaging Is Not a Shortcut</h2>
<p>In-process messaging is often misunderstood as a shortcut. Developers reach for a messaging library and tell themselves that it is just method calls with a bus in the middle. That assumption is where the trouble starts.</p>
<p>The bus is not an implementation detail. The bus is the boundary. If you treat it as invisible, you will design handlers that quietly rely on immediate execution, guaranteed ordering, and the absence of failure. Those assumptions hold only as long as everything stays exactly where it is.</p>
<p>The moment you move that bus out of process, all of those hidden assumptions surface at once, and things start breaking in surprising ways. The system was never designed for the realities it now has to face.</p>
<p>The right mental model is to design your in-process messaging as if it were already remote. Assume latency. Assume failure. Assume retries and reordering. If you do that, extracting the messaging infrastructure later becomes a mechanical change rather than a fundamental redesign, and the boundaries you established early continue to hold.</p>
<h2 id="heading-a-rule-that-saves-you-years-later">A Rule That Saves You Years Later</h2>
<p>Here’s a rule I have learned to trust over time. If a module needs to wait for another module to finish work, you probably have the wrong boundary. Not always, but often enough that it is worth pausing and questioning the design when it happens. Waiting implies coordination. Coordination implies shared responsibility. And shared responsibility implies coupling. Each step pulls the modules closer together, even if the code still looks clean on the surface.</p>
<p>When you notice this pattern, it is a signal to slow down and reassess. Either the work truly belongs in the same module, or the interaction should be reshaped so that one module can proceed independently. Catching this early can save you years of friction later on.</p>
<h2 id="heading-living-with-partial-failure">Living With Partial Failure</h2>
<p>This is the part most architectures try hard to avoid. Partial failure feels messy and uncomfortable, so the instinct is to design it away. In reality, partial failure is normal. One module succeeds, another fails, and the system continues to run. When that happens, the response should be deliberate. You log what happened. You retry where it makes sense. You compensate if the business process requires it. And, crucially, you observe it so you can understand how often it occurs and why. What you do not do is hide that failure inside a transaction and pretend it never happened. That illusion might hold for a while, but it always leaks eventually, and it almost always leaks at the worst possible time, when the system is under pressure and the cost of surprises is highest.</p>
<h2 id="heading-why">Why</h2>
<p>I don’t want systems where I have to remember invisible rules. I do not want to rely on assumptions like “this always happens before that” when there is nothing in the system actually enforcing it. When I come back to a codebase after a break, or late at night when my brain is half-fried, I want the behaviour to be explicit. I want communication patterns that tell the truth about dependencies instead of hiding them behind convention or tribal knowledge. Modules that announce facts and react independently are easier to reason about. They are easier to monitor, easier to debug, and easier to evolve without fear. And they age better, which is something that matters far more than most people like to admit when the system is still young.</p>
<h2 id="heading-bringing-it-all-together">Bringing It All Together</h2>
<p>So far in the series, we’ve established that:</p>
<ul>
<li><p>Boundaries must be enforced, not documented</p>
</li>
<li><p>Behaviour belongs in vertical slices</p>
</li>
<li><p>Data must be owned per module</p>
</li>
<li><p>Communication must preserve autonomy</p>
</li>
</ul>
<p>If you skip any one of these, the others weaken.</p>
<p>Get them all roughly right, and you end up with something rare, a monolith that doesn’t rot as it grows.</p>
<hr />
<h2 id="heading-up-next-in-the-series">Up Next in the Series</h2>
<p>Now that modules can talk safely, the next problem shows up immediately:</p>
<blockquote>
<p><strong>How do you apply cross-cutting concerns like authorisation without blowing holes through your boundaries?</strong></p>
</blockquote>
<p>Permissions, policies, and identity have a habit of leaking everywhere if you’re not careful.</p>
]]></content:encoded></item><item><title><![CDATA[Part 3. Multiple DbContexts per Module Without Breaking Transactions]]></title><description><![CDATA[By the time you’ve enforced real module boundaries and organised behaviour using vertical slices, you run head-first into the next uncomfortable question:

If modules are truly independent, why are they still sharing a DbContext?

This is where most ...]]></description><link>https://fullstackcity.com/part-3-multiple-dbcontexts-per-module-without-breaking-transactions</link><guid isPermaLink="true">https://fullstackcity.com/part-3-multiple-dbcontexts-per-module-without-breaking-transactions</guid><category><![CDATA[Modular Monolith]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Wed, 21 Jan 2026 23:32:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769038258382/c9a40bab-69bb-47b6-b8ab-f2149621500f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By the time you’ve enforced real module boundaries and organised behaviour using vertical slices, you run head-first into the next uncomfortable question:</p>
<blockquote>
<p>If modules are truly independent, why are they still sharing a DbContext?</p>
</blockquote>
<p>This is where most modular monoliths quietly cheat.</p>
<p>They talk about ownership and boundaries, but behind the scenes everything funnels through a single EF Core DbContext, backed by a single schema, and stitched together with navigation properties that bleed across modules.</p>
<h2 id="heading-the-shared-dbcontext-lie">The Shared DbContext Lie</h2>
<p>Let’s call it what it is. A shared DbContext across modules is a lie, or at least a very convincing one. It claims that modules are independent while quietly giving them direct and implicit access to each other’s persistence concerns. On the surface everything looks clean, but underneath, the boundaries are already compromised. Once a shared DbContext exists, a familiar pattern starts to emerge. Someone adds a cross-module join “just this once” to save time. Navigation properties begin to grow tentacles, reaching into parts of the system they were never meant to know about. Performance tuning stops being a local concern and turns into a global exercise, because a change for one feature can ripple unpredictably across others.</p>
<p>The longer this goes on, the harder it becomes to reverse. When you eventually want to extract a module, whether into its own service or simply into a cleaner internal boundary, you discover that everything is entangled at the data level. What looked like a modular design turns out to be tightly coupled where it matters most.</p>
<p>At that point, you are not really building a modular monolith. You are building a monolith with folders. It’s the architectural equivalent of what my Spanish friend says when I drift too far from his paella recipe. Once you start throwing in whatever is convenient, it stops being paella and becomes “arroz con cosas”, rice with things. It might still be edible, but it is no longer the thing you set out to make.</p>
<p>That is exactly what happens with a shared DbContext. The intent was modularity, but convenience takes over. What you end up with may still work, but it has quietly lost its identity, and undoing that damage later is far harder than getting it right up front.</p>
<h2 id="heading-what-is-data-ownership">What is Data Ownership?</h2>
<p>Data ownership is another area where the language sounds clear but the reality often gets blurred. If a module owns a business concept, then it must own everything that makes that concept real in the system. That includes the schema, the mappings, the persistence rules, and the full lifecycle of the data from creation to deletion. Anything less than that is shared ownership, and shared ownership is where boundaries quietly fall apart. There is no such thing as “mostly owns” or “owns it except for reporting”. The moment another module can shape, query, or optimise that data on its own terms, ownership is already compromised. The module may still be responsible for the concept in theory, but in practice the data has become a shared resource. The architectural consequence of real ownership is simple and uncomfortable for some teams, one DbContext per module. Not one per feature, and not multiple bounded contexts hiding inside the same module. One module, one persistence boundary. That is what makes ownership explicit, enforceable, and durable as the system grows.</p>
<h2 id="heading-the-immediate-pushback">The Immediate Pushback</h2>
<p>The pushback comes immediately, and to be fair, it is justified. The moment you say “multiple DbContexts”, the same questions surface almost every time. What about transactions? What if a single operation needs to update two modules? Isn’t this just distributed systems inside a single process?</p>
<p>These are good questions, and they are honest ones. They usually come from people who have been burned by data inconsistency or partial failures before, and who are rightly cautious about introducing new failure modes.</p>
<p>The mistake is not in asking those questions. The mistake is in trying to answer them with the wrong tools.</p>
<h2 id="heading-what-a-multi-dbcontext-modular-monolith-looks-like">What a Multi-DbContext Modular Monolith Looks Like</h2>
<p>At a structural level, it’s simple.</p>
<pre><code class="lang-csharp">Users.Module
  UsersDbContext

Billing.Module
  BillingDbContext
</code></pre>
<p>Each DbContext:</p>
<ul>
<li><p>Lives inside its module</p>
</li>
<li><p>Is internal to that module</p>
</li>
<li><p>Only knows about its own aggregates</p>
</li>
</ul>
<p>No shared base DbContext.<br />No shared migrations.<br />No shared entity configurations.</p>
<p>Here’s the mental model:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769036683451/83e7d5bc-d69e-4c9b-b296-bec5f93731bf.png" alt class="image--center mx-auto" /></p>
<p>Same physical database if you want. Different schemas. Different contexts. Different ownership.</p>
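<p>With EF Core, pinning each context to its own schema is one line per module. This is a sketch; the entity classes and constructor shapes are simplified for illustration.</p>

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public sealed class User { public Guid Id { get; set; } }
public sealed class BillingProfile { public Guid Id { get; set; } }

// Same physical database, different schemas: each module's tables are
// grouped under its own schema, so casual cross-module joins stand out.
public sealed class UsersDbContext : DbContext
{
    public UsersDbContext(DbContextOptions<UsersDbContext> options) : base(options) { }

    public DbSet<User> Users => Set<User>();

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.HasDefaultSchema("users"); // users.* tables only
}

public sealed class BillingDbContext : DbContext
{
    public BillingDbContext(DbContextOptions<BillingDbContext> options) : base(options) { }

    public DbSet<BillingProfile> BillingProfiles => Set<BillingProfile>();

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.HasDefaultSchema("billing"); // billing.* tables only
}
```

<p>Each context also gets its own migrations history, so schema changes ship module by module rather than as one shared migration stream.</p>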
<h2 id="heading-but-i-need-a-transaction-across-modules">“But I Need a Transaction Across Modules”</h2>
<p>This is the point where most people reach for the wrong answer. Faced with the idea of changing state in more than one module, the instinct is to try to stretch a transaction across the boundary and make the problem go away. The tooling makes this feel tempting, even safe, and there is always some variation of “the framework can handle it if we’re careful enough”.</p>
<p>And technically, some of that does work. You can coordinate multiple persistence operations, get everything to commit or roll back together, and walk away with a sense of consistency. On the surface, it looks like the cleanest solution.</p>
<p>Architecturally, it is a trap. The moment you rely on a shared transactional boundary, you have effectively collapsed the modules back into one. The boundary still exists in name, but not in behaviour. What you gain in short-term convenience, you lose in long-term modularity, flexibility, and the ability to evolve the system without fear.</p>
<h2 id="heading-why-cross-module-transactions-are-a-smell">Why Cross-Module Transactions Are a Smell</h2>
<p>Here’s the uncomfortable truth. If two modules must commit atomically, then they are not independent. No amount of layering or careful naming changes that reality. Atomic consistency is a stronger signal than any diagram you can draw. That does not automatically mean your design is bad. It means the boundary is wrong. What you have separated conceptually does not match how the business actually needs the system to behave. The architecture is telling you something, and it is usually worth listening.</p>
<p>In practice, there are only two real possibilities. Either the modules are genuinely part of the same consistency boundary and should be treated as such, or the operation does not truly require atomic consistency and you are reaching for it out of convenience. Most systems blur this line, choosing the comfort of transactions instead of questioning whether the boundary itself makes sense.</p>
<h2 id="heading-strong-consistency-is-rarely-needed">Strong Consistency Is Rarely Needed</h2>
<p>Imagine a Users module and a Billing module. You create a user and, as part of that flow, you also create a billing profile. The reflex is to assume that both of those actions must succeed or fail together, wrapped in a single atomic transaction. But step back and ask what actually matters. The user needs to exist. Billing needs to know about that user. That is the real requirement. If billing finds out about the new user 50 milliseconds later, nothing meaningful breaks. There is no business catastrophe hiding in that gap.</p>
<p>This is where strong consistency quietly reveals itself as optional rather than mandatory. Inside a monolith, eventual consistency is not a compromise or a failure of design. In many cases, it is the more honest reflection of the business process. By allowing modules to communicate asynchronously and converge on consistency over time, you preserve boundaries, reduce coupling, and end up with a system that is easier to reason about as it grows.</p>
<h2 id="heading-the-correct-pattern-local-transactions-events">The Correct Pattern: Local Transactions + Events</h2>
<p>The correct pattern is much simpler than it first appears, local transactions combined with events. Inside a module, nothing exotic is required. You use a normal EF Core transaction, make your changes, and commit them. The module stays fully in control of its own data and its own consistency. Once that work is complete, the module publishes an event to say that something meaningful has happened. That event is not an implementation detail, it is a deliberate signal to the rest of the system. It represents the only sanctioned way for other modules to react.</p>
<p>That event becomes the boundary crossing point. Other modules can listen, respond, and update their own state in their own time, using their own transactions. Consistency is achieved through coordination rather than coupling, and the integrity of each module’s boundary remains intact.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769037794720/e74c2d26-9e22-4df6-a8e7-07fab656d40e.png" alt class="image--center mx-auto" /></p>
<p>No shared DbContext.<br />No distributed transaction.<br />No lies.</p>
<h2 id="heading-but-what-if-billing-fails">“But What If Billing Fails?”</h2>
<p>Good. Now we’re talking about reality. Failure is not an edge case, it is the normal state of complex systems. This is exactly the point where layered monoliths tend to hide that reality by forcing everything through a single transaction. It feels safe because nothing appears to fail, but it is also brittle, because all failure is collapsed into one silent rollback.</p>
<p>With a modular approach, failure becomes explicit. When modules communicate through events, you do not pretend that everything always succeeds. You design for the fact that things can and will fail. Recovery becomes intentional rather than accidental, and system state becomes observable instead of being hidden behind a transaction boundary. If Billing fails, the user still exists. The event that announced the new user can be retried. The failure is visible, traceable, and something you can respond to. That is not a step backwards. It is an honest representation of how the system actually behaves, and honesty is what gives you control when things go wrong.</p>
<h2 id="heading-the-outbox-pattern">The Outbox Pattern</h2>
<p>Inside the Users module:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">using</span> <span class="hljs-keyword">var</span> tx = <span class="hljs-keyword">await</span> db.Database.BeginTransactionAsync(stopToken);

db.Users.Add(user);
db.Outbox.Add(<span class="hljs-keyword">new</span> OutboxMessage(
    <span class="hljs-string">"UserCreated"</span>,
    <span class="hljs-keyword">new</span> UserCreatedEvent(user.Id)));

<span class="hljs-keyword">await</span> db.SaveChangesAsync(stopToken);
<span class="hljs-keyword">await</span> tx.CommitAsync(stopToken);
</code></pre>
<p>Billing never touches Users’ DbContext.<br />It just reacts to what Users emits.</p>
<p>This is the same pattern that lets you split later without rewriting everything.</p>
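<p>The other half of the outbox is a relay that drains unpublished rows after commit. Here is a deliberately simplified in-memory sketch of that loop; real relays poll the outbox table and deliver at-least-once, which is why consumers must be idempotent. All names here are illustrative assumptions, not the types from the snippet above.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OutboxRow
{
    public OutboxRow(Guid id, string type, string payload)
    {
        Id = id; Type = type; Payload = payload;
    }

    public Guid Id { get; }
    public string Type { get; }
    public string Payload { get; }
    public bool Published { get; set; }
}

public sealed class OutboxRelay
{
    private readonly List<OutboxRow> _outbox;
    private readonly Action<OutboxRow> _publish;

    public OutboxRelay(List<OutboxRow> outbox, Action<OutboxRow> publish)
    {
        _outbox = outbox;
        _publish = publish;
    }

    // Returns how many rows were published in this pass.
    public int DrainOnce()
    {
        var pending = _outbox.Where(r => !r.Published).ToList();
        foreach (var row in pending)
        {
            _publish(row);         // a crash between these two lines means
            row.Published = true;  // redelivery, never a lost message
        }
        return pending.Count;
    }
}
```

<p>The ordering of publish-then-mark is the whole guarantee: a message can be delivered twice, but it can never silently disappear.</p>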
<h2 id="heading-what-about-read-models">What About Read Models?</h2>
<p>Another common objection shows up quickly. “But I need to join Users and Billing for queries”. This is usually framed as a hard requirement, but it is really a question about reads, not ownership. Reads do not define ownership. Writes do. The fact that you want to view data together does not mean it should be stored or managed together. Conflating the two is how boundaries get eroded under the guise of convenience. If you genuinely need a combined view, the answer is a read model. You build a projection that listens to the relevant events, materialise a model that is shaped for querying, and optimise it for the questions you actually need to answer. That model can live wherever it makes the most sense, without punching holes through module boundaries. Yes, this means duplication. And yes, that duplication is intentional. Data is copied because it serves a different purpose. It is fine. In fact, it is often the cleanest way to keep writes honest while still giving reads the flexibility they need.</p>
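<p>A read model is, mechanically, just a projection maintained from events. This sketch keeps one denormalised row per user, updated as Users and Billing events arrive; the event shapes and row type are illustrative assumptions.</p>

```csharp
using System;
using System.Collections.Generic;

public sealed record UserCreated(Guid UserId, string Email);
public sealed record BillingProfileCreated(Guid UserId, string Plan);

// One flat row, shaped for the query, intentionally duplicating data
// that the Users and Billing modules each own for writes.
public sealed class UserBillingRow
{
    public Guid UserId { get; init; }
    public string? Email { get; set; }
    public string? Plan { get; set; }
}

public sealed class UserBillingProjection
{
    private readonly Dictionary<Guid, UserBillingRow> _rows = new();

    public void Apply(UserCreated e) => Row(e.UserId).Email = e.Email;

    public void Apply(BillingProfileCreated e) => Row(e.UserId).Plan = e.Plan;

    public UserBillingRow? Get(Guid userId) =>
        _rows.TryGetValue(userId, out var row) ? row : null;

    // Events can arrive in either order, so the row is created lazily.
    private UserBillingRow Row(Guid id)
    {
        if (!_rows.TryGetValue(id, out var row))
            _rows[id] = row = new UserBillingRow { UserId = id };
        return row;
    }
}
```

<p>In production the rows would live in a table or document store, but the principle is the same: the projection answers the "join" question without either write model knowing the other exists.</p>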
<hr />
<h2 id="heading-up-next-in-the-series">Up Next in the Series</h2>
<p>Now that we’ve:</p>
<ol>
<li><p>Enforced boundaries</p>
</li>
<li><p>Structured behaviour</p>
</li>
<li><p>Isolated data</p>
</li>
</ol>
<p>The next problem shows up immediately:</p>
<blockquote>
<p><strong>How do modules talk to each other without turning into a distributed monolith?</strong></p>
</blockquote>
<p>That’s where in-process messaging, contracts, and intent-driven communication come in.</p>
]]></content:encoded></item><item><title><![CDATA[Part 2. Vertical Slice Architecture Inside a Modular Monolith]]></title><description><![CDATA[In the first post of this series, I talked about enforcing real module boundaries. Assemblies. Internals. Contracts. The boring but essential stuff that stops your modular monolith collapsing into a shared-nothing-in-name-only mess.
But once you’ve d...]]></description><link>https://fullstackcity.com/2-vertical-slice-architecture-inside-a-modular-monolith</link><guid isPermaLink="true">https://fullstackcity.com/2-vertical-slice-architecture-inside-a-modular-monolith</guid><category><![CDATA[Modular Monolith]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Wed, 21 Jan 2026 22:34:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769033580415/947378eb-7ea8-4367-aee4-ca03ff3fbfc5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the first post of this series, I talked about enforcing real module boundaries. Assemblies. Internals. Contracts. The boring but essential stuff that stops your modular monolith collapsing into a shared-nothing-in-name-only mess.</p>
<p>But once you’ve done that work, something else becomes painfully obvious.</p>
<p>You open a module and… it’s still chaos.</p>
<p>Not architectural chaos. Organised chaos. The kind that looks tidy at first glance but slows you down every time you touch it. Folders called Controllers, Services, Repositories. Logic smeared across three or four layers. You change a feature and end up hopping between files like you’re playing whack-a-mole.</p>
<p>This is where most modular monoliths stall.</p>
<p>They have boundaries, but <strong>inside those boundaries they’re still layered</strong>.</p>
<p>And layered architecture optimises for one thing only, explaining code to a diagram. It does not optimise for change.</p>
<h2 id="heading-the-real-unit-of-change-is-a-feature">The Real Unit of Change Is a Feature</h2>
<p>Here’s the core idea behind vertical slice architecture, and it’s deceptively simple:</p>
<blockquote>
<p><strong>Code should be organised around things that change together.</strong></p>
</blockquote>
<p>Not around technical concerns. Not around frameworks. Around <strong>features</strong>.</p>
<p>When a product owner asks for a change, they don’t ask you to “update the service layer and adjust the repository abstraction”. They ask you to <em>change behaviour</em>. That behaviour should live in one place.</p>
<p>When I’m deep in a system late at night, after the house has finally gone quiet, I don’t want to reconstruct a feature in my head from five folders. I want to open one place and see the whole story.</p>
<p>Vertical slices give you that.</p>
<h2 id="heading-what-layered-architecture-looks-like-in-practice">What Layered Architecture Looks Like in Practice</h2>
<p>Most of us have built this:</p>
<pre><code class="lang-bash">Users/
  Controllers/
    UsersController.cs
  Services/
    UserService.cs
  Repositories/
    UserRepository.cs
  Models/
    User.cs
</code></pre>
<p>On paper, it looks reasonable. Responsibilities are separated. Everything has a place.</p>
<p>In reality, a single endpoint touches all of it.</p>
<p>Change the behaviour of “Create User” and you’re editing:</p>
<ul>
<li><p>A controller</p>
</li>
<li><p>A service</p>
</li>
<li><p>A repository</p>
</li>
<li><p>Possibly a validator</p>
</li>
<li><p>Possibly a mapper</p>
</li>
</ul>
<p>The logic is scattered. The mental overhead is high. And the risk of breaking something unrelated creeps up over time.</p>
<hr />
<h2 id="heading-what-a-vertical-slice-actually-is">What a Vertical Slice Actually Is</h2>
<p>A vertical slice is <strong>everything needed for one use-case</strong>, grouped together.</p>
<p>Not just the handler. Not just the endpoint. Everything.</p>
<p>Here’s the same Users module, sliced vertically:</p>
<pre><code class="lang-bash">Users/
  CreateUser/
    CreateUserEndpoint.cs
    CreateUserHandler.cs
    CreateUserCommand.cs
    CreateUserValidator.cs
  GetUser/
    GetUserEndpoint.cs
    GetUserHandler.cs
    GetUserQuery.cs
  UsersModule.cs
</code></pre>
<p>Each folder answers a single question:</p>
<blockquote>
<p>“How does this feature work?”</p>
</blockquote>
<p>You don’t jump around. You don’t search across the solution. You stay inside the slice.</p>
<h2 id="heading-vertical-slices-inside-a-module-not-instead-of-modules">Vertical Slices Inside a Module (Not Instead of Modules)</h2>
<p>Vertical slice architecture does <strong>not</strong> replace modular architecture. It lives <em>inside</em> it.</p>
<p>The hierarchy looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769033884098/6ddb4c92-0a6d-43c3-940b-5b880786056a.png" alt class="image--center mx-auto" /></p>
<p>Modules define <strong>ownership and boundaries</strong>.<br />Slices define <strong>behaviour and flow</strong>.</p>
<p>If you skip modules and go straight to slices, you end up with feature soup. If you skip slices and stick to layers, you end up with tightly coupled sludge.</p>
<p>You need both.</p>
<h2 id="heading-one-request-one-handler-one-path">One Request, One Handler, One Path</h2>
<p>A rule I follow almost religiously:</p>
<blockquote>
<p><strong>One request = one handler = one execution path</strong></p>
</blockquote>
<p>No shared “UserService” with 17 methods. No god-objects. No orchestration hidden behind abstractions.</p>
<p>Here’s a simple example.</p>
<h3 id="heading-the-request">The request</h3>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">CreateUserCommand</span>(<span class="hljs-params">
    <span class="hljs-keyword">string</span> Email,
    <span class="hljs-keyword">string</span> DisplayName
</span>)</span>;
</code></pre>
<h3 id="heading-the-handler">The handler</h3>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> class <span class="hljs-title">CreateUserHandler</span>(<span class="hljs-params">
    IUserDbContext db,
    IClock clock
</span>)</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;Result&lt;Guid&gt;&gt; Handle(
        CreateUserCommand command,
        CancellationToken ct)
    {
        <span class="hljs-keyword">var</span> user = <span class="hljs-keyword">new</span> User(
            Guid.NewGuid(),
            command.Email,
            command.DisplayName,
            clock.UtcNow);

        db.Users.Add(user);
        <span class="hljs-keyword">await</span> db.SaveChangesAsync(ct);

        <span class="hljs-keyword">return</span> Result.Success(user.Id);
    }
}
</code></pre>
<p>No layers. No indirection. The behaviour is right there.</p>
<p>If validation fails, it fails here. If persistence changes, it changes here. If rules evolve, this is the file you open.</p>
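<p>The <code>Result&lt;Guid&gt;</code> type used by the handler is not part of the BCL; the post assumes one exists. A minimal, hand-rolled sketch of such a type (hypothetical, nothing beyond the standard library) might look like this. The <code>Match</code> method is what lets callers handle both outcomes explicitly:</p>

```csharp
using System;

// Minimal sketch of a Result type (hypothetical; the post assumes such a type exists).
public readonly struct Result<T>
{
    private readonly T _value;
    private readonly string _error;

    public bool IsSuccess { get; }

    private Result(bool isSuccess, T value, string error)
    {
        IsSuccess = isSuccess;
        _value = value;
        _error = error;
    }

    public static Result<T> Success(T value) => new Result<T>(true, value, null);
    public static Result<T> Failure(string error) => new Result<T>(false, default, error);

    // Match forces the caller to handle both the success and the failure path.
    public TOut Match<TOut>(Func<T, TOut> onSuccess, Func<string, TOut> onFailure)
        => IsSuccess ? onSuccess(_value) : onFailure(_error);
}

// Non-generic helper so call sites can write Result.Success(user.Id).
public static class Result
{
    public static Result<T> Success<T>(T value) => Result<T>.Success(value);
    public static Result<T> Failure<T>(string error) => Result<T>.Failure(error);
}
```

<p>A slice can return this from its handler without any shared "application layer" machinery; the type is small enough to live wherever the team decides shared primitives belong.</p>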
<h2 id="heading-but-wont-this-duplicate-code">“But Won’t This Duplicate Code?”</h2>
<p>Yes.</p>
<p>Sometimes.</p>
<p>And that’s fine.</p>
<p>Layered architecture optimises for <strong>reuse</strong>. Vertical slices optimise for <strong>clarity and change</strong>.</p>
<p>If two features genuinely share logic, extract it. But don’t pre-emptively abstract “just in case”. That’s how shared services become dumping grounds.</p>
<p>I’ve learned to trust duplication far more than premature reuse. Duplication is obvious. Coupling is subtle.</p>
<h2 id="heading-validation-lives-with-the-slice">Validation Lives With the Slice</h2>
<p>Another mistake I see a lot is “centralised validation”.</p>
<p>It sounds sensible until you realise validation rules are <strong>part of the behaviour</strong>, not infrastructure.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">CreateUserValidator</span>
    : <span class="hljs-title">AbstractValidator</span>&lt;<span class="hljs-title">CreateUserCommand</span>&gt;
{
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">CreateUserValidator</span>(<span class="hljs-params"></span>)</span>
    {
        RuleFor(x =&gt; x.Email)
            .NotEmpty()
            .EmailAddress();

        RuleFor(x =&gt; x.DisplayName)
            .NotEmpty()
            .MaximumLength(<span class="hljs-number">100</span>);
    }
}
</code></pre>
<p>This validator belongs <em>with</em> CreateUser. Not in a shared folder. Not in a generic pipeline that hides rules from the feature they apply to.</p>
<p>When a rule changes, you want to see it next to the behaviour it affects.</p>
<h2 id="heading-endpoints-are-just-adapters">Endpoints Are Just Adapters</h2>
<p>In a vertical slice, endpoints become very boring. That’s a good thing.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">CreateUserEndpoint</span> : <span class="hljs-title">IEndpoint</span>
{
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">MapEndpoint</span>(<span class="hljs-params">IEndpointRouteBuilder app</span>)</span>
    {
        app.MapPost(<span class="hljs-string">"/users"</span>, <span class="hljs-keyword">async</span> (
            CreateUserCommand command,
            CreateUserHandler handler,
            CancellationToken stopToken) =&gt;
        {
            <span class="hljs-keyword">var</span> result = <span class="hljs-keyword">await</span> handler.Handle(command, stopToken);
            <span class="hljs-keyword">return</span> result.Match(
                id =&gt; Results.Created(<span class="hljs-string">$"/users/<span class="hljs-subst">{id}</span>"</span>, id),
                r =&gt; Results.BadRequest(r));
        });
    }
}
</code></pre>
<p>No logic. No branching. No decisions. The endpoint adapts HTTP to your slice and gets out of the way.</p>
<h2 id="heading-why-this-is-important-as-the-system-grows">Why This Is Important as the System Grows</h2>
<p>This is the part people consistently underestimate. Vertical slices do not feel revolutionary when the system is small and everyone still remembers how everything fits together. Early on, almost any structure feels workable. The payoff only becomes obvious months later, when the codebase has grown, the team has changed, and the original mental model has faded.</p>
<p>As the system evolves, clean boundaries let you delete features without collateral damage. You can remove an entire slice and be confident that you have not silently broken unrelated behaviour elsewhere. Refactoring becomes safer because changes stay contained. You are working inside a known boundary instead of tiptoeing through a shared codebase, hoping nothing unexpected snaps.</p>
<p>That same structure also keeps future options open. If a slice starts to demand independent scaling, ownership, or deployment, it is already shaped in a way that can be extracted into a service. You are not forced into that move, but you are not blocked by your architecture either. The decision becomes a trade-off, not a rescue mission.</p>
<p>On a more human level, this structure matters when you are under pressure, switching context between work and home life, or simply mentally tired. Clear slices reduce friction. You do not have to re-learn the entire system every time you touch it. You can step into one area, make a change, and step back out again with confidence.</p>
<p>That is the real long-term value. Future-you gets fewer surprises, fewer late-night debugging sessions, and a system that continues to make sense even after the original decisions are no longer fresh in your head.</p>
<h2 id="heading-vertical-slices-and-testing">Vertical Slices and Testing</h2>
<p>Testing looks very different once you commit to vertical slices, because the slice itself becomes the unit under test. You stop testing vague layers like “the service layer” or “the application layer” and start testing concrete behaviours. Instead of asking whether the system works in general, you ask whether CreateUser does exactly what it claims to do. Handler tests focus on behaviour. Given a valid request, does the handler produce the correct outcome and side effects? Validator tests are narrower and stricter. They exist to prove that the rules are enforced consistently, regardless of where the request comes from. Endpoint tests sit one level higher again, validating that routing, binding, authorisation, and wiring are correct.</p>
<p>The important part is what you do not need. There are no sprawling test fixtures that try to boot half the system. There are no magic mocks that exist only to satisfy an overly broad abstraction. Each test has a clear purpose and a tight scope, mirroring the structure of the slice itself. Because each slice stands on its own, the test suite stays readable as it grows. When a test fails, you know exactly where to look and why it matters. The structure of the code and the structure of the tests reinforce each other, which is exactly what you want in a system that is expected to evolve over time.</p>
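<p>To make that concrete, here is a sketch of what a slice-level handler test can work against. Everything here is hypothetical and deliberately trimmed down: an in-memory list stands in for the DbContext, <code>FixedClock</code> fakes <code>IClock</code>, and the handler is synchronous for brevity. The shape is the point: construct the handler with the fakes the slice itself needs, invoke it, assert the outcome.</p>

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down, hypothetical versions of the slice's types, for illustration only.
public interface IClock { DateTime UtcNow { get; } }

public sealed class FixedClock : IClock
{
    public DateTime UtcNow { get; } = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
}

public sealed record CreateUserCommand(string Email, string DisplayName);
public sealed record User(Guid Id, string Email, string DisplayName, DateTime CreatedAt);

public sealed class CreateUserHandler
{
    private readonly List<User> _store; // stands in for the real DbContext
    private readonly IClock _clock;

    public CreateUserHandler(List<User> store, IClock clock)
    {
        _store = store;
        _clock = clock;
    }

    public Guid Handle(CreateUserCommand command)
    {
        var user = new User(Guid.NewGuid(), command.Email, command.DisplayName, _clock.UtcNow);
        _store.Add(user);
        return user.Id;
    }
}
```

<p>A test then reads as a straight line: given a command, the handler stores exactly one user with the expected data. No sprawling fixtures, no mocks beyond the fakes the slice actually depends on.</p>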
<h2 id="heading-a-quiet-benefit-easier-onboarding">A Quiet Benefit: Easier Onboarding</h2>
<p>There is a quieter benefit to vertical slices that I did not fully appreciate at first, and that is onboarding. When someone new joins the team, the system does a surprising amount of the explaining for you. Instead of walking them through layers, conventions, and unwritten rules, you can point to a single feature and say, “everything for that behaviour is here”.</p>
<p>That simple framing removes the need for an architectural lecture on day one. A new developer does not have to build a mental map of the entire codebase before they can be productive. They can open one slice, see the endpoint, the handler, the validation, and the tests, and understand how the system works by following a real, concrete example. That’s more important than most people like to admit. Onboarding is where hidden complexity shows up fast, and it is also where architectural decisions either pay dividends or create friction. Vertical slices reduce that friction by making the structure of the system obvious, discoverable, and grounded in real behaviour rather than abstract rules.</p>
<h2 id="heading-bringing-it-back-to-reality">Bringing It Back to Reality</h2>
<p>I don’t build systems like this because it’s fashionable. I build them this way because I want to stay productive over the long haul. I have so many side projects going at any given time that I need to be able to switch contexts quickly and get moving again without relearning the layout after a break from working on it.</p>
<p>Modules give you safety.<br />Vertical slices give you speed.</p>
<p>You need both.</p>
<hr />
<h2 id="heading-up-next-in-the-series">Up Next in the Series</h2>
<p>Now that we’ve:</p>
<ol>
<li><p>Enforced boundaries between modules</p>
</li>
<li><p>Structured behaviour inside modules</p>
</li>
</ol>
<p>The next hard problem shows up immediately:</p>
<blockquote>
<p><strong>How do you isolate data per module without breaking consistency?</strong></p>
</blockquote>
<p>That’s where multiple DbContexts, transactions, and reality collide.</p>
<p>That’s what we’ll tackle next.</p>
]]></content:encoded></item><item><title><![CDATA[Part 1. Enforcing True Module Boundaries in a .NET Modular Monolith]]></title><description><![CDATA[There’s a moment that happens to most of us at some point in our careers. You’ve done “everything right”. You’ve split the solution into folders called Modules. You’ve got namespaces that look clean. You might even have separate projects. And yet, si...]]></description><link>https://fullstackcity.com/part-1-enforcing-true-module-boundaries-in-a-net-modular-monolith</link><guid isPermaLink="true">https://fullstackcity.com/part-1-enforcing-true-module-boundaries-in-a-net-modular-monolith</guid><category><![CDATA[software architecture]]></category><category><![CDATA[Modular Monolith]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Wed, 21 Jan 2026 22:05:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769032934147/f993dec2-78d7-45b6-a736-0248fa5d398a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There’s a moment that happens to most of us at some point in our careers. You’ve done “everything right”. You’ve split the solution into folders called Modules. You’ve got namespaces that look clean. You might even have separate projects. And yet, six months later, someone adds a reference they shouldn’t, calls into a repository they don’t own, and suddenly your carefully designed architecture is held together by nothing more than goodwill and tribal knowledge.</p>
<p>I’ve been building .NET systems for a long time now, and I’ve learned this the hard way: architecture that relies on discipline alone will eventually fail. Not because people are careless, but because people are busy. They’re under pressure. They’re trying to ship. And sometimes, that one little shortcut feels harmless.</p>
<p>This first post in the series is about how to stop relying on discipline and start enforcing boundaries. Not with dogma. Not with over-engineering. But with practical techniques that work in real .NET codebases.</p>
<h2 id="heading-why-modular-usually-isnt">Why “Modular” Usually Isn’t</h2>
<p>Most so-called modular monoliths fail for one simple reason: nothing actually stops one module reaching into another.</p>
<p>At first, everything feels fine. The codebase is small. You remember where things live. You even review every PR yourself. But time passes. New features land. New developers join. Context fades.</p>
<p>Then you see it:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> user = _userRepository.GetById(userId);
</code></pre>
<p>That line doesn’t live in the Users module.</p>
<p>It lives in Billing.</p>
<p>Now Billing knows about Users’ persistence model. It knows how Users stores data. And worse, it now depends on Users being present, initialised, and shaped exactly the way Billing expects. The boundary is gone.</p>
<p>At that point, you don’t really have modules. You have folders with opinions.</p>
<h2 id="heading-what-a-real-boundary-actually-means">What a Real Boundary Actually Means</h2>
<p>A real boundary is not a folder, a namespace, or a naming convention. It is something you physically cannot cross by accident. If a developer can casually reach across it with a reference, an import, or a “quick” helper method, then the boundary is already broken. Real boundaries create resistance. They force you to slow down and make a conscious decision when you want to interact with another part of the system.</p>
<p>The compiler should be your first line of defence. When boundaries are real, the compiler actively helps you stay honest by refusing to compile code that reaches into places it has no business touching. You are not relying on discipline, code reviews, or comments to enforce the architecture. The rules are encoded in the structure of the system itself, and violations show up immediately, not six months later during a refactor.</p>
<p>When a boundary is violated, it should be obvious and slightly painful. You should feel friction when you try to bypass it. If breaking a boundary feels easy, invisible, or harmless, then it was never a real boundary to begin with. At that point, it is purely decorative architecture, something that looks good on a diagram but collapses under day-to-day development pressure.</p>
<p>In practice, this means a module must own its data completely. No other part of the system should be able to reach directly into its tables, collections, or persistence models. The module also controls how others talk to it, exposing only explicit entry points that reflect real use cases rather than internal implementation details.</p>
<p>Most importantly, a module must hide its internals entirely. This is the part people most often get wrong. They allow “just this one” internal type to leak out, or they expose a repository or entity because it feels convenient. Every one of those shortcuts weakens the boundary. Once internals leak, the module stops being a module and starts becoming a shared code bucket with aspirations of structure.</p>
<h2 id="heading-the-shape-of-a-proper-modular-monolith">The Shape of a Proper Modular Monolith</h2>
<p>Before we get into enforcement, it’s worth being clear about the target shape.</p>
<p>Here’s the mental model I use.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769030413409/3ed41125-ae94-4d3b-91e7-7054f8c3e789.png" alt class="image--center mx-auto" /></p>
<p>Each module:</p>
<ul>
<li><p>Exposes a <strong>public surface</strong></p>
</li>
<li><p>Keeps everything else private</p>
</li>
<li><p>Communicates through contracts, not internals</p>
</li>
</ul>
<p>No module reaches directly into another module’s database, repositories, or EF models. If it needs something, it asks.</p>
<h2 id="heading-assemblies-are-your-first-line-of-defence">Assemblies Are Your First Line of Defence</h2>
<p>If you take only one thing from this post, let it be this:</p>
<blockquote>
<p><strong>Folders do not enforce boundaries. Assemblies do.</strong></p>
</blockquote>
<p>The moment you put everything in a single project, you’ve already lost most of your leverage. Yes, you <em>can</em> be disciplined. Yes, you <em>can</em> rely on conventions. But the compiler cannot help you anymore.</p>
<p>A solid baseline is one assembly per module.</p>
<pre><code class="lang-bash">/src
  /Modules
    /Users
      Users.Application
      Users.Domain
      Users.Infrastructure
    /Billing
      Billing.Application
      Billing.Domain
      Billing.Infrastructure
  /Api
</code></pre>
<p>Already, you’ve gained something important: <strong>explicit references</strong>. Billing does not magically see Users unless you tell it to.</p>
<p>But we’re not done yet.</p>
<h2 id="heading-internal-by-default-public-by-exception">Internal by Default, Public by Exception</h2>
<p>One of the most underused features in .NET is the <code>internal</code> keyword. Most people default to <code>public</code> and never look back. That’s a mistake.</p>
<p>Inside a module, almost everything should be internal.</p>
<p>Entities. Repositories. EF configurations. Handlers. Services.</p>
<p>Public types should be rare and deliberate.</p>
<p>For example, in the Users module:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">namespace</span> <span class="hljs-title">Users.Application.Contracts</span>;

<span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IUserLookup</span>
{
    <span class="hljs-function">Task&lt;UserSummary&gt; <span class="hljs-title">GetUserAsync</span>(<span class="hljs-params">Guid userId</span>)</span>;
}
</code></pre>
<p>That interface is public. It’s the door into the module.</p>
<p>This, on the other hand, is not:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">UserRepository</span> : <span class="hljs-title">IUserRepository</span>
{
    <span class="hljs-comment">// implementation</span>
}
</code></pre>
<p>Billing never sees this. Cannot see this. And that’s exactly the point.</p>
<h2 id="heading-friend-assemblies-a-controlled-escape-hatch">Friend Assemblies: A Controlled Escape Hatch</h2>
<p>“But what about tests?”</p>
<p>This always comes up.</p>
<p>You don’t make everything public just so tests can poke at it. You use <strong>friend assemblies</strong>.</p>
<p>In your module’s AssemblyInfo:</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">assembly: InternalsVisibleTo(<span class="hljs-meta-string">"Users.Tests"</span>)</span>]
</code></pre>
<p>Now your tests can see internals, but no other module can. This keeps the production boundary intact while still allowing deep testing.</p>
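<p>If you would rather not maintain an AssemblyInfo file, SDK-style projects (from the .NET 5 SDK onwards) let you declare the same thing in the csproj, and the build generates the attribute for you:</p>

```xml
<!-- Users.csproj -->
<ItemGroup>
  <InternalsVisibleTo Include="Users.Tests" />
</ItemGroup>
```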
<h2 id="heading-forcing-access-through-contracts">Forcing Access Through Contracts</h2>
<p>Forcing access through contracts is what turns an architectural idea into something that actually holds up under pressure. A boundary is only real if there is a single, sanctioned way in. The moment there are multiple paths, helper shortcuts, or back doors, the boundary starts to erode. People will always take the path of least resistance, especially when deadlines are tight. In a modular monolith, that single entry point is almost always a contract. Other modules do not reach in and grab what they want. They ask for something to be done. That shift, from data access to intention, is subtle but crucial. It forces you to think in terms of behaviour and outcomes rather than structures and tables.</p>
<p>Interfaces are one common way of expressing that contract. They define what a module is willing to do, not how it does it. Request and response models push this idea further by making interactions explicit and self-contained. Instead of passing around internal types, you exchange messages that represent a specific use case.</p>
<p>In-process messaging takes the same principle and applies it even more strictly. Rather than direct calls, modules communicate by sending requests or publishing events inside the same process. The module decides how to handle them, and everyone else is deliberately kept at arm’s length. The result is a system where access is intentional, boundaries are enforced by design, and accidental coupling becomes much harder to introduce.</p>
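<p>To make the in-process messaging idea concrete, here is a deliberately minimal publish/subscribe sketch. It is hand-rolled purely for illustration, and the <code>UserRegistered</code> event is hypothetical; real codebases usually reach for an established library such as MediatR rather than writing this themselves:</p>

```csharp
using System;
using System.Collections.Generic;

// Minimal in-process event bus (illustrative sketch, not production code).
public sealed class InProcessBus
{
    private readonly Dictionary<Type, List<Action<object>>> _subscribers = new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<TEvent>(Action<TEvent> handler)
    {
        if (!_subscribers.TryGetValue(typeof(TEvent), out var handlers))
        {
            handlers = new List<Action<object>>();
            _subscribers[typeof(TEvent)] = handlers;
        }
        // Wrap the typed handler so all handlers share one storage shape.
        handlers.Add(e => handler((TEvent)e));
    }

    public void Publish<TEvent>(TEvent @event)
    {
        if (_subscribers.TryGetValue(typeof(TEvent), out var handlers))
        {
            foreach (var handle in handlers)
                handle(@event);
        }
    }
}

// A hypothetical event the Users module might publish for Billing to consume.
public sealed record UserRegistered(Guid UserId, string Email);
```

<p>The Users module publishes <code>UserRegistered</code>; Billing subscribes to it. Neither side touches the other’s internals, only the shared event contract.</p>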
<p>Here’s a simple pattern that works well.</p>
<h3 id="heading-users-exposes-a-contract">Users exposes a contract</h3>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">GetUserQuery</span>(<span class="hljs-params">Guid UserId</span>)</span>;

<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> record <span class="hljs-title">UserResult</span>(<span class="hljs-params">Guid Id, <span class="hljs-keyword">string</span> Email</span>)</span>;
</code></pre>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IUserQueries</span>
{
    Task&lt;UserResult?&gt; GetAsync(GetUserQuery query);
}
</code></pre>
<h3 id="heading-users-implements-it-internally">Users implements it internally</h3>
<pre><code class="lang-csharp"><span class="hljs-keyword">internal</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">UserQueries</span> : <span class="hljs-title">IUserQueries</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;UserResult?&gt; GetAsync(GetUserQuery query)
    {
        <span class="hljs-comment">// EF Core, Dapper, whatever</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
    }
}
</code></pre>
<h3 id="heading-billing-depends-only-on-the-contract">Billing depends only on the contract</h3>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">InvoiceService</span>
{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">readonly</span> IUserQueries _users;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">InvoiceService</span>(<span class="hljs-params">IUserQueries users</span>)</span>
    {
        _users = users;
    }
}
</code></pre>
<p>Billing does not know how Users works. It cannot reach into it even if someone wants to.</p>
<h2 id="heading-blocking-just-this-once-shortcuts">Blocking “Just This Once” Shortcuts</h2>
<p>The most dangerous phrase in software architecture is:</p>
<blockquote>
<p>“It’s just this once.”</p>
</blockquote>
<p>This is where enforcement comes in.</p>
<h3 id="heading-assembly-reference-rules">Assembly Reference Rules</h3>
<p>Assembly reference rules are where modular boundaries stop being theoretical and start being enforceable. If a module exposes a single contracts assembly, that is the only thing other modules are allowed to reference. In this case, Billing should reference Users.Application.Contracts and nothing else. That rule should be obvious and non-negotiable. The moment someone tries to add a reference to Users.Infrastructure or Users.Domain, something should break. Ideally, the build fails. At the very least, alarms should go off loudly enough that the violation cannot slip through unnoticed. If those references are possible and nothing complains, then the boundary exists only by convention, and conventions are fragile under real-world pressure.</p>
<p>You enforce these rules first through solution structure. Projects are laid out so that the intended references are obvious and the forbidden ones feel unnatural. On top of that, you add explicit project reference rules, making it mechanically impossible to depend on the wrong assemblies without deliberately bypassing the design.</p>
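<p>In practice, that means Billing’s project file contains exactly one reference into the Users module. The paths below are illustrative; the point is what is present, and what is deliberately absent:</p>

```xml
<!-- Billing.Application.csproj (illustrative paths) -->
<ItemGroup>
  <!-- The only sanctioned way in: -->
  <ProjectReference Include="..\..\Users\Users.Application.Contracts\Users.Application.Contracts.csproj" />
  <!-- No reference to Users.Domain. No reference to Users.Infrastructure. -->
</ItemGroup>
```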
<p>Then, you back it all up with architecture tests. These tests don’t care about business logic or behaviour. Their job is simply to assert that the dependency graph stays within the lines you have drawn. When someone crosses a boundary, the test fails, the build goes red, and the conversation happens immediately, not months later when the coupling has already spread.</p>
<h2 id="heading-architecture-tests-that-actually-help">Architecture Tests That Actually Help</h2>
<p>Here’s a simple example using NetArchTest.</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">Test</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">Billing_Should_Not_Depend_On_Users_Infrastructure</span>(<span class="hljs-params"></span>)</span>
{
    <span class="hljs-keyword">var</span> result = Types.InAssembly(<span class="hljs-keyword">typeof</span>(BillingRoot).Assembly)
        .ShouldNot()
        .HaveDependencyOn(<span class="hljs-string">"Users.Infrastructure"</span>)
        .GetResult();

    result.IsSuccessful.Should().BeTrue();
}
</code></pre>
<p>This test doesn’t care about business logic. It cares about <strong>structure</strong>. When it fails, it fails loudly, and early.</p>
<p>That’s the kind of test that saves you months later.</p>
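<p>If you cannot take a dependency on NetArchTest, a cruder version of the same check can be written with plain reflection. It only sees direct assembly references, so it is weaker than a proper architecture test, but it still turns the build red when someone adds the forbidden reference (as in the example above, <code>BillingRoot</code> is hypothetical; any type from the Billing assembly will do):</p>

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ArchitectureChecks
{
    // True if the assembly has a direct metadata reference to the named assembly.
    // Note: this only catches direct references, not transitive ones.
    public static bool DependsOn(Assembly assembly, string forbiddenAssemblyName) =>
        assembly.GetReferencedAssemblies()
            .Any(reference => string.Equals(reference.Name, forbiddenAssemblyName, StringComparison.Ordinal));
}
```

<p>A test then becomes a one-liner: assert that <code>DependsOn(typeof(BillingRoot).Assembly, "Users.Infrastructure")</code> is false.</p>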
<h2 id="heading-database-boundaries-are-non-negotiable">Database Boundaries Are Non-Negotiable</h2>
<p>If one module can read another module’s tables, you do not have a modular monolith. You have a shared database with delusions of grandeur.</p>
<p>Each module owns its schema.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769031640095/8f6dab8a-c0ef-4576-8ce8-a05033d8ec48.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-cost-of-boundaries-and-why-its-worth-paying">The Cost of Boundaries (And Why It’s Worth Paying)</h2>
<p>Let’s be honest. This approach has a cost.</p>
<p>You will:</p>
<ul>
<li><p>Write more interfaces</p>
</li>
<li><p>Pass more data explicitly</p>
</li>
<li><p>Feel friction early on</p>
</li>
</ul>
<p>But here’s what you get in return:</p>
<ul>
<li><p>Predictable change</p>
</li>
<li><p>Safer refactoring</p>
</li>
<li><p>Modules you can actually extract later</p>
</li>
</ul>
<p>I’ve seen systems where adding a feature meant touching 14 projects because everything was intertwined. I’ve also seen systems where a change stayed neatly inside one module. The difference was never intelligence. It was boundaries.</p>
<h2 id="heading-the-human-side-of-this">The Human Side of This</h2>
<p>I’ll end on something less technical.</p>
<p>I do a lot of my thinking late at night, after everyone else is asleep. Sometimes after reading a bedtime story to my daughter and coming back with a cup of tea, knowing I’ve got an hour to make sense of a problem before sleep wins.</p>
<p>In those moments, the last thing I want is a system that fights me. I want code that tells me when I’m about to do something stupid. I want boundaries that protect future-me, not just present-me.</p>
<p>That’s what enforcement gives you.</p>
<hr />
<h2 id="heading-whats-next-in-the-series">What’s Next in the Series</h2>
<p>Now that boundaries are enforced, the next question becomes:</p>
<blockquote>
<p>How do you design <strong>inside</strong> a module?</p>
</blockquote>
<p>That’s where vertical slices come in, and that’s where we’ll go next.</p>
]]></content:encoded></item><item><title><![CDATA[Why 'Unit of Work' Fails in a Distributed System]]></title><description><![CDATA[The Unit of Work (UOW) pattern feels almost invisible when it’s working well. You load a set of entities, make changes, and commit once. If something goes wrong, everything rolls back. For a single application talking to a single database, this model...]]></description><link>https://fullstackcity.com/why-unit-of-work-fails-in-a-distributed-system</link><guid isPermaLink="true">https://fullstackcity.com/why-unit-of-work-fails-in-a-distributed-system</guid><category><![CDATA[design patterns]]></category><category><![CDATA[Architecture Design]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[distributed system]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Mon, 19 Jan 2026 19:28:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768850774219/90087b5d-eb04-4bfb-81d4-d44009891a1c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Unit of Work (UOW) pattern feels almost invisible when it’s working well. You load a set of entities, make changes, and commit once. If something goes wrong, everything rolls back. For a single application talking to a single database, this model is intuitive, reliable, and hard to argue with.</p>
<p>The trouble starts when that same mental model is carried into a distributed system.</p>
<p>As systems grow, you split responsibilities into services, introduce messaging, and isolate data stores. At that point, the guarantees that made UOW safe begin to dissolve. Rather than abandoning the pattern, many people try to stretch it across service boundaries, coordinating commits or wrapping multiple operations in a higher-level “distributed” transaction.</p>
<p>This is where the abstraction breaks down.</p>
<p>Martin Fowler puts it bluntly when discussing microservices and transactions:</p>
<blockquote>
<p><strong>“You can’t maintain ACID transactions across microservices.”</strong></p>
</blockquote>
<p>That single sentence quietly invalidates the idea of a distributed UOW. UOW is an ACID-coordinating pattern. Its entire purpose is to provide atomic commit and rollback across a set of changes. If ACID cannot span services, then neither can UOW. This is not a tooling limitation or a framework gap. It is a consequence of running code across processes, machines, and networks that fail independently. Once a UOW crosses a distributed boundary, the assumptions it relies on no longer hold, even if the code still <em>looks</em> correct.</p>
<p>From that point on, the question is no longer <em>“How do we implement distributed Unit of Work safely?”</em><br />The real question becomes <em>“What patterns replace it when atomicity is no longer possible?”</em></p>
<h2 id="heading-the-illusion-of-distributed-atomicity">The Illusion of Distributed Atomicity</h2>
<p>Distributed UOW usually shows up disguised as coordination. One service acts as a conductor, asking each participant whether it is ready to commit. This is the foundation of two-phase commit, and it looks convincing on paper. In practice, it creates a system that is tightly coupled, slow under load, and fragile under failure. The coordinator can crash after some participants have committed and others have not. A participant can commit successfully but fail before acknowledging. A retry can arrive after a partial commit and cause duplicate work. None of these scenarios are edge cases. They are normal behaviour in a real distributed environment.</p>
<p>The deeper issue is that the network itself is unreliable. Messages can be delayed, reordered, duplicated, or lost. UOW assumes a level of synchrony and trust that the network cannot provide.</p>
<p>When services are forced to coordinate a single commit, they must all be alive and responsive at the same time. A slow or unhealthy service does not just fail its own work, it blocks everyone else. Locks are held longer. Throughput drops. Timeouts increase. Recovery becomes manual. In theory, the system is “more correct”. In reality, it is down more often.</p>
<p>This is why many distributed systems that start with coordinated transactions eventually remove them under production pressure. The cost shows up in incidents, not in code reviews.</p>
<h2 id="heading-the-shift-in-thinking">The Shift in Thinking</h2>
<p>The failure of distributed UOW forces a change in mindset. Instead of asking, “How do I make everything commit together?”, the question becomes, “How do I make each step reliable on its own?” This is the pivot from atomicity to durability. Modern distributed systems accept that work happens in stages. Each stage commits locally. Communication between stages is durable. Failures are expected, retried, and compensated for rather than magically rolled back. Once you accept this, the replacement patterns start to make sense.</p>
<p>The first rule is simple: transactions stop at the service boundary.</p>
<p>Inside a service, you still use UOW. Entity Framework’s <code>DbContext</code> remains a perfectly valid abstraction. You load data, apply changes, and commit once. That part does not change. What changes is the expectation that this transaction somehow covers the rest of the system. It doesn’t, and it never will.</p>
<p>Anything that crosses the boundary is treated as asynchronous and unreliable by default.</p>
<h2 id="heading-outbox-as-the-new-commit-boundary">Outbox as the New Commit Boundary</h2>
<p>The transactional outbox pattern exists precisely to bridge the gap between local atomicity and distributed communication.</p>
<p>Instead of updating the database and publishing a message as two separate actions, both are recorded inside the same local transaction. The database change represents what happened. The outbox record represents what needs to be communicated. Once the transaction commits, the system is in a consistent state. Even if the process crashes immediately afterwards, the intent to publish the message is safely stored.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/7Js-4GuNogM">https://youtu.be/7Js-4GuNogM</a></div>
<p> </p>
<p>Later, a background process reads the outbox and delivers messages until they succeed.</p>
<p>This approach does not pretend that messaging is atomic with database writes. It acknowledges that messaging is eventually reliable and builds around that reality.</p>
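<p>To make the mechanics concrete, here is a deliberately minimal, in-memory sketch of the pattern. The names (<code>Store</code>, <code>commit_order_with_outbox</code>, <code>dispatch</code>) are invented for illustration, and a plain struct stands in for the database, so the single local transaction is modelled as one method call:</p>

```rust
use std::collections::VecDeque;

// A stand-in "database" whose commit applies the business change and the
// outbox record together, or neither -- mirroring one local transaction.
#[derive(Default)]
struct Store {
    orders: Vec<String>,      // committed business state
    outbox: VecDeque<String>, // committed-but-unsent messages
}

impl Store {
    // Both writes happen inside the same commit; a crash before this call
    // loses both, a crash after it loses neither.
    fn commit_order_with_outbox(&mut self, order: &str, event: &str) {
        self.orders.push(order.to_string());
        self.outbox.push_back(event.to_string());
    }
}

// The dispatcher runs separately: it drains the outbox and retries until
// the broker accepts each message. Returns how many messages were sent.
fn dispatch(store: &mut Store, publish: impl Fn(&str) -> bool) -> usize {
    let mut sent = 0;
    while let Some(msg) = store.outbox.front().cloned() {
        if publish(&msg) {
            store.outbox.pop_front(); // remove only after a successful send
            sent += 1;
        } else {
            break; // leave the message in place; retry on the next pass
        }
    }
    sent
}

fn main() {
    let mut store = Store::default();
    store.commit_order_with_outbox("order-42", "OrderPlaced:42");
    let sent = dispatch(&mut store, |_msg| true); // pretend the broker accepts
    println!("sent {}, outbox empty: {}", sent, store.outbox.is_empty());
}
```

<p>The important property is the ordering: the outbox record is only removed after the broker accepts the message, so a crash between commit and dispatch loses nothing, and the worst case is a duplicate delivery.</p>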
<p>Once you introduce retries, you introduce duplicates. In a distributed system, messages can be delivered more than once. Any design that assumes otherwise will eventually corrupt data. The replacement for “exactly once” delivery is idempotent handling. Every message is treated as something that may already have been processed. The handler checks, records, and moves on. This makes retries safe. It allows consumers to crash and restart. It allows operators to replay messages from a dead-letter queue without fear. Most importantly, it removes the psychological need for a distributed UOW. You no longer rely on perfect coordination to avoid duplicates, because duplicates are harmless by design.</p>
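<p>A minimal sketch of that check-record-move-on loop, assuming each message carries a numeric id (all names here are invented for illustration; a real handler would persist the processed ids alongside its data):</p>

```rust
use std::collections::HashSet;

// Minimal idempotent consumer: remember processed message ids so that
// redelivery of the same message becomes a harmless no-op.
#[derive(Default)]
struct Handler {
    processed: HashSet<u64>,
    applied: u32, // how many messages actually changed state
}

impl Handler {
    // Returns true only the first time a given message id is seen.
    fn handle(&mut self, message_id: u64) -> bool {
        if !self.processed.insert(message_id) {
            return false; // duplicate delivery: safely ignored
        }
        self.applied += 1; // the real side effect would happen here
        true
    }
}

fn main() {
    let mut h = Handler::default();
    for id in [1u64, 2, 1, 3, 2] { // duplicates caused by retries
        h.handle(id);
    }
    println!("applied {} of 5 deliveries", h.applied); // 3 unique messages
}
```
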
<h2 id="heading-sagas-replace-distributed-rollback">Sagas Replace Distributed Rollback</h2>
<p>UOW relies on rollback as its safety net. If something fails, everything is undone. In distributed systems, rollback is replaced by compensation. A saga is a sequence of local transactions, each with a defined compensating action. Instead of pretending the work never happened, the system acknowledges that it did happen and applies an explicit reversal. Charging a card can be compensated with a refund. Reserving inventory can be compensated by releasing it. Creating a shipment can be compensated by cancelling it.</p>
<p>This is more work than rollback, but it is also more honest. The system reflects reality rather than hiding it behind abstractions.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/8GORdhReW8E">https://youtu.be/8GORdhReW8E</a></div>
<p> </p>
<p>In simple cases, services can react to events without central coordination. In more complex workflows, orchestration becomes valuable. A process manager or workflow engine tracks which step has completed, which is pending, and which compensations are required. State transitions are persisted. Timeouts are handled deliberately. Retries are visible. This structure replaces the false simplicity of distributed UOW with explicit control. You can see where a process is stuck. You can reason about partial completion. You can resume or compensate without guesswork.</p>
<p>That visibility is impossible when everything is hidden behind a single commit call.</p>
<h2 id="heading-why-the-pattern-is-so-tempting">Why the Pattern Is So Tempting</h2>
<p>Distributed UOW is tempting because it feels familiar. It allows developers to pretend they are still working in a monolith, just stretched across the network. But the complexity does not disappear. It accumulates in places that are harder to debug: locks, timeouts, partial commits, and recovery scripts run at three in the morning!</p>
<p>The patterns that replace it feel more complex at first because they surface reality instead of hiding it. Over time, they reduce surprises, incidents, and data corruption.</p>
<p>None of this means UOW is obsolete.</p>
<p>It remains a great pattern inside a service boundary. It remains the right abstraction for coordinating changes within a single database. The mistake is letting it leak beyond that boundary.</p>
<p>Once the boundary is crossed, the rules change.</p>
<p>The failure of UOW in distributed systems is not a tooling issue or a framework limitation. It is a mismatch between assumptions and reality. Distributed systems do not offer global atomicity. They offer unreliable communication, partial failure, and eventual delivery. Designs that embrace those properties succeed. Designs that fight them collapse under load.</p>
<p>If you stop trying to make a distributed system behave like a local one, the architecture becomes clearer, the failure modes become manageable, and the system becomes something you can actually operate in production.</p>
<p>You also don’t get as many call-outs at 3:00 am when all the overnight processes collapse!</p>
]]></content:encoded></item><item><title><![CDATA[Learning Rust as a C# Engineer]]></title><description><![CDATA[Over the Christmas break I kept seeing the same story surface again and again. Microsoft has started replacing large parts of its C and C++ code with Rust, with a stated long-term goal of removing C and C++ from critical underlying libraries by aroun...]]></description><link>https://fullstackcity.com/learning-rust-as-a-c-engineer</link><guid isPermaLink="true">https://fullstackcity.com/learning-rust-as-a-c-engineer</guid><category><![CDATA[Rust]]></category><category><![CDATA[C#]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sun, 04 Jan 2026 20:56:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767560101357/4a6e0bce-81d8-42f3-ae96-65933e75ec04.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the Christmas break I kept seeing the same story surface again and again. Microsoft has started replacing large parts of its C and C++ code with Rust, with a stated long-term goal of removing C and C++ from critical underlying libraries by around 2030. That caught my attention. Not because Rust is fashionable, but because Microsoft rarely makes moves like that without a very clear cost-benefit analysis behind them.</p>
<p>As a C# engineer, that immediately raised a practical question. If Rust is becoming the new systems language inside Microsoft, and if more of the runtime, networking stack, and platform tooling will eventually depend on it, then learning Rust now feels like a safe bet rather than a speculative one.</p>
<p>The question is how to approach it.</p>
<p>There is no shortage of Rust tutorials, books, and courses. Many of them are excellent. Most of them also assume you are either new to programming or moving from another low-level language. If you already spend your day working in high-level C#, a lot of that material feels like friction. You end up re-learning concepts you already understand, just with different syntax.</p>
<p>I saw the same advice from experienced developers who had already crossed this bridge. If you already have strong experience at a high level, the fastest way to make Rust click is not to start with toy examples. It’s to build something low-level immediately, where Rust’s constraints actually matter.</p>
<p>I decided to build a project.</p>
<h2 id="heading-the-idea">The idea</h2>
<p>The goal was simple on the surface.</p>
<p>Build a small networking system where a sender sends messages, a receiver processes them, and a separate process visualises what is happening internally. Nothing clever. No abstractions for the sake of it.</p>
<p>Under the surface, the goal was more specific.</p>
<p>I wanted to force myself to deal with explicit framing and byte handling, async IO in a real socket loop, and a clearer separation between transport, parsing, and application logic. Ownership and borrowing showed up too, mainly around buffers and task boundaries, though I haven’t pushed into the deeper lifetime-heavy patterns yet.</p>
<p>At the same time, I wanted to keep one foot in familiar territory. I didn’t want to abandon C# entirely while learning Rust. So I split the system across two machines.</p>
<p>I had access to a Mac and a PC on the same network so I used that to my advantage.</p>
<p>The Rust receiver runs on the Mac.<br />The C# sender and telemetry visualiser run on the PC.</p>
<p>That split turned out to be more useful than I expected.</p>
<h2 id="heading-high-level-architecture">High-level architecture</h2>
<p>The system consists of three very small programs.</p>
<p>A C# sender<br />A Rust receiver<br />A C# telemetry visualiser</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767558553908/412d63a0-9ce4-4215-b30f-d7976393efa0.png" alt class="image--center mx-auto" /></p>
<p>The sender connects directly to the receiver over TCP and sends framed JSON messages. Each message has a length prefix so the receiver can read full frames cleanly.</p>
<p>The receiver reads the raw bytes, parses the JSON, and emits telemetry events as the message conceptually moves through layers. These layers are not pretending to be the real OS network stack. They are a teaching tool. They make invisible steps visible.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767558966157/e4f3df66-856e-4b04-8c55-9904bc3ad0e9.png" alt class="image--center mx-auto" /></p>
<p>The receiver sends telemetry over UDP to the visualiser. UDP keeps the telemetry path simple and non-blocking.</p>
<p>The visualiser listens on a UDP port and renders a live terminal UI showing messages moving through application, transport, network, and link lanes.</p>
<p>There is no router process. No segmentation. No configurable packet sizes. I deliberately removed all of that. The goal here is learning Rust, not building a networking framework.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767559643986/37a9c8b1-baf1-423f-920e-f4f6488adcec.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-rust-on-the-receiver-side">Why Rust on the receiver side</h2>
<p>Putting the receiver in Rust was a deliberate choice.</p>
<p>In C#, reading from a network stream is trivial. You can build something that works in minutes, but it hides a lot of detail. Buffers are managed for you. Memory ownership is implicit. Errors often surface late.</p>
<p>In Rust, you cannot avoid those details.</p>
<p>Reading a length prefix forces you to think about endianness.<br />Reading into a buffer forces you to manage capacity and resizing.<br />Passing data between functions forces you to think about ownership.</p>
<p>None of this feels academic when you are dealing with a real socket.</p>
<pre><code class="lang-rust"><span class="hljs-keyword">use</span> anyhow::Context;
<span class="hljs-keyword">use</span> bytes::BytesMut;
<span class="hljs-keyword">use</span> chrono::Utc;
<span class="hljs-keyword">use</span> serde::{Deserialize, Serialize};
<span class="hljs-keyword">use</span> std::{env, net::SocketAddr, sync::Arc};
<span class="hljs-keyword">use</span> tokio::{
    io::AsyncReadExt,
    net::{TcpListener, TcpStream, UdpSocket},
    time::{sleep, Duration},
};

<span class="hljs-meta">#[derive(Debug, Deserialize)]</span>
<span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">Packet</span></span> {
    packet_id: <span class="hljs-built_in">u64</span>,
    created_at_ms: <span class="hljs-built_in">u64</span>,
    payload: <span class="hljs-built_in">String</span>,
}

<span class="hljs-meta">#[derive(Debug, Serialize)]</span>
<span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">TelemetryEvent</span></span>&lt;<span class="hljs-symbol">'a</span>&gt; {
    ts_ms: <span class="hljs-built_in">i64</span>,
    source: &amp;<span class="hljs-symbol">'a</span> <span class="hljs-built_in">str</span>, <span class="hljs-comment">// "receiver"</span>
    packet_id: <span class="hljs-built_in">u64</span>,
    layer: &amp;<span class="hljs-symbol">'a</span> <span class="hljs-built_in">str</span>, <span class="hljs-comment">// "link" | "network" | "transport" | "application"</span>
    kind: &amp;<span class="hljs-symbol">'a</span> <span class="hljs-built_in">str</span>,  <span class="hljs-comment">// "enter" | "deliver" | "error"</span>
    note: &amp;<span class="hljs-symbol">'a</span> <span class="hljs-built_in">str</span>,
    size: <span class="hljs-built_in">usize</span>,
}

<span class="hljs-meta">#[tokio::main]</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() -&gt; anyhow::<span class="hljs-built_in">Result</span>&lt;()&gt; {
    <span class="hljs-keyword">let</span> data_addr = env::var(<span class="hljs-string">"DATA_BIND"</span>).unwrap_or_else(|_| <span class="hljs-string">"0.0.0.0:9000"</span>.to_string());
    <span class="hljs-keyword">let</span> telemetry_target =
        env::var(<span class="hljs-string">"TELEMETRY_TARGET"</span>).unwrap_or_else(|_| <span class="hljs-string">"127.0.0.1:9100"</span>.to_string());

    <span class="hljs-built_in">println!</span>(<span class="hljs-string">"Receiver listening on {}"</span>, data_addr);
    <span class="hljs-built_in">println!</span>(<span class="hljs-string">"Telemetry target: {}"</span>, telemetry_target);

    <span class="hljs-keyword">let</span> listener = TcpListener::bind(&amp;data_addr).<span class="hljs-keyword">await</span>?;

    <span class="hljs-keyword">let</span> udp = Arc::new(UdpSocket::bind(<span class="hljs-string">"0.0.0.0:0"</span>).<span class="hljs-keyword">await</span>?);
    <span class="hljs-keyword">let</span> telemetry_addr: SocketAddr = telemetry_target
        .parse()
        .context(<span class="hljs-string">"TELEMETRY_TARGET must be like 192.168.0.10:9100"</span>)?;

    <span class="hljs-keyword">loop</span> {
        <span class="hljs-keyword">let</span> (socket, peer) = listener.accept().<span class="hljs-keyword">await</span>?;
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"Accepted connection from {}"</span>, peer);

        <span class="hljs-keyword">let</span> udp = udp.clone();
        tokio::spawn(<span class="hljs-keyword">async</span> <span class="hljs-keyword">move</span> {
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-literal">Err</span>(e) = handle_connection(socket, peer, udp, telemetry_addr).<span class="hljs-keyword">await</span> {
                eprintln!(<span class="hljs-string">"Connection error ({}): {:?}"</span>, peer, e);
            }
        });
    }
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">handle_connection</span></span>(
    <span class="hljs-keyword">mut</span> socket: TcpStream,
    peer: SocketAddr,
    udp: Arc&lt;UdpSocket&gt;,
    telemetry_addr: SocketAddr,
) -&gt; anyhow::<span class="hljs-built_in">Result</span>&lt;()&gt; {
    <span class="hljs-keyword">loop</span> {
        <span class="hljs-comment">// Read u32 big-endian length prefix</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> len_buf = [<span class="hljs-number">0u8</span>; <span class="hljs-number">4</span>];
        <span class="hljs-keyword">if</span> socket.read_exact(&amp;<span class="hljs-keyword">mut</span> len_buf).<span class="hljs-keyword">await</span>.is_err() {
            <span class="hljs-built_in">println!</span>(<span class="hljs-string">"Connection closed by {}"</span>, peer);
            <span class="hljs-keyword">return</span> <span class="hljs-literal">Ok</span>(());
        }

        <span class="hljs-keyword">let</span> frame_len = <span class="hljs-built_in">u32</span>::from_be_bytes(len_buf) <span class="hljs-keyword">as</span> <span class="hljs-built_in">usize</span>;

        <span class="hljs-comment">// Read frame bytes</span>
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> buf = BytesMut::with_capacity(frame_len);
        buf.resize(frame_len, <span class="hljs-number">0</span>);
        socket.read_exact(&amp;<span class="hljs-keyword">mut</span> buf).<span class="hljs-keyword">await</span>?;

        <span class="hljs-comment">// Parse JSON</span>
        <span class="hljs-keyword">let</span> packet: Packet = <span class="hljs-keyword">match</span> serde_json::from_slice(&amp;buf) {
            <span class="hljs-literal">Ok</span>(p) =&gt; p,
            <span class="hljs-literal">Err</span>(_) =&gt; {
                <span class="hljs-keyword">let</span> _ = send_event(
                    &amp;udp,
                    telemetry_addr,
                    TelemetryEvent {
                        ts_ms: Utc::now().timestamp_millis(),
                        source: <span class="hljs-string">"receiver"</span>,
                        packet_id: <span class="hljs-number">0</span>,
                        layer: <span class="hljs-string">"transport"</span>,
                        kind: <span class="hljs-string">"error"</span>,
                        note: <span class="hljs-string">"json_parse_failed"</span>,
                        size: frame_len,
                    },
                )
                    .<span class="hljs-keyword">await</span>;
                <span class="hljs-keyword">continue</span>;
            }
        };

        <span class="hljs-comment">// Emit layer events (simple, deterministic)</span>
        send_event(
            &amp;udp,
            telemetry_addr,
            TelemetryEvent {
                ts_ms: Utc::now().timestamp_millis(),
                source: <span class="hljs-string">"receiver"</span>,
                packet_id: packet.packet_id,
                layer: <span class="hljs-string">"link"</span>,
                kind: <span class="hljs-string">"enter"</span>,
                note: <span class="hljs-string">"frame_received"</span>,
                size: frame_len,
            },
        )
            .<span class="hljs-keyword">await</span>?;
        sleep(Duration::from_millis(layer_delay_ms())).<span class="hljs-keyword">await</span>;

        send_event(
            &amp;udp,
            telemetry_addr,
            TelemetryEvent {
                ts_ms: Utc::now().timestamp_millis(),
                source: <span class="hljs-string">"receiver"</span>,
                packet_id: packet.packet_id,
                layer: <span class="hljs-string">"network"</span>,
                kind: <span class="hljs-string">"enter"</span>,
                note: <span class="hljs-string">"packet_parsed"</span>,
                size: frame_len,
            },
        )
            .<span class="hljs-keyword">await</span>?;
        sleep(Duration::from_millis(layer_delay_ms())).<span class="hljs-keyword">await</span>;

        send_event(
            &amp;udp,
            telemetry_addr,
            TelemetryEvent {
                ts_ms: Utc::now().timestamp_millis(),
                source: <span class="hljs-string">"receiver"</span>,
                packet_id: packet.packet_id,
                layer: <span class="hljs-string">"transport"</span>,
                kind: <span class="hljs-string">"deliver"</span>,
                note: <span class="hljs-string">"delivered_to_app"</span>,
                size: frame_len,
            },
        )
            .<span class="hljs-keyword">await</span>?;
        sleep(Duration::from_millis(layer_delay_ms())).<span class="hljs-keyword">await</span>;

        send_event(
            &amp;udp,
            telemetry_addr,
            TelemetryEvent {
                ts_ms: Utc::now().timestamp_millis(),
                source: <span class="hljs-string">"receiver"</span>,
                packet_id: packet.packet_id,
                layer: <span class="hljs-string">"application"</span>,
                kind: <span class="hljs-string">"deliver"</span>,
                note: <span class="hljs-string">"payload_ready"</span>,
                size: frame_len,
            },
        )
            .<span class="hljs-keyword">await</span>?;
        sleep(Duration::from_millis(layer_delay_ms())).<span class="hljs-keyword">await</span>;

        <span class="hljs-comment">// Console breakdown</span>
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"---"</span>);
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"From:            {}"</span>, peer);
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"Frame length:    {} bytes"</span>, frame_len);
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"packet_id:       {}"</span>, packet.packet_id);
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"created_at_ms:   {}"</span>, packet.created_at_ms);
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"payload:         {}"</span>, packet.payload);

        <span class="hljs-comment">// Raw preview (first 96 bytes)</span>
        <span class="hljs-keyword">let</span> preview_len = buf.len().min(<span class="hljs-number">96</span>);
        <span class="hljs-keyword">let</span> preview = &amp;buf[..preview_len];
        <span class="hljs-built_in">println!</span>(<span class="hljs-string">"raw preview ({}): {:02X?}"</span>, preview_len, preview);
    }
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">send_event</span></span>(udp: &amp;UdpSocket, target: SocketAddr, ev: TelemetryEvent&lt;<span class="hljs-symbol">'_</span>&gt;) -&gt; anyhow::<span class="hljs-built_in">Result</span>&lt;()&gt; {
    <span class="hljs-keyword">let</span> bytes = serde_json::to_vec(&amp;ev)?;
    udp.send_to(&amp;bytes, target).<span class="hljs-keyword">await</span>?;
    <span class="hljs-literal">Ok</span>(())
}

<span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">layer_delay_ms</span></span>() -&gt; <span class="hljs-built_in">u64</span> {
    env::var(<span class="hljs-string">"LAYER_DELAY_MS"</span>)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(<span class="hljs-number">120</span>)
}
</code></pre>
<p>The receiver reads four bytes for the frame length, then reads exactly that many bytes into a buffer. It then attempts to deserialise JSON from that buffer. If deserialisation fails, it emits an error event and moves on.</p>
<p>Once the packet is parsed, the receiver emits telemetry events as it conceptually moves through layers. Each event includes a timestamp, packet id, layer name, event kind, and size.</p>
<p>Finally, the receiver prints a detailed breakdown to the console. Peer address. Frame size. Packet id. Timestamp. Payload contents. A raw hex preview of the first bytes. Nothing hidden.</p>
<p>This is where Rust started to click.</p>
<p>The compiler forces you to be explicit. Once the code builds, you know the data flow is correct. There is very little ambiguity.</p>
<h2 id="heading-why-keep-the-sender-and-visualiser-in-c">Why keep the sender and visualiser in C</h2>
<p>I kept the sender in C# for two reasons.</p>
<p>First, speed. I didn’t want to spend time building input handling and framing logic from scratch while still learning Rust. In C#, that part is muscle memory.</p>
<p>Second, comparison. Writing the sender in C# makes the differences obvious. You can see how much the runtime gives you for free, and what Rust expects you to manage yourself.</p>
<p>The telemetry visualiser also stays in C#. <code>Spectre.Console</code> makes it easy to build a clean terminal UI quickly. More importantly, it keeps the visualisation code separate from the Rust learning exercise.</p>
<p>Rust does one job. The visualiser does another.</p>
<p>That separation matters.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767561616025/7398a027-f312-45e5-bd92-d16ee8ba517f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-running-across-two-machines">Running across two machines</h2>
<p>Running the Rust receiver on a Mac and the C# processes on a PC adds one more layer of realism.</p>
<p>This is not a loopback demo. Real sockets are involved. Real IP addresses. Real failure modes. You immediately run into practical issues. Address binding. Firewalls. Ports already in use. Processes exiting unexpectedly. All of that is part of systems work, and Rust does not shield you from it. It also reinforces a useful mental model. Rust is not replacing C# for application development. It is complementing it. The two languages sit at different levels of the stack, and they interact over simple, explicit protocols. That feels very close to how modern platforms are actually built.</p>
<h2 id="heading-what-this-project-taught-me">What this project taught me</h2>
<p>The biggest takeaway is that Rust makes more sense when you stop trying to learn it in isolation.</p>
<p>If you come from C#, you already understand concurrency, async workflows, and data modelling. Rust is not trying to re-teach those concepts. It is forcing you to be explicit about things the runtime normally hides. Ownership is not abstract when you are passing buffers around. Lifetimes are not theoretical when a socket read depends on them. Error handling feels stricter, but also more honest. Building a real, low-level project immediately surfaces the value Rust brings. You stop fighting the language and start seeing why it exists.</p>
<h2 id="heading-where-next">Where next?</h2>
<p>I’m deliberately stopping this project here. It works. It’s understandable. It does exactly what I need it to do. If I extend it later, it will be with a clear purpose. Maybe adding a router process. Maybe experimenting with reordering or loss. Maybe rebuilding the receiver in a different async model.</p>
<p>For now, this was enough. I got what I wanted from it.</p>
<p>It sparked my interest enough to start thinking about a bigger, more involved project to take on next time.</p>
<p>If you are a C# engineer thinking about learning Rust, my advice is simple. Do not start with the beginner examples. Pick a problem where memory and data flow matter. Build something small but real, get your hands dirty, and have fun.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Camunda-Like Workflow Tracker Without BPMN in .NET]]></title><description><![CDATA[Camunda is a strong, mature platform, and solves hard problems very well. But it’s also designed to cover a huge surface area: visual modelling, execution, orchestration, human tasks, retries, compensation, and governance. If you have a full developme...]]></description><link>https://fullstackcity.com/building-a-camunda-like-workflow-tracker-without-bpmn-in-net</link><guid isPermaLink="true">https://fullstackcity.com/building-a-camunda-like-workflow-tracker-without-bpmn-in-net</guid><category><![CDATA[.NET]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[camunda]]></category><category><![CDATA[workflow-orchestration]]></category><category><![CDATA[Orchestration]]></category><dc:creator><![CDATA[Patrick Kearns]]></dc:creator><pubDate>Sat, 27 Dec 2025 13:07:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766840269608/a41cd088-5c7c-4e73-ab9d-e155711a68b1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Camunda is a strong, mature platform, and solves hard problems very well. But it’s also designed to cover a huge surface area: visual modelling, execution, orchestration, human tasks, retries, compensation, and governance. If you have a full development team that already owns the business logic, the data model, and the deployment pipeline, you might not need all that capability. What you might just need is visibility, auditability, and the ability to answer questions about how work actually flows through the system. In those cases, a lightweight workflow tracker gives you most of the value at a fraction of the complexity, and lets engineers stay in code rather than in diagrams.</p>
<p>Most workflow engines start from the wrong place.</p>
<p>They begin with diagrams, XML, modelling tools, and an assumption that your business logic wants to be expressed as a flowchart. In practice, most production systems already have workflows. They just live in code, databases, queues, APIs, and human decisions. The missing piece isn't orchestration logic. It's visibility.</p>
<p>All you might want is a reliable way to answer simple questions. What step is this case on? How long did it spend there? Who touched it? What changed? Why did it fall off the happy path? Those questions are about tracking, not modelling.</p>
<p>Below, we’ll think about how we could build a stripped-down workflow tracker in .NET. No BPMN. No visual modeller. No engine deciding what happens next. The application decides that. Our system records what happened, when it happened, and why. Then it makes that data easy to query.</p>
<p>The result looks a lot like the most valuable parts of Camunda, but without the weight.</p>
<h3 id="heading-the-core-idea">The core idea</h3>
<p>A workflow is not a graph. It is a sequence of events over time.</p>
<p>Every time a piece of work moves forward, something observable happens. A step starts. A step completes. A decision is made. Data changes. Someone intervenes. If we capture those events in a consistent way, we can reconstruct the workflow after the fact with surprising power.</p>
<p>The tracker does not control execution. It listens.</p>
<p>This single constraint simplifies everything. There is no retry logic here. No compensation. No tokens moving through gateways. Your application owns those concerns already. The tracker’s only responsibility is to record transitions and state changes with strong guarantees.</p>
<h3 id="heading-defining-a-workflow-instance">Defining a workflow instance</h3>
<p>We start by defining what a workflow instance means in our system.</p>
<p>An instance represents one unit of work moving through a process. That might be a loan application, an insurance submission, a customer onboarding case, or a background job. The tracker does not care.</p>
<p>An instance has a stable identifier, a workflow type, and some high-level metadata. It is created once and never deleted.</p>
<p>In C#, that looks like a simple aggregate.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">WorkflowInstance</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> Id { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> WorkflowType { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> ExternalReference { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> DateTimeOffset CreatedAt { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
}
</code></pre>
<p>The <code>ExternalReference</code> is critical. This is how the rest of your system relates back to the workflow. It might be a ProgramId, SubmissionId, OrderId, or something similar.</p>
<h3 id="heading-modelling-steps-as-facts-not-definitions">Modelling steps as facts, not definitions</h3>
<p>Traditional engines define steps up front. We don't.</p>
<p>Instead, every time the application reaches a meaningful point, it emits a step event. A step is identified by a string key that has meaning to the domain.</p>
<p>Examples might be <code>submission_received</code>, <code>assessed</code>, or <code>review_started</code>.</p>
<p>The tracker does not validate these names. It records them.</p>
<p>A step event captures when a step started, when it completed, who or what performed it, and whether it ended successfully.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">WorkflowStepEvent</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> Id { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> WorkflowInstanceId { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> StepKey { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> StepStatus Status { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> DateTimeOffset Timestamp { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span>? Actor { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
}
</code></pre>
<p>The status might be Started, Completed, Failed, or Cancelled. That is enough to reconstruct timelines and durations later.</p>
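<p>The snippets above use a <code>StepStatus</code> type without showing its definition. As a sketch, assuming nothing beyond the four statuses named here, it could be as simple as:</p>

```csharp
// A minimal StepStatus matching the four statuses named in the text.
// This is an illustrative definition; the article does not show its own.
public enum StepStatus
{
    Started,
    Completed,
    Failed,
    Cancelled
}
```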
<p>This model deliberately allows repeated steps. If a case goes back to manual review three times, you will see three distinct events. That turns out to be extremely valuable for analytics.</p>
<h3 id="heading-capturing-decisions-and-data-changes">Capturing decisions and data changes</h3>
<p>Steps alone are not enough. Many workflows branch based on data. We want to record those decisions without baking logic into the tracker.</p>
<p>Whenever the application makes a decision that affects flow, it emits a decision event.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">WorkflowDecisionEvent</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> Id { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> WorkflowInstanceId { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> DecisionKey { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Outcome { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> DateTimeOffset Timestamp { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
}
</code></pre>
<p>This might represent something like <code>risk_outcome = refer</code> or <code>eligibility = declined</code>. The tracker does not care how the decision was made. It just records the fact that it happened.</p>
<p>Optionally, we can also capture variable snapshots. These are key-value pairs recorded at points of interest.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">WorkflowVariableSnapshot</span>
{
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">long</span> WorkflowInstanceId { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Name { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> <span class="hljs-keyword">string</span> Value { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> DateTimeOffset RecordedAt { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
}
</code></pre>
<p>This is not a full variable store. It is an audit trail. You record what mattered, when it mattered.</p>
<h3 id="heading-writing-events-safely">Writing events safely</h3>
<p>The most important technical requirement is durability. If your application says a step completed, the tracker must not lose that fact.</p>
<p>The simplest approach is an append-only relational schema. Inserts only. No updates except for correcting mistakes explicitly.</p>
<p>Every call into the tracker is a single database transaction that inserts one or more events. There is no orchestration state to lock or mutate. In practice, this scales extremely well.</p>
<p>You can expose the tracker via a small internal API.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title">IWorkflowTracker</span>
{
    <span class="hljs-function">Task <span class="hljs-title">RecordStepAsync</span>(<span class="hljs-params">
        <span class="hljs-keyword">long</span> instanceId,
        <span class="hljs-keyword">string</span> stepKey,
        StepStatus status,
        <span class="hljs-keyword">string</span>? actor,
        CancellationToken stopToken</span>)</span>;

    <span class="hljs-function">Task <span class="hljs-title">RecordDecisionAsync</span>(<span class="hljs-params">
        <span class="hljs-keyword">long</span> instanceId,
        <span class="hljs-keyword">string</span> decisionKey,
        <span class="hljs-keyword">string</span> outcome,
        CancellationToken stopToken</span>)</span>;
}
</code></pre>
<p>The application calls this at natural boundaries. After a handler completes. When a background job finishes. When a user clicks approve.</p>
<p>This keeps the integration friction low.</p>
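<p>As a sketch of what the write path could look like, here is a single-insert <code>RecordStepAsync</code> written against the provider-agnostic ADO.NET base classes. The table and column names (<code>workflow_step_event</code> and friends) are illustrative assumptions, not from the article, and <code>RecordDecisionAsync</code> would follow the same pattern against its own table.</p>

```csharp
using System.Data.Common;

// Repeated compactly from the article's model so this sketch stands alone.
public enum StepStatus { Started, Completed, Failed, Cancelled }

// Sketch: one transaction, one insert, no orchestration state to lock or
// mutate. The schema (workflow_step_event and its columns) is hypothetical.
public sealed class DbWorkflowTracker
{
    private readonly Func<DbConnection> _connectionFactory;

    public DbWorkflowTracker(Func<DbConnection> connectionFactory) =>
        _connectionFactory = connectionFactory;

    public async Task RecordStepAsync(
        long instanceId,
        string stepKey,
        StepStatus status,
        string? actor,
        CancellationToken stopToken)
    {
        await using var conn = _connectionFactory();
        await conn.OpenAsync(stopToken);
        await using var tx = await conn.BeginTransactionAsync(stopToken);

        await using var cmd = conn.CreateCommand();
        cmd.Transaction = tx;
        cmd.CommandText =
            "INSERT INTO workflow_step_event (instance_id, step_key, status, ts, actor) " +
            "VALUES (@instanceId, @stepKey, @status, @ts, @actor)";
        AddParam(cmd, "@instanceId", instanceId);
        AddParam(cmd, "@stepKey", stepKey);
        AddParam(cmd, "@status", status.ToString());
        AddParam(cmd, "@ts", DateTimeOffset.UtcNow);
        AddParam(cmd, "@actor", (object?)actor ?? DBNull.Value);

        await cmd.ExecuteNonQueryAsync(stopToken);
        await tx.CommitAsync(stopToken);
    }

    private static void AddParam(DbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

<p>Because the sketch depends only on <code>DbConnection</code>, the same code works with SQL Server, PostgreSQL, or SQLite providers; only the connection factory changes.</p>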
<h3 id="heading-reconstructing-the-workflow-timeline">Reconstructing the workflow timeline</h3>
<p>Once events are stored, the interesting part begins.</p>
<p>To reconstruct the current state of a workflow, you query all events for an instance ordered by timestamp. From that stream, you derive projections.</p>
<p>You can compute the current step by finding the latest Started event without a corresponding Completed or Failed event. You can calculate durations by pairing Started and Completed timestamps. You can detect loops by counting repeated step keys.</p>
<p>This logic lives in read models, not the write path.</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">public</span> <span class="hljs-keyword">sealed</span> <span class="hljs-keyword">class</span> <span class="hljs-title">WorkflowTimeline</span>
{
    <span class="hljs-keyword">public</span> IReadOnlyList&lt;StepExecution&gt; Steps { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> IReadOnlyList&lt;DecisionRecord&gt; Decisions { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
    <span class="hljs-keyword">public</span> TimeSpan TotalElapsed { <span class="hljs-keyword">get</span>; <span class="hljs-keyword">init</span>; }
}
</code></pre>
<p>Because everything is immutable, rebuilding projections is deterministic and safe. You can even change projection logic later and re-run it over historical data.</p>
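<p>A minimal projection over the event stream might look like the following. It assumes each Started event is closed by the next terminal event with the same step key; the helper name and compact event record are illustrative, not from the article.</p>

```csharp
using System.Linq;

// Repeated compactly from the article's model so this sketch stands alone.
public enum StepStatus { Started, Completed, Failed, Cancelled }
public sealed record WorkflowStepEvent(
    long WorkflowInstanceId, string StepKey, StepStatus Status, DateTimeOffset Timestamp);

// Sketch: derive the current step and per-execution durations from an
// instance's event stream, pairing Started with the next terminal event
// (Completed/Failed/Cancelled) for the same step key.
public static class TimelineProjector
{
    public static (string? CurrentStep, List<(string Step, TimeSpan Duration)> Executions)
        Project(IEnumerable<WorkflowStepEvent> events)
    {
        var openSince = new Dictionary<string, DateTimeOffset>(); // stepKey -> started at
        var executions = new List<(string, TimeSpan)>();
        string? current = null;

        foreach (var e in events.OrderBy(e => e.Timestamp))
        {
            if (e.Status == StepStatus.Started)
            {
                openSince[e.StepKey] = e.Timestamp;
                current = e.StepKey;
            }
            else if (openSince.Remove(e.StepKey, out var startedAt))
            {
                executions.Add((e.StepKey, e.Timestamp - startedAt));
                if (current == e.StepKey)
                    current = openSince.Count > 0 ? openSince.Keys.Last() : null;
            }
        }
        return (current, executions);
    }
}
```

<p>Repeated step keys simply produce repeated executions, so loops fall out of the data for free.</p>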
<h3 id="heading-analytics">Analytics</h3>
<p>This is where the tracker earns its keep. You can now answer questions like:</p>
<ul>
<li><p>How long does each step take on average?</p>
</li>
<li><p>Where do cases get stuck?</p>
</li>
<li><p>How often do we bounce back to manual review?</p>
</li>
<li><p>Which decisions correlate with long cycle times?</p>
</li>
<li><p>How many workflows are active right now and where are they sitting?</p>
</li>
</ul>
<p>These are simple SQL queries over event tables. No engine internals. No proprietary formats.</p>
<p>Because step keys are just strings, teams can evolve workflows without migrations. New steps appear naturally in analytics as they are used.</p>
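<p>In production these aggregates would be plain SQL (a <code>GROUP BY</code> over the event table), but the shape of the query is easy to see as LINQ over a flattened list of completed executions. <code>StepExecutionRow</code> and the helper below are hypothetical names, not from the article.</p>

```csharp
using System.Linq;

// Hypothetical flattened row: one entry per completed step execution.
public sealed record StepExecutionRow(string StepKey, TimeSpan Duration);

// Sketch: average and p95 (nearest-rank) duration per step key.
public static class StepStats
{
    public static Dictionary<string, (TimeSpan Avg, TimeSpan P95)> Compute(
        IEnumerable<StepExecutionRow> rows) =>
        rows.GroupBy(r => r.StepKey)
            .ToDictionary(
                g => g.Key,
                g =>
                {
                    var sorted = g.Select(r => r.Duration).OrderBy(d => d).ToList();
                    var avg = TimeSpan.FromTicks((long)sorted.Average(d => d.Ticks));
                    var p95 = sorted[(int)Math.Ceiling(sorted.Count * 0.95) - 1];
                    return (avg, p95);
                });
}
```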
<h3 id="heading-heat-mapper">Heat Mapper</h3>
<p>One of the main selling points of Camunda is the heatmap view that highlights where instances spend time (hot spots) and how often paths are taken.</p>
<p>You could build a very similar UI even without BPMN. You just need to choose a <em>visual shape</em> for your workflow, then paint it with metrics.</p>
<p>The simplest approach is to render your workflow as a directed graph where each <strong>step key</strong> is a node and each observed <strong>transition</strong> between steps is an edge. You don’t need a modeller. You infer the graph from real executions: for each workflow instance, sort step events by time and emit transitions like <code>stepA -&gt; stepB</code>. Aggregate those transitions across all instances and you get a live map of how the process actually runs.</p>
<p>Once you have nodes and edges, you compute heat metrics:</p>
<ul>
<li><p><strong>Node heat (time)</strong>: average or p95 time spent in that step. You get this by pairing Started and Completed timestamps per step execution.</p>
</li>
<li><p><strong>Node heat (volume)</strong>: how many instances entered the step in a time window.</p>
</li>
<li><p><strong>Edge heat (frequency)</strong>: how often a transition happens, optionally as a percentage of all outgoing transitions from the source step.</p>
</li>
<li><p><strong>Stuck heat</strong>: count of instances currently “in” a step (Started with no Completed/Failed yet), and how long they’ve been there.</p>
</li>
</ul>
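<p>Of these metrics, stuck heat is the simplest to sketch: an instance is "in" a step when its latest event is a Started with no terminal event yet. The helper name and compact event record below are illustrative, not from the article.</p>

```csharp
using System.Linq;

// Repeated compactly from the article's model so this sketch stands alone.
public enum StepStatus { Started, Completed, Failed, Cancelled }
public sealed record WorkflowStepEvent(
    long WorkflowInstanceId, string StepKey, StepStatus Status, DateTimeOffset Timestamp);

// Sketch of "stuck heat": instances whose latest event is a Started,
// plus how long they have been sitting in that step.
public static class StuckHeat
{
    public static IEnumerable<(long InstanceId, string StepKey, TimeSpan Age)> Compute(
        IEnumerable<WorkflowStepEvent> events, DateTimeOffset now) =>
        events
            .GroupBy(e => e.WorkflowInstanceId)
            .Select(g => g.OrderBy(e => e.Timestamp).Last())
            .Where(last => last.Status == StepStatus.Started)
            .Select(last => (last.WorkflowInstanceId, last.StepKey, now - last.Timestamp));
}
```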
<p>From a UI point of view, you can build it in React with a graph library like React Flow, Cytoscape.js, or D3. You lay out nodes (either auto-layout or a simple left-to-right rank layout), draw edges, and then colour nodes/edges based on the selected metric. Add a time window filter (last 24h, 7d, 30d), environment filter, and a toggle between average and p95, and you’ve got something that feels very close to Camunda’s operational heatmaps.</p>
<p>A practical structure that works well is:</p>
<ul>
<li><p><strong>Overview page</strong>: “Workflow map” with heat applied (time or frequency).</p>
</li>
<li><p><strong>Step drilldown</strong>: click a node to see distributions (p50/p95), failure rate, top decision outcomes, and a list of slowest instances.</p>
</li>
<li><p><strong>Instance timeline</strong>: click an instance to see the ordered event stream with durations, actors, and variable snapshots at key points.</p>
</li>
</ul>
<p>The key limitation is that you won’t automatically get a nice “business diagram” unless you define one. But that’s often a feature, not a bug. Your map becomes an honest picture of runtime behaviour. If you want it to look more like a designed process, you can optionally let teams maintain a lightweight “layout config” (node positions, grouping into lanes, friendly labels) without turning it into BPMN.</p>
<h3 id="heading-comparing-this-to-a-full-workflow-engine">Comparing this to a full workflow engine</h3>
<p>This approach deliberately gives up control in exchange for clarity.</p>
<p>You cannot model flows visually here. You cannot press a button to advance a token. That is intentional. Those features are expensive to operate and rarely reflect how systems actually behave. What you gain is observability, auditability, and freedom. Your application logic stays in code where it belongs. The tracker becomes a shared language across teams. In many places, this covers eighty percent of the value people think they need BPMN for.</p>
<h3 id="heading-when-this-approach-is-not-enough">When this approach is not enough</h3>
<p>There are real cases where orchestration engines make sense. Long-running sagas with retries and compensation. Complex asynchronous dependencies across systems. Human task scheduling with SLAs and escalations.</p>
<p>The key insight is that you do not need to start there.</p>
<p>A lightweight tracker often becomes the foundation even when a full engine is later introduced. It provides ground truth. It tells you what your workflows actually look like, not what the diagram says they should look like.</p>
<p>If you strip workflow down to its essence, it is just time, decisions, and movement. By recording those facts cleanly, you unlock powerful insight without forcing your system into a modelling straitjacket. This kind of tracker is fast to build, cheap to run, and easy to explain.</p>
]]></content:encoded></item></channel></rss>