Can a Blazor Server app really keep 10,000 people online at the same time without melting your servers? Yes – if you treat SignalR like a high‑throughput system, not a chat toy.
Over the last few years I’ve shipped Blazor Server apps that handle spikes during live events: thousands of dashboards open, heavy broadcasts, and a sea of reconnects when Wi‑Fi hiccups. In this guide I’ll show you the concrete steps that kept those systems fast and stable. No magic – just good limits, lean payloads, and the right Azure setup.
What you’re building (and why it’s hard)
Blazor Server rides on a SignalR connection. Each browser holds a long‑lived connection (WebSocket when possible). At 10,000 concurrent users your app is mostly about:
- Connections: tracking, reconnecting, and keeping them alive.
- Messages: small, frequent renders from the server to the browser.
- CPU & memory: JSON/MessagePack serialization and diffing of render batches.
- Scale‑out: more instances and/or Azure SignalR Service.
If any of these is wasteful, you bleed CPU and RAM per connection and hit limits fast.
The scaling path at a glance
- Scale up a single node: Kestrel, server GC, WebSockets on, MessagePack on, strict limits.
- Scale out app instances: shared Data Protection keys, health probes, ARR affinity if you don’t use Azure SignalR.
- Offload fan‑out to Azure SignalR Service (Default mode for Blazor Server): better connection density, smoother bursts, simpler routing.
- Automate: autoscale rules + visibility (counters, logs, end‑to‑end traces).
Below I’ll walk through each step with code and config that you can copy into a real project today.
Project baseline
Create a plain Blazor Server app on .NET 8 (or newer).
Key packages
<ItemGroup>
  <PackageReference Include="Microsoft.AspNetCore.SignalR.Protocols.MessagePack" Version="8.*" />
</ItemGroup>
Program.cs – minimal but fast
var builder = WebApplication.CreateBuilder(args);

// Blazor + SignalR with strict hub limits
builder.Services
    .AddServerSideBlazor(options =>
    {
        // keep render pipeline under control
        options.MaxBufferedUnacknowledgedRenderBatches = 5; // backpressure to slow clients
        options.DisconnectedCircuitRetentionPeriod = TimeSpan.FromMinutes(3);
        options.JSInteropDefaultCallTimeout = TimeSpan.FromSeconds(10);
    })
    .AddHubOptions(o =>
    {
        o.MaximumReceiveMessageSize = 64 * 1024; // 64 KB per incoming message
        o.EnableDetailedErrors = false; // never in prod
        o.ClientTimeoutInterval = TimeSpan.FromSeconds(30); // drop dead connections faster
        o.KeepAliveInterval = TimeSpan.FromSeconds(15); // keep LB happy (WebSockets)
        o.HandshakeTimeout = TimeSpan.FromSeconds(15);
        o.StreamBufferCapacity = 8; // per-stream buffer size
    });

// SignalR protocol: prefer MessagePack for smaller payloads
builder.Services.AddSignalR().AddMessagePackProtocol();

// Presence and fan-out pipeline (types defined later in this post)
builder.Services.AddSingleton<PresenceStore>();
builder.Services.AddSingleton<BroadcastQueue>();
builder.Services.AddHostedService<BroadcastWorker>(); // drains the queue; see the fan-out section

var app = builder.Build();

app.MapBlazorHub();
app.MapHub<TickerHub>("/hubs/ticker"); // custom hub from the fan-out section; the path is an example
app.MapFallbackToPage("/_Host");
app.Run();
Why these numbers? They’re safe defaults to stop noisy clients from pushing the server over the edge. Tune them with your traffic profile, but keep the mindset: deny by default, allow by measurement.
Cut payload size first (MessagePack + lean models)
Serialization burns CPU and memory. Two simple wins:
- Use MessagePack for SignalR. It is binary and compact.
- Send lean DTOs to the client, not EF models or full view models.
DTO example
public sealed record StockTickDto(string Symbol, decimal Price, long EpochMs);
Sending
public class TickerHub : Hub
{
    public Task Subscribe(string symbol) => Groups.AddToGroupAsync(Context.ConnectionId, symbol);
}

// Elsewhere: broadcast to a group
public sealed class TickerFanout
{
    private readonly IHubContext<TickerHub> _hub;
    public TickerFanout(IHubContext<TickerHub> hub) => _hub = hub;

    public Task PublishAsync(string symbol, StockTickDto dto, CancellationToken ct)
        => _hub.Clients.Group(symbol).SendAsync("tick", dto, ct);
}
Tip: don’t send strings with extra whitespace or long property names. Every byte counts at 10,000 users.
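If you go further and use attribute-based MessagePack contracts, integer keys keep even the property names off the wire: the payload becomes a compact array. A sketch of the tick DTO from above, annotated (verify against the resolver your AddMessagePackProtocol setup actually uses):

using MessagePack;

// Serialized as [Symbol, Price, EpochMs] rather than a map of property-name strings.
[MessagePackObject]
public sealed record StockTickDto(
    [property: Key(0)] string Symbol,
    [property: Key(1)] decimal Price,
    [property: Key(2)] long EpochMs);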
Don’t do heavy work inside hubs
A hub method runs on the request path of a live connection. Block here and you stall the socket. Use a bounded channel to offload work to background workers.
Bounded channel + backpressure
public sealed class BroadcastQueue
{
    private readonly Channel<(string Group, object Payload)> _channel =
        Channel.CreateBounded<(string, object)>(new BoundedChannelOptions(10_000)
        {
            FullMode = BoundedChannelFullMode.DropOldest, // protect server under burst
            SingleReader = true,
            SingleWriter = false
        });

    public bool TryEnqueue(string group, object payload)
        => _channel.Writer.TryWrite((group, payload));

    public IAsyncEnumerable<(string Group, object Payload)> ReadAllAsync(CancellationToken ct)
        => _channel.Reader.ReadAllAsync(ct);
}
public sealed class BroadcastWorker : BackgroundService
{
    private readonly BroadcastQueue _queue;
    private readonly IHubContext<TickerHub> _hub;

    public BroadcastWorker(BroadcastQueue queue, IHubContext<TickerHub> hub)
        => (_queue, _hub) = (queue, hub);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var (group, payload) in _queue.ReadAllAsync(stoppingToken))
        {
            try
            {
                await _hub.Clients.Group(group).SendAsync("tick", payload, stoppingToken);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested) { }
            catch (Exception ex)
            {
                // log ex and continue; never block the loop
            }
        }
    }
}
Now hub methods just validate input and enqueue – constant time under load.
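As an illustration, a producer-facing hub method in that style; the hub name and the validation rule here are hypothetical:

public sealed class PublishHub : Hub // hypothetical producer hub
{
    private readonly BroadcastQueue _queue;
    public PublishHub(BroadcastQueue queue) => _queue = queue;

    public Task Publish(string symbol, decimal price)
    {
        // cheap validation only; no I/O on the connection's path
        if (string.IsNullOrWhiteSpace(symbol) || symbol.Length > 12)
            throw new HubException("invalid symbol");

        // TryEnqueue never blocks; under burst the bounded channel drops the oldest item
        _queue.TryEnqueue(symbol.ToUpperInvariant(),
            new StockTickDto(symbol, price, DateTimeOffset.UtcNow.ToUnixTimeMilliseconds()));
        return Task.CompletedTask;
    }
}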
Presence tracking without global locks
You’ll need to know who is online, and which groups they’re in, fast.
public sealed class PresenceStore
{
    private readonly ConcurrentDictionary<string, HashSet<string>> _groups = new();

    public void Join(string connectionId, string group)
    {
        var set = _groups.GetOrAdd(group, _ => new HashSet<string>(StringComparer.Ordinal));
        lock (set)
        {
            set.Add(connectionId);
        }
    }

    public void Leave(string connectionId, string group)
    {
        if (_groups.TryGetValue(group, out var set))
        {
            lock (set)
            {
                set.Remove(connectionId);
                if (set.Count == 0)
                    _groups.TryRemove(group, out _);
            }
        }
    }

    public int Count(string group)
        => _groups.TryGetValue(group, out var set) ? set.Count : 0;
}
public sealed class PresenceHub : Hub
{
    private readonly PresenceStore _presence;
    public PresenceHub(PresenceStore presence) => _presence = presence;

    public override Task OnConnectedAsync()
    {
        // join a personal group for targeted pushes
        _presence.Join(Context.ConnectionId, Context.UserIdentifier ?? Context.ConnectionId);
        return base.OnConnectedAsync();
    }

    public override Task OnDisconnectedAsync(Exception? ex)
    {
        _presence.Leave(Context.ConnectionId, Context.UserIdentifier ?? Context.ConnectionId);
        return base.OnDisconnectedAsync(ex);
    }
}
Rationale:
- ConcurrentDictionary + per‑set locks keep the hot path cheap.
- Avoid global locks; at 10k connections they turn into choke points.
Streaming for large result sets
When you need to push many items, use server‑to‑client streaming to keep memory low and give the client first bytes early.
public sealed record ReportRow(int Id, string Name);

public sealed class ReportHub : Hub
{
    public async IAsyncEnumerable<ReportRow> StreamReport(
        string reportId,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var row in LoadRowsAsync(reportId, ct))
        {
            yield return row; // client receives rows as they are ready
        }
    }

    private static async IAsyncEnumerable<ReportRow> LoadRowsAsync(
        string id, [EnumeratorCancellation] CancellationToken ct)
    {
        for (var i = 0; i < 10_000; i++)
        {
            yield return new ReportRow(i, $"R{i}");
            await Task.Yield();
        }
    }
}
Client
var stream = hubConnection.StreamAsync<ReportRow>("StreamReport", reportId, cancellationToken);

await foreach (var row in stream.WithCancellation(cancellationToken))
{
    // render as items arrive
}
Guard rails: rate limits and quotas
You don’t want one buggy tab to ruin the party.
- HubOptions.MaximumReceiveMessageSize: drop huge payloads.
- StreamBufferCapacity: limit per‑connection memory during streaming.
- MaxBufferedUnacknowledgedRenderBatches: slow render spam to clients that can’t keep up.
- Per‑user rate limit (simple counter) inside a hub filter.
Lightweight rate limit with a hub filter
public sealed class SimpleRateLimitFilter : IHubFilter
{
    private static readonly ConcurrentDictionary<string, (int Count, long WindowStart)> _counters = new();
    private const int Limit = 30; // 30 calls
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(10);

    public async ValueTask<object?> InvokeMethodAsync(
        HubInvocationContext context, Func<HubInvocationContext, ValueTask<object?>> next)
    {
        var key = context.Context.UserIdentifier ?? context.Context.ConnectionId;
        var now = Stopwatch.GetTimestamp();
        var windowStartTicks = now - (long)(Window.TotalSeconds * Stopwatch.Frequency);

        _counters.AddOrUpdate(key,
            _ => (1, now),
            (_, v) => v.WindowStart < windowStartTicks ? (1, now) : (v.Count + 1, v.WindowStart));

        var (count, start) = _counters[key];
        if (start >= windowStartTicks && count > Limit)
            throw new HubException("rate limit");

        return await next(context);
    }
}
// Program.cs – replace the earlier AddSignalR call so the filter runs for every hub invocation
builder.Services.AddSingleton<SimpleRateLimitFilter>();
builder.Services.AddSignalR(o => o.AddFilter<SimpleRateLimitFilter>())
    .AddMessagePackProtocol();
Blazor Server specifics that matter at scale
Blazor Server has a render queue per circuit. Keep these in check:
- MaxBufferedUnacknowledgedRenderBatches: if a browser lags, the server pauses sending renders instead of hoarding them in memory.
- DisconnectedCircuitRetentionPeriod and DisconnectedCircuitMaxRetained: let users reconnect after a brief network drop without losing state, but don’t keep thousands of dead circuits.
- JSInteropDefaultCallTimeout: stuck JS calls should fail fast.
Also:
- Avoid large @foreach renders on every tick. Use Virtualize or diff small parts.
- Throttle UI: if you push ticks at 10/sec, render at 2-4/sec and aggregate values (see the sketch after this list).
- Minimize StateHasChanged calls; batch updates in a timer.
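Here’s the throttling idea as a sketch; the component name is mine and the markup is omitted. Hold the newest tick in a field and let a slow timer decide when to re-render, instead of rendering per message:

using Microsoft.AspNetCore.Components;

public class ThrottledTicker : ComponentBase, IDisposable
{
    private readonly PeriodicTimer _renderTimer = new(TimeSpan.FromMilliseconds(400)); // ~2.5 renders/sec
    private StockTickDto? _latest;   // written by the message handler
    private StockTickDto? _rendered; // what the UI last showed

    protected override void OnInitialized() => _ = RenderLoopAsync();

    // Wire this to your SignalR handler; it only stores the value, no render here.
    public void OnTick(StockTickDto dto) => _latest = dto;

    private async Task RenderLoopAsync()
    {
        while (await _renderTimer.WaitForNextTickAsync())
        {
            if (!Equals(_latest, _rendered))
            {
                _rendered = _latest;
                await InvokeAsync(StateHasChanged); // marshal onto the renderer's sync context
            }
        }
    }

    public void Dispose() => _renderTimer.Dispose(); // pending WaitForNextTickAsync returns false
}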
Azure setup: the safe defaults
1) App Service or containers
- Linux plans tend to have fewer surprises with WebSockets.
- Enable WebSockets on the App Service.
- If you’re not using Azure SignalR Service, keep ARR Affinity ON so circuits stick to the same instance.
- Share Data Protection keys across instances (Blob or Key Vault) so auth cookies stay valid after scale‑out.
2) Azure SignalR Service (Default mode)
For big fan‑out and high connection counts, place Azure SignalR Service in front of your app.
- Use Default mode for Blazor Server.
- Choose a SKU that covers your expected peak connections and messages per day; start modest, autoscale the app, and watch service metrics.
- In App Service, ARR Affinity can be OFF when you use Azure SignalR; the service handles routing.
- Keep WebSockets allowed end‑to‑end (CDN/Front Door/Application Gateway must pass them through).
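Since WebSockets have to survive every hop anyway, you can also refuse the long-polling fallback at the endpoint so a broken hop fails loudly instead of silently degrading. A minimal sketch; the trade-off is that clients that can’t open a WebSocket won’t connect at all:

using Microsoft.AspNetCore.Http.Connections;

app.MapBlazorHub(options =>
{
    // WebSockets only: no silent long-polling fallback behind a misconfigured proxy
    options.Transports = HttpTransportType.WebSockets;
});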
appsettings.json
{
  "Azure": {
    "SignalR": {
      "ConnectionString": "Endpoint=...;AccessKey=...;Version=1.0;"
    }
  }
}
Program.cs (when using Azure SignalR Service)
builder.Services.AddSignalR()
    .AddAzureSignalR() // Microsoft.Azure.SignalR package; reads Azure:SignalR:ConnectionString by default
    .AddMessagePackProtocol();

// later
app.MapBlazorHub();
Keep the app stateless beyond the Blazor circuit. For anything shared across instances, use Redis, SQL, or a durable store.
3) Timeouts & keep‑alives
- Most proxies drop idle sockets after a few minutes. Your KeepAliveInterval (15s above) is fine.
- ClientTimeoutInterval at ~30s helps prune dead connections when users close laptops.
4) Autoscale rules that actually work
Start with:
- CPU at 60-65% over 10 minutes,
- Connections per instance (WebSocket connections) threshold that keeps memory comfortable,
- Queue length if you use a broker for background jobs.
Scale out before you’re in pain; scale in slowly.
Monitoring: what to watch during a load test
Server
- Current connections
- Messages/sec (send & receive)
- Average hub invocation time
- GC pauses and LOH allocations
- Exceptions (especially HubException) and disconnect reasons
Client
- Reconnect attempts
- Mean time to render after a message
How I collect it
- Application Insights for logs + custom metrics (track counts on connect/disconnect, queue depth).
- EventCounters via dotnet-counters during test runs.
- A small /healthz endpoint that returns connection counts and queue sizes (sample below, followed by a custom-metrics sketch).
Sample health endpoint
app.MapGet("/healthz", (PresenceStore p) => Results.Ok(new
{
    Utc = DateTime.UtcNow,
    TickerSubscribers = p.Count("TICKER")
}));
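For the custom metrics side, a minimal sketch using System.Diagnostics.Metrics; the meter and instrument names here are mine, and you’d call the two methods from your hub’s OnConnectedAsync/OnDisconnectedAsync overrides:

using System.Diagnostics.Metrics;

public sealed class ScaleMetrics
{
    private static readonly Meter Meter = new("MyApp.Blazor", "1.0");
    private long _currentConnections;

    public ScaleMetrics()
    {
        // observable gauge: sampled on demand by dotnet-counters or any MeterListener
        Meter.CreateObservableGauge("current-connections",
            () => Volatile.Read(ref _currentConnections));
    }

    public void ConnectionOpened() => Interlocked.Increment(ref _currentConnections);
    public void ConnectionClosed() => Interlocked.Decrement(ref _currentConnections);
}

Register it as a singleton; during a run, something like dotnet-counters monitor -p <pid> --counters System.Runtime,MyApp.Blazor samples the gauge next to the built-in GC and thread-pool counters.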
Load test recipe (works on a laptop and scales up)
- Spin up a k6 script (or your tool of choice; a C# alternative is sketched below) that opens N WebSockets and keeps them subscribed.
- Run for 15 minutes, broadcasting a small DTO to all clients every second.
- Record: CPU, memory, connections, send time P95, server exceptions.
- Increase by 2× until you hit the limit; note the first bottleneck (CPU, memory, socket caps). Fix that, repeat.
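If you’d rather stay in C# than write a k6 script, here’s a minimal load-client sketch; it assumes the TickerHub from earlier is mapped at /hubs/ticker, shares the StockTickDto type, and references the Microsoft.AspNetCore.SignalR.Client package:

using Microsoft.AspNetCore.SignalR.Client;

var n = 1_000; // double until something breaks; run several of these processes for 10k
var received = 0L;
var connections = new List<HubConnection>(n);

for (var i = 0; i < n; i++)
{
    var conn = new HubConnectionBuilder()
        .WithUrl("https://localhost:5001/hubs/ticker") // adjust to your host
        .WithAutomaticReconnect()
        .Build();

    conn.On<StockTickDto>("tick", _ => Interlocked.Increment(ref received));
    await conn.StartAsync();
    await conn.InvokeAsync("Subscribe", "TICKER");
    connections.Add(conn);
}

Console.WriteLine($"{n} connections open; press Enter to stop.");
Console.ReadLine();
Console.WriteLine($"Received {Interlocked.Read(ref received)} messages.");

await Task.WhenAll(connections.Select(c => c.DisposeAsync().AsTask()));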
Broadcast loop for tests
// runs inside a BackgroundService; `queue` is the BroadcastQueue singleton from earlier
var timer = new PeriodicTimer(TimeSpan.FromSeconds(1));
var rnd = new Random();

while (await timer.WaitForNextTickAsync())
{
    var dto = new StockTickDto(
        "ACME",
        Math.Round((decimal)rnd.NextDouble() * 100, 2),
        DateTimeOffset.UtcNow.ToUnixTimeMilliseconds());

    queue.TryEnqueue("TICKER", dto);
}
Common pitfalls (I’ve fixed all of these in real apps)
- Big JSON everywhere: switch to MessagePack and DTOs.
- Hub does I/O or EF per call: enqueue and let a worker do it.
- No limits: one slow browser fills server memory with render batches.
- ARR Affinity off without Azure SignalR: circuits bounce between instances and die.
- Long polling allowed: force WebSockets when possible; long polling is a last resort.
- Giant @foreach updates: render diffs, not the world.
- No autoscale: a live event arrives and your single instance cries.
Quick checklist before your next spike
- MessagePack on
- MaximumReceiveMessageSize set
- Backpressure on renders (MaxBufferedUnacknowledgedRenderBatches)
- Bounded channel for fan‑out
- Azure SignalR Service in Default mode (for big scale)
- WebSockets enabled end‑to‑end
- ARR Affinity set correctly
- Autoscale rules active
- Health endpoint + counters visible
FAQ: Blazor Server & SignalR at scale
Do I need Azure SignalR Service to reach 10,000 users?
It’s the safest route. You can reach high numbers without it on big machines, but the service gives better connection density, buffer management, and simpler routing.

Is there such a thing as connection pooling for SignalR?
Not in the database sense. Think in terms of connection density and server connections to Azure SignalR. You pool work, not sockets: use a bounded channel and background workers to smooth bursts.

Will compression shrink my payloads?
Compression for WebSockets is separate and not always available across proxies. The more reliable gain is switching to MessagePack and trimming DTOs.

Which transport should I optimize for?
SignalR prefers WebSockets. Keep those healthy and you’ll be fine.

How do I survive reconnect storms?
Make the server cheap per connection: short timeouts, strict limits, and a queue for fan‑out. The system should bend, not break.

When should I scale out?
When CPU stays above ~65% for 10 minutes, memory keeps growing, queue depth rises, or send P95 crosses your UX target.

Can I run this on Windows App Service or IIS?
Yes, but check WebSocket limits/timeouts and prefer in‑process hosting. For fewer moving parts at scale I usually pick Linux plans.
Conclusion: Win the crowd with limits, not luck
You don’t need exotic gear to serve 10,000 live users. You need tight limits, small payloads, background fan‑out, and an Azure layout that respects WebSockets. Start with the code in this post, run a load test, and tweak with data — not vibes. If you’ve tried different settings or hit a tricky bottleneck, drop a comment: what did your graphs show when the load doubled?