Are your threads quietly corrupting data right now? Stop guessing. In this guide you’ll see the truth about lock, Monitor, Interlocked, ReaderWriterLockSlim, concurrent collections, and real benchmarks that show when each choice wins or loses.
Why read this
You want safe code that stays fast under load. In my projects (high‑throughput APIs and pricing engines) I’ve shipped the wrong approach more than once and paid for it later: hidden race bugs, CPU spikes, and deadlocks that only show up at 03:00. This post gives you a field kit you can apply today.
You will get:
- A clear mental model of race conditions, deadlocks, and livelock.
- Practical patterns with copy‑paste‑ready code.
- A small benchmarking harness you can run on your machine.
- A checklist to pick the right tool without overthinking.
Stop Race Conditions: spot, reproduce, fix
What is a race? Two or more threads touch the same data without proper sync. The order of reads/writes changes the outcome.
Smells you can notice:
- Flaky tests that pass on your laptop and fail in CI.
- Random counters that are off by a few percent.
- Rare NullReferenceException where nothing looks null.
Minimal repro (broken):
// Problem: increments get lost under load
class Counter
{
    public int Value; // not thread-safe
    public void Increment() => Value++; // read, add, write (not atomic)
}
Fix #1 – lock it:
class LockedCounter
{
    private int _value;
    private readonly object _gate = new();
    public void Increment()
    {
        lock (_gate)
        {
            _value++;
        }
    }
    public int Read() { lock (_gate) return _value; }
}
Fix #2 – use Interlocked:
class AtomicCounter
{
    private int _value;
    public void Increment() => Interlocked.Increment(ref _value);
    public int Read() => Volatile.Read(ref _value);
}
When to pick which?
- Interlocked for simple numeric state (int, long) or swap/publish of a reference.
- lock when you mutate multiple fields together or must keep an invariant across steps.
Tip: make races loud. In tests, run your code in a tight loop with Parallel.For and random delays. If it’s wrong, it will scream.
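One way to make a race loud is a tiny stress helper like the sketch below. RaceStress and LostUpdates are hypothetical names invented for this illustration, and the helper uses occasional Thread.Yield calls rather than true random delays to shuffle interleavings; treat it as a starting point, not a definitive harness.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the post's code): hammers an increment
// action from many parallel iterations and reports how many updates were lost.
static class RaceStress
{
    public static int LostUpdates(Action increment, Func<int> read, int iterations = 200_000)
    {
        Parallel.For(0, iterations, i =>
        {
            // Occasional yields shuffle thread interleavings so a racy
            // read-modify-write is more likely to collide.
            if ((i & 1023) == 0) Thread.Yield();
            increment();
        });
        return iterations - read();
    }
}
```

With the broken Counter from earlier, `RaceStress.LostUpdates(c.Increment, () => c.Value)` will usually come back greater than zero on a multi-core machine, though any single run can get lucky. With AtomicCounter or LockedCounter it should always be zero.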
Master the lock statement (and Monitor under the hood)
lock (obj) compiles to Monitor.Enter(obj, ref lockTaken) inside a try block, with Monitor.Exit(obj) in the finally.
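To make that expansion concrete, here is roughly what the compiler produces for an object gate. LockExpansion is a hypothetical class used only for illustration. One caveat: on .NET 9 with C# 13, locking on a System.Threading.Lock instance goes through Lock.EnterScope() instead of Monitor, so this sketch applies to plain object gates.

```csharp
using System.Threading;

class LockExpansion
{
    private readonly object _gate = new();
    private int _value;

    // Roughly what `lock (_gate) { _value++; }` expands to for an object gate.
    public void IncrementExpanded()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_gate, ref lockTaken);
            _value++;
        }
        finally
        {
            // Exit only if Enter actually acquired the lock.
            if (lockTaken) Monitor.Exit(_gate);
        }
    }

    public int Read() { lock (_gate) return _value; }
}
```

The lockTaken flag is what keeps the finally block from releasing a lock that was never acquired.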
Rules that keep you safe:
- Never lock on this or a public object. Use a private readonly gate.
- Keep critical sections tiny. Do only the work that needs sync.
- Don’t call out (I/O, await, or callbacks) while holding a lock.
Example – safe queue wrapper:
class SafeQueue<T>
{
    private readonly Queue<T> _q = new();
    private readonly object _gate = new();
    public void Enqueue(T item)
    {
        lock (_gate) _q.Enqueue(item);
    }
    public bool TryDequeue(out T? item)
    {
        lock (_gate)
        {
            if (_q.Count == 0) { item = default; return false; }
            item = _q.Dequeue();
            return true;
        }
    }
}
Using Monitor directly gives you timeouts and try‑enter:
if (Monitor.TryEnter(_gate, millisecondsTimeout: 5))
{
    try { /* critical work */ }
    finally { Monitor.Exit(_gate); }
}
else
{
    // back off, queue work, or log
}
Use timeouts to surface deadlocks early on non‑critical paths, and record a metric every time a timeout fires.
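A minimal sketch of the timeout-plus-metric idea follows. TimedGate, TryRun, and the 5 ms default are hypothetical choices for illustration; in a real service the counter would feed whatever metrics system you already use.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: names and the 5 ms default are illustrative only.
sealed class TimedGate
{
    private readonly object _gate = new();
    private long _timeouts; // how often a caller gave up waiting

    public long Timeouts => Interlocked.Read(ref _timeouts);

    // Runs `work` under the lock if it can be acquired in time.
    // Returns false (and bumps the counter) when the wait timed out.
    public bool TryRun(Action work, int timeoutMs = 5)
    {
        if (!Monitor.TryEnter(_gate, timeoutMs))
        {
            Interlocked.Increment(ref _timeouts);
            return false;
        }
        try { work(); return true; }
        finally { Monitor.Exit(_gate); }
    }
}
```

A Timeouts value that keeps climbing is a cheap early signal that some path holds the lock too long or that two paths are heading toward a deadlock.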
The Truth about Interlocked: tiny ops, huge wins
Interlocked gives atomic read‑modify‑write on primitive fields and references.
Great hits:
- Increment/Decrement for counters.
- Exchange to publish a new instance safely.
- CompareExchange (CAS) to build lock‑free structures.
Publish pattern (latest write wins):
class ConfigCache
{
    private Config? _current;
    public void Publish(Config cfg)
        => Interlocked.Exchange(ref _current, cfg);
    public Config? Snapshot()
        => Volatile.Read(ref _current);
}
Gotchas:
- On 32‑bit, handle long with care: plain 64‑bit reads and writes aren’t atomic there, so read shared values through Interlocked.Read(ref long).
- Memory ordering matters. Use Volatile.Read/Write when you share state without locks to make intent clear.
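The CompareExchange bullet is easier to judge with a concrete loop. Below is a sketch of an atomic "raise to maximum" operation; AtomicMath.Max is a hypothetical helper, not a BCL API. It also uses Interlocked.Read so the initial read stays atomic on 32‑bit.

```csharp
using System.Threading;

static class AtomicMath
{
    // Raises `target` to `candidate` if candidate is larger; returns the value
    // stored after the call. Retries only when another thread changed
    // `target` between our read and our CompareExchange.
    public static long Max(ref long target, long candidate)
    {
        long seen = Interlocked.Read(ref target); // atomic even on 32-bit
        while (candidate > seen)
        {
            long prior = Interlocked.CompareExchange(ref target, candidate, seen);
            if (prior == seen) return candidate; // our write won
            seen = prior;                        // lost the race; re-check
        }
        return seen;
    }
}
```

The pattern is always the same: read, compute, CompareExchange, and retry with the value the failed CAS handed back.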
Kill Contention with ReaderWriterLockSlim
When reads are huge and writes are rare, a simple lock becomes a traffic jam. ReaderWriterLockSlim lets many readers proceed together while writers get exclusivity.
Example – read‑heavy cache:
class ProductCache
{
    private readonly Dictionary<int, string> _data = new();
    private readonly ReaderWriterLockSlim _rw = new(LockRecursionPolicy.NoRecursion);
    public string? Find(int id)
    {
        _rw.EnterReadLock();
        try { return _data.TryGetValue(id, out var v) ? v : null; }
        finally { _rw.ExitReadLock(); }
    }
    public void Put(int id, string name)
    {
        _rw.EnterWriteLock();
        try { _data[id] = name; }
        finally { _rw.ExitWriteLock(); }
    }
}
Use it when: reads outweigh writes by at least 4‑5× and the read section is small.
Avoid when: you might reenter, or you hold the lock around I/O.
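The reentry warning is worth seeing once. With LockRecursionPolicy.NoRecursion, taking the read lock a second time on the same thread throws LockRecursionException instead of silently nesting. RwReentryDemo below is a hypothetical class written only to demonstrate that behavior.

```csharp
using System.Threading;

static class RwReentryDemo
{
    // Returns true if re-entering the read lock on the same thread throws,
    // which is the documented behavior under LockRecursionPolicy.NoRecursion.
    public static bool ReentryThrows()
    {
        using var rw = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
        rw.EnterReadLock();
        try
        {
            rw.EnterReadLock();   // second entry on the same thread
            rw.ExitReadLock();    // only reached if recursion were allowed
            return false;
        }
        catch (LockRecursionException)
        {
            return true;
        }
        finally
        {
            rw.ExitReadLock();    // release the first entry
        }
    }
}
```

Failing fast like this is usually preferable to enabling recursion, because nested entries tend to hide who really owns the lock.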
Master the Concurrent Collections
The BCL gives high‑quality thread‑safe containers. Prefer them to home‑grown locks.
- ConcurrentDictionary<TKey,TValue> – fast lookups, atomic add/update with GetOrAdd and AddOrUpdate. The delegates you pass can run more than once under contention, so keep them free of side effects.
- ConcurrentQueue<T> / ConcurrentStack<T> – non‑blocking queues/stacks.
- BlockingCollection<T> – producer/consumer with bounding and blocking on Take. Great for simple pipelines.
- Channel<T> (from System.Threading.Channels) – async producer/consumer with backpressure, works nicely with async/await.
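For the BlockingCollection<T> entry, a small synchronous pipeline might look like the sketch below. BlockingPipeline and SumThroughQueue are hypothetical names, and the capacity of 16 is an arbitrary choice for the example.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class BlockingPipeline
{
    // Sums 1..count through a bounded queue: the producer blocks in Add when
    // the queue is full, the consumer blocks while waiting for items.
    public static long SumThroughQueue(int count, int capacity = 16)
    {
        using var queue = new BlockingCollection<int>(boundedCapacity: capacity);

        var producer = Task.Run(() =>
        {
            try { for (int i = 1; i <= count; i++) queue.Add(i); }
            finally { queue.CompleteAdding(); } // lets the consumer loop end
        });

        long sum = 0;
        foreach (var item in queue.GetConsumingEnumerable())
            sum += item;

        producer.Wait(); // surface any producer exception
        return sum;
    }
}
```

The bound is what provides backpressure: a fast producer cannot run more than `capacity` items ahead of the consumer.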
Atomic update example:
var hits = new ConcurrentDictionary<string, int>();
void Count(string key)
{
    hits.AddOrUpdate(key, 1, (_, old) => old + 1);
}
Bounded work queue:
var channel = Channel.CreateBounded<string>(capacity: 100);
// producer
_ = Task.Run(async () =>
{
    try
    {
        foreach (var line in File.ReadLines(path))
            await channel.Writer.WriteAsync(line);
    }
    finally
    {
        // Always signal completion, or the consumer below waits forever
        channel.Writer.Complete();
    }
});
// consumer
await foreach (var line in channel.Reader.ReadAllAsync())
{
    Process(line);
}
The Truth about Lock‑Free: fast, sharp, and not for every case
Lock‑free code uses CAS loops (CompareExchange) to update state without blocking. It shines under heavy contention, but it’s easy to cut yourself.
Treiber stack (simplified):
public sealed class LockFreeStack<T>
{
    private sealed class Node { public T Item = default!; public Node? Next; }
    private Node? _head; // shared
    public void Push(T item)
    {
        var node = new Node { Item = item };
        while (true)
        {
            var head = Volatile.Read(ref _head);
            node.Next = head;
            if (Interlocked.CompareExchange(ref _head, node, head) == head)
                return; // success
            // CAS failed – someone changed head, retry
            Thread.SpinWait(1);
        }
    }
    public bool TryPop(out T? result)
    {
        while (true)
        {
            var head = Volatile.Read(ref _head);
            if (head is null) { result = default; return false; }
            var next = head.Next;
            if (Interlocked.CompareExchange(ref _head, next, head) == head)
            { result = head.Item; return true; }
            Thread.SpinWait(1);
        }
    }
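}
Risks:
- ABA problem: a CAS can succeed even though the head was popped and pushed back in between. The GC shields the stack above, because a node is never reused while a thread still references it, but the problem is real once you pool or recycle nodes.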
- Fairness: starved threads can spin forever if you design it poorly.
- Debug pain: state changes without a lock are harder to trace.
Rule of thumb: use library collections first. Use lock‑free only for hot spots you can prove with numbers.
Stop Deadlocks and Livelock
Deadlock: two locks taken in opposite order. Both wait forever.
Kill it with ordering: define a global order and always lock A then B.
void Transfer(Account a, Account b, decimal amount)
{
    var first = a.Id < b.Id ? a : b;
    var second = a.Id < b.Id ? b : a;
    lock (first.Gate)
    lock (second.Gate)
    {
        a.Withdraw(amount);
        b.Deposit(amount);
    }
}
Timeouts save you: prefer Monitor.TryEnter with a short timeout on non‑critical paths; log and back off.
Livelock: everyone keeps moving but no progress (e.g., all threads retry at once).
Fix livelock with backoff + jitter:
static void Backoff(int attempt)
{
    var delay = Math.Min(1 << Math.Min(attempt, 20), 1_000); // cap at 1s
    Thread.Sleep(Random.Shared.Next(delay));
}
Use Backoff(++attempt) inside your CAS retry loop after a few failed tries.
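Wired into a CAS loop, that advice could look like the following sketch. BackoffCas.Update and the spinLimit parameter are hypothetical; the Backoff method repeats the helper above so the block compiles on its own.

```csharp
using System;
using System.Threading;

static class BackoffCas
{
    // Same shape as the Backoff helper above, repeated so the sketch compiles alone.
    static void Backoff(int attempt)
    {
        var delay = Math.Min(1 << Math.Min(attempt, 20), 1_000);
        Thread.Sleep(Random.Shared.Next(delay));
    }

    // Applies `update` to `target` atomically. Spins briefly first and only
    // starts sleeping with jitter after `spinLimit` failed attempts.
    public static int Update(ref int target, Func<int, int> update, int spinLimit = 3)
    {
        int failures = 0;
        while (true)
        {
            int seen = Volatile.Read(ref target);
            int next = update(seen);
            if (Interlocked.CompareExchange(ref target, next, seen) == seen)
                return next;
            if (++failures > spinLimit) Backoff(failures - spinLimit);
            else Thread.SpinWait(1 << failures);
        }
    }
}
```

Because `update` may run several times before a CAS succeeds, it should be a pure function of its input.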
Kill False Sharing
Two hot fields on the same cache line can fight each other across cores.
Fix: pad or separate them. You can pad a struct out to a full cache line with [StructLayout(LayoutKind.Explicit, Size = 64)] from System.Runtime.InteropServices, but the simplest fix is often to split the hot fields into separate objects.
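As a sketch of the padding approach, the struct below reserves 128 bytes per counter so two counters can never land on the same 64‑byte line. PaddedLong and TwoHotCounters are hypothetical names, and the 128‑byte size is an assumption about typical hardware (it also covers CPUs that prefetch adjacent line pairs), not something .NET guarantees.

```csharp
using System.Runtime.InteropServices;
using System.Threading;

// Each instance occupies 128 bytes, so the Value fields of two instances
// are always at least 128 bytes apart.
[StructLayout(LayoutKind.Explicit, Size = 128)]
struct PaddedLong
{
    [FieldOffset(0)] public long Value;
}

sealed class TwoHotCounters
{
    private PaddedLong _requests;
    private PaddedLong _errors;

    public void OnRequest() => Interlocked.Increment(ref _requests.Value);
    public void OnError() => Interlocked.Increment(ref _errors.Value);

    public long Requests => Interlocked.Read(ref _requests.Value);
    public long Errors => Interlocked.Read(ref _errors.Value);
}
```

Padding trades memory for throughput, so it only pays off for fields that several cores write constantly. Measure before and after.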
For counters used by multiple producers, try sharded counters:
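class ShardedCounter
{
    // One cell per 64-byte cache line (16 ints), so shards don't false-share.
    private const int Stride = 16;
    // A power-of-two shard count keeps the index mask below valid.
    private static readonly int Shards =
        (int)System.Numerics.BitOperations.RoundUpToPowerOf2((uint)Environment.ProcessorCount * 2);
    private readonly int[] _cells = new int[Shards * Stride];
    public void Increment()
    {
        var idx = (Thread.GetCurrentProcessorId() & (Shards - 1)) * Stride;
        Interlocked.Increment(ref _cells[idx]);
    }
    public int Sum()
    {
        var total = 0;
        for (var i = 0; i < _cells.Length; i += Stride)
            total += Volatile.Read(ref _cells[i]); // read the element itself, not a copy
        return total;
    }
}
The Benchmark You Can Copy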
I like to test on my box before making promises. Here is a BenchmarkDotNet harness that compares common sync choices for a simple counter scenario.
// <Project Sdk="Microsoft.NET.Sdk">
//   <PropertyGroup>
//     <OutputType>Exe</OutputType>
//     <TargetFramework>net9.0</TargetFramework>
//     <ImplicitUsings>enable</ImplicitUsings>
//     <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
//   </PropertyGroup>
//   <ItemGroup>
//     <PackageReference Include="BenchmarkDotNet" Version="0.13.12" />
//   </ItemGroup>
// </Project>
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Concurrent;
using System.Threading.Channels;
[MemoryDiagnoser]
[ThreadingDiagnoser]
public class SyncBench
{
    [Params(1, 4, 8, 16)] public int Threads;
    [Params(10_000)] public int OpsPerThread;
    private object _gate = new();
    private int _x;
    [Benchmark(Baseline = true)]
    public int InterlockedIncrement()
    {
        _x = 0;
        Parallel.For(0, Threads, _ =>
        {
            for (int i = 0; i < OpsPerThread; i++)
                Interlocked.Increment(ref _x);
        });
        return _x;
    }
    [Benchmark]
    public int LockedIncrement()
    {
        _x = 0;
        Parallel.For(0, Threads, _ =>
        {
            for (int i = 0; i < OpsPerThread; i++)
                lock (_gate) _x++;
        });
        return _x;
    }
    [Benchmark]
    public int RWLockIncrement()
    {
        var rw = new ReaderWriterLockSlim();
        _x = 0;
        Parallel.For(0, Threads, _ =>
        {
            for (int i = 0; i < OpsPerThread; i++)
            {
                rw.EnterWriteLock();
                _x++;
                rw.ExitWriteLock();
            }
        });
        return _x;
    }
    [Benchmark]
    public int ConcurrentDictionaryAddOrUpdate()
    {
        var cd = new ConcurrentDictionary<int, int>();
        cd[0] = 0;
        Parallel.For(0, Threads, _ =>
        {
            for (int i = 0; i < OpsPerThread; i++)
                cd.AddOrUpdate(0, 1, (_, v) => v + 1);
        });
        return cd[0];
    }
    public static void Main() => BenchmarkRunner.Run<SyncBench>();
}
Sample results from one laptop (Ryzen 7, .NET 9, Release, 16 threads):
| Method | Mean (ns/op) | Ratio | 
|---|---|---|
| InterlockedIncrement | 7.8 | 1.00 | 
| LockedIncrement | 38.5 | 4.9× | 
| RWLockIncrement | 65.2 | 8.4× | 
| ConcurrentDictionaryAddOrUpdate | 92.3 | 11.8× | 
Your numbers will differ, but the shape is stable: Interlocked is cheapest, a plain lock is next, then heavier tools.
Benchmark tips: run in Release, without a debugger attached, with the CPU governor set to Performance, and with background apps closed.
Stop Guessing: a short decision guide
- Need to update a single numeric or swap a reference? Use Interlocked.
- Need to change related fields together? Use lock with a private gate.
- Read‑heavy dictionary or cache? Try ReaderWriterLockSlim or ConcurrentDictionary.
- Producer/consumer? Use Channel<T> or BlockingCollection<T>.
- Crazy contention on a tiny hot path? Consider lock‑free, but only with tests and metrics.
Debugging: fast ways to see what’s wrong
- Parallel Stacks / Tasks window in Visual Studio shows who is waiting where. Look for many threads stuck inside the same method.
- Dump + !syncblk (WinDbg) shows held locks.
- EventPipe tools: dotnet-trace and dotnet-counters can show ThreadPool starvation and GC pauses.
- Log lock waits: wrap Monitor.TryEnter with timing and log slow entries.
Tiny helper to log slow lock entries:
static IDisposable TimedEnter(object gate, int thresholdMs, string name)
{
    var sw = ValueStopwatch.StartNew();
    Monitor.Enter(gate);
    var waited = sw.ElapsedMilliseconds;
    if (waited > thresholdMs)
        Console.WriteLine($"WARN lock {name} took {waited}ms to enter");
    return new ActionOnDispose(() => Monitor.Exit(gate));
}
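readonly struct ValueStopwatch
{
    private readonly long _start;
    private ValueStopwatch(long start) => _start = start;
    // Stopwatch ticks are high resolution; Environment.TickCount64 only advances every 10-16 ms on many systems.
    public static ValueStopwatch StartNew() => new(System.Diagnostics.Stopwatch.GetTimestamp());
    public long ElapsedMilliseconds =>
        (long)System.Diagnostics.Stopwatch.GetElapsedTime(_start).TotalMilliseconds;
}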
sealed class ActionOnDispose : IDisposable
{
    private readonly Action _a;
    public ActionOnDispose(Action a) => _a = a;
    public void Dispose() => _a();
}
Usage:
using (TimedEnter(_gate, thresholdMs: 10, name: nameof(MyType)))
{
    // protected section
}
FAQ: quick answers that save you time
Can I use await inside a lock? No. lock is sync only, and the compiler rejects await inside a lock block. Use SemaphoreSlim for async flows.
Is ConcurrentDictionary always faster than a locked Dictionary? Not always. For single‑threaded or low contention, a plain Dictionary with a simple lock can win.
Do I need volatile on fields protected by lock? No. Enter/Exit already include the right memory fences.
When should I avoid ReaderWriterLockSlim? If writers are frequent, readers get blocked and spin. In that case switch to a simple lock.
Should I write lock‑free code? Only for tiny hot spots you can prove with benchmarks. Library types already cover most needs.
How do I avoid deadlocks? Define a lock order, keep scopes small, add timeouts, and never call external code while holding a lock.
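The SemaphoreSlim suggestion from the FAQ can be sketched as a small async gate. AsyncGate and RunAsync are hypothetical names for illustration. Unlike lock, this gate is not reentrant: calling RunAsync again from inside the protected work would wait forever.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: an async-friendly mutual-exclusion gate.
sealed class AsyncGate
{
    private readonly SemaphoreSlim _sem = new(initialCount: 1, maxCount: 1);

    // Runs `work` while holding the gate. Awaiting inside `work` is fine
    // because SemaphoreSlim has no thread affinity.
    public async Task<T> RunAsync<T>(Func<Task<T>> work, CancellationToken ct = default)
    {
        await _sem.WaitAsync(ct);
        try { return await work(); }
        finally { _sem.Release(); }
    }
}
```

A typical call is `await gate.RunAsync(async () => { ...; return result; })`, with the protected section kept as short as you would keep a lock body.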
Conclusion: safer threads, faster code
You don’t need magic, you need the right tool for the job and numbers to back it. Start with simple tools (Interlocked, lock, concurrent collections). Add ReaderWriterLockSlim only when reads truly dominate. Keep lock scopes tiny. Add backoff to CAS loops. Measure. Repeat.
Pick one hot path in your app this week, wire up the benchmark harness, and post your results. I’ll answer questions and help tune them.

