I Built an MCP Server That Runs BenchmarkDotNet from Chat (with Diffs)

.NET Development · Artificial Intelligence · By amarozka · September 21, 2025

Ever wished you could say “run the perf suite” in chat and get a clean JSON diff of your .NET benchmarks a few seconds later? That’s exactly what I built: a tiny MCP server that orchestrates BenchmarkDotNet, produces artifacts (JSON/MD/CSV), and even compares runs. In this post I’ll show you how I wired it up end‑to‑end, what went wrong (and how I fixed it), and give you copy‑pasteable snippets to ship your own.

What we’re building (and why)

Goal: expose a Model Context Protocol (MCP) server over stdio with tools that:

  • Run a benchmark executable (BenchmarkDotNet) with filters/jobs/exporters
  • Return a runId and a brief JSON summary
  • List the generated artifacts (JSON/MD/CSV/logs)
  • Compare two runs and return perf deltas

Why bother? Because it turns performance checks into a chat‑first workflow. I can run, fetch results, and compare… without leaving Visual Studio, MCP Inspector, or Copilot Chat. It’s the developer equivalent of a one‑button coffee machine.

Tech stack: .NET 9 (works with .NET 8 too), C#, BenchmarkDotNet, System.CommandLine, the C# MCP server SDK, and plain old ProcessStartInfo.

High‑level architecture

Here’s the flow my server follows:

Copilot / MCP Inspector
        │
        ▼
MCP Server (stdio)
  ├─ Tool: run_bench → spawns `dotnet run` on SampleBenchmarks
  ├─ Tool: get_results → parses JSON summary from artifacts
  ├─ Tool: list_artifacts → enumerates files for a runId
  └─ Tool: compare_runs → computes deltas between two JSON summaries
        │
        ▼
BenchmarkDotNet app (separate project)
  └─ Writes artifacts (JSON/MD/CSV) per runId folder

Separation of concerns: the MCP server orchestrates; the benchmark app measures. Results are exchanged via files (JSON/MD/CSV) so the boundary is simple and debuggable.

Solution layout

Mcp.BenchmarkRunner.sln
│
├─ src/
│  ├─ Mcp.BenchmarkRunner.Server/        # MCP server (console)
│  │  ├─ Program.cs                      # host + stdio + tool discovery
│  │  └─ Tools/
│  │     ├─ BenchTools.cs                # run_bench, get_results, list_artifacts
│  │     └─ CompareTools.cs              # compare_runs
│  │
│  └─ SampleBenchmarks/                  # BenchmarkDotNet console app
│     ├─ Program.cs                      # CLI (artifacts/exporters/job/filter)
│     ├─ Jobs.cs                         # Short/Medium/Long presets
│     └─ HashBench.cs                    # simple MD5 vs SHA256
│
└─ runs/                                 # per-runId artifacts (created at runtime)

Tip: I keep the benchmark app as a sibling project so the server can spawn it with dotnet run and not worry about shipping assemblies.

Bootstrapping the MCP server

The host is intentionally minimal. I let the SDK auto‑discover tools from the assembly so adding a new tool is just creating a static method with an attribute.

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ModelContextProtocol.Server;

var builder = Host.CreateApplicationBuilder(args);
builder.Logging.AddConsole(o => o.LogToStandardErrorThreshold = LogLevel.Information);

// Register MCP server with stdio (works great with MCP Inspector/Copilot)
builder.Services
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithToolsFromAssembly(); // scans for [McpServerToolType]/[McpServerTool]

var app = builder.Build();
await app.RunAsync();

That one line WithToolsFromAssembly() is the MVP move. It turns the server into a plugboard: drop in tools, rebuild, done.

The benchmark tool contract

I chose request/response types that map cleanly onto the BenchmarkDotNet CLI. You can adapt these to your domain.

// DTOs.cs
using System.ComponentModel;

public record RunBenchRequest(
    [Description("BenchmarkDotNet glob filter, e.g. *Hash*")] string? Filter = null,
    [Description("Job preset: Short|Medium|Long")] string Job = "Medium",
    [Description("Exporters: json,fulljson,md,csv")] string Exporters = "json,md",
    [Description("Timeout in seconds")] int TimeoutSec = 600
);

public record RunBenchResponse(
    string RunId,
    string ArtifactsDir,
    object? Summary
);

public record GetResultsResponse(
    string RunId,
    object? JsonSummary
);

Paths and run folders (where most bugs hide)

When the server is launched from different places (Inspector vs Visual Studio), the working directory, and with it every relative path, can vary. I made path resolution deterministic by walking up from AppContext.BaseDirectory to the repo root and composing absolute paths from there.

// Paths.cs
public static class Paths
{
    public static readonly string RepoRoot = FindRepoRoot();
    public static readonly string BenchProj = Path.Combine(RepoRoot, "src", "SampleBenchmarks", "SampleBenchmarks.csproj");
    public static readonly string RunsRoot  = Path.Combine(RepoRoot, "runs");

    private static string FindRepoRoot()
    {
        var dir = new DirectoryInfo(AppContext.BaseDirectory);
        while (dir is not null)
        {
            if (File.Exists(Path.Combine(dir.FullName, "Mcp.BenchmarkRunner.sln")) ||
                Directory.Exists(Path.Combine(dir.FullName, "src")))
                return dir.FullName;
            dir = dir.Parent!;
        }
        return AppContext.BaseDirectory; // fallback
    }
}

Debug trick: expose a diag_env tool that returns these paths plus Environment.CurrentDirectory. It eliminates 80% of “works on my machine” issues.
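
Here's a minimal sketch of such a tool, assuming the Paths helper above and that the SDK serializes the returned object to JSON the same way it does the record types used by the other tools:

// Tools/DiagTools.cs (sketch) - echoes the resolved paths so you can see
// which layout the server actually ended up with
using System.ComponentModel;
using ModelContextProtocol.Server;

[McpServerToolType]
public static class DiagTools
{
    [McpServerTool, Description("Return resolved paths and the current directory for debugging.")]
    public static object diag_env() => new
    {
        Paths.RepoRoot,
        Paths.BenchProj,
        Paths.RunsRoot,
        BaseDirectory = AppContext.BaseDirectory,
        CurrentDirectory = Environment.CurrentDirectory
    };
}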

Implementing run_bench

The tool generates a runId, spawns the benchmark process, waits (with timeout), parses the JSON exporter, and returns a compact summary.

// Tools/BenchTools.cs (excerpt)
using System.ComponentModel;
using System.Diagnostics;
using System.Text.Json;
using ModelContextProtocol.Server; // [McpServerToolType] / [McpServerTool] attributes
using static Paths;                // RepoRoot, BenchProj, RunsRoot

[McpServerToolType]
public static class BenchTools
{
    [McpServerTool, Description("Run BenchmarkDotNet with optional filter and job preset; returns runId and brief summary.")]
    public static async Task<RunBenchResponse> run_bench(RunBenchRequest req)
    {
        Directory.CreateDirectory(RunsRoot);
        var runId = DateTimeOffset.UtcNow.ToString("yyyyMMdd_HHmmss_fff");
        var artifactsDir = Path.Combine(RunsRoot, runId);
        Directory.CreateDirectory(artifactsDir);

        var psi = new ProcessStartInfo
        {
            FileName = "dotnet",
            WorkingDirectory = RepoRoot,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            // dotnet run --project SampleBenchmarks -- --artifacts <dir> --exporters <...> --job <...> [--filter <...>]
            ArgumentList = { "run", "--project", BenchProj, "-c", "Release", "--",
            "--artifacts", artifactsDir,
            "--exporters", req.Exporters,
            "--job", req.Job }
        };

        if (!string.IsNullOrWhiteSpace(req.Filter))
        {
            psi.ArgumentList.Add("--filter");
            psi.ArgumentList.Add(req.Filter!);
        }

        using var proc = Process.Start(psi)!;

        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(req.TimeoutSec));
        var stdoutTask = proc.StandardOutput.ReadToEndAsync(cts.Token);
        var stderrTask = proc.StandardError.ReadToEndAsync(cts.Token);

        if (!proc.WaitForExit((int)TimeSpan.FromSeconds(req.TimeoutSec).TotalMilliseconds))
        {
            try { proc.Kill(entireProcessTree: true); } catch { }
            throw new TimeoutException($"Benchmark run timed out after {req.TimeoutSec}s");
        }

        var stdout = await stdoutTask;
        var stderr = await stderrTask;

        if (proc.ExitCode != 0)
            throw new ApplicationException($"Benchmark process failed:\n{stderr}\n{stdout}");

        var json = Directory.EnumerateFiles(artifactsDir, "*.json", SearchOption.AllDirectories)
                            .OrderByDescending(File.GetLastWriteTimeUtc)
                            .Select(File.ReadAllText)
                            .FirstOrDefault();

        object? summary = null;
        try { summary = json is null ? null : JsonSerializer.Deserialize<object>(json); } catch { }

        return new RunBenchResponse(runId, artifactsDir, summary);
    }
}

Two important choices here:

  1. Artifacts as the contract: the JSON/MD/CSV files are the API between my benchmark app and the MCP server. It’s robust and easy to inspect.
  2. Short summary shape: MCP tools should return compact payloads; clients can always fetch the full artifact if needed.

Listing artifacts and getting results

These are straightforward wrappers: one enumerates files with Directory.EnumerateFiles, the other parses the most recently written JSON exporter file. Keeping them separate lets clients tailor the UI (e.g., open the Markdown report vs parse the JSON).

[McpServerTool, Description("List artifact files for a given runId (JSON/MD/CSV/Log).")]
public static ListArtifactsResponse list_artifacts(ListArtifactsRequest req)
{
    var dir = Path.Combine(RunsRoot, req.RunId);
    if (!Directory.Exists(dir)) throw new DirectoryNotFoundException(dir);

    var files = Directory.EnumerateFiles(dir, "*", SearchOption.AllDirectories).ToArray();
    return new ListArtifactsResponse(req.RunId, files);
}

[McpServerTool, Description("Return parsed JSON summary for a given runId.")]
public static GetResultsResponse get_results(GetResultsRequest req)
{
    var dir = Path.Combine(RunsRoot, req.RunId);
    if (!Directory.Exists(dir)) throw new DirectoryNotFoundException(dir);

    var jsonPath = Directory.EnumerateFiles(dir, "*.json", SearchOption.AllDirectories)
                            .OrderByDescending(File.GetLastWriteTimeUtc)
                            .FirstOrDefault();

    object? obj = jsonPath is null ? null : JsonSerializer.Deserialize<object>(File.ReadAllText(jsonPath));
    return new GetResultsResponse(req.RunId, obj);
}
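
The request/response records used by these two tools aren't in the DTOs excerpt above; their shapes follow directly from the tool bodies (a sketch):

// DTOs.cs (continued) - shapes inferred from the tool bodies above
public record ListArtifactsRequest(
    [Description("runId returned by run_bench")] string RunId);

public record ListArtifactsResponse(string RunId, string[] Files);

public record GetResultsRequest(
    [Description("runId returned by run_bench")] string RunId);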

Comparing two runs

This is where the MCP demo really “pops” in chat: I can ask, “compare run A to run B,” and get percentages per benchmark.
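
The code below lives in Tools/CompareTools.cs; like BenchTools, the class needs [McpServerToolType] so WithToolsFromAssembly() can discover it.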

public record CompareRequest(
    [Description("Base runId")] string BaseRunId,
    [Description("Head runId")] string HeadRunId);

public record DiffRow(string Benchmark, double? MeanBase, double? MeanHead, double? MeanDeltaPct,
    double? AllocBase, double? AllocHead, double? AllocDeltaPct);

public record CompareResponse(string BaseRunId, string HeadRunId, DiffRow[] Rows);

[McpServerTool, Description("Compare two runs and return deltas for common benchmarks (Mean, AllocatedBytes/Op).")]
public static CompareResponse compare_runs(CompareRequest req)
{
    var baseJson = LoadJson(req.BaseRunId);
    var headJson = LoadJson(req.HeadRunId);

    var baseMap = IndexByTitle(baseJson);
    var headMap = IndexByTitle(headJson);

    var keys = baseMap.Keys.Intersect(headMap.Keys).OrderBy(k => k);
    var rows = new List<DiffRow>();

    foreach (var k in keys)
    {
        var b = baseMap[k];
        var h = headMap[k];

        double? meanB = TryGet(b, "Statistics", "Mean");
        double? meanH = TryGet(h, "Statistics", "Mean");
        double? meanPct = (meanB.HasValue && meanH.HasValue && meanB != 0)
            ? (meanH / meanB - 1.0) * 100.0 : null;

        double? allocB = TryGet(b, "Memory", "AllocatedBytes/Op");
        double? allocH = TryGet(h, "Memory", "AllocatedBytes/Op");
        double? allocPct = (allocB.HasValue && allocH.HasValue && allocB != 0)
            ? (allocH / allocB - 1.0) * 100.0 : null;

        rows.Add(new DiffRow(k, meanB, meanH, meanPct, allocB, allocH, allocPct));
    }

    return new CompareResponse(req.BaseRunId, req.HeadRunId, rows.ToArray());
}
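
compare_runs leans on three small helpers that aren't shown above: LoadJson, IndexByTitle, and TryGet. Here's a sketch of how they could look; the property names ("Benchmarks", "FullName") are assumptions about the JSON exporter layout, so verify them against a real report in your runs/ folder:

// Helper sketch for CompareTools.cs - property names are assumptions about
// the BenchmarkDotNet JSON exporter output; check them against your artifacts
private static JsonElement LoadJson(string runId)
{
    var dir = Path.Combine(RunsRoot, runId);
    var jsonPath = Directory.EnumerateFiles(dir, "*.json", SearchOption.AllDirectories)
                            .OrderByDescending(File.GetLastWriteTimeUtc)
                            .First();
    return JsonSerializer.Deserialize<JsonElement>(File.ReadAllText(jsonPath));
}

private static Dictionary<string, JsonElement> IndexByTitle(JsonElement root)
{
    var map = new Dictionary<string, JsonElement>();
    if (root.TryGetProperty("Benchmarks", out var benchmarks) &&
        benchmarks.ValueKind == JsonValueKind.Array)
        foreach (var b in benchmarks.EnumerateArray())
            if (b.TryGetProperty("FullName", out var name) && name.GetString() is { } key)
                map[key] = b;
    return map;
}

// TryGet(b, "Statistics", "Mean") digs into bench.Statistics.Mean and returns
// null when the path or value is missing
private static double? TryGet(JsonElement bench, string section, string metric) =>
    bench.TryGetProperty(section, out var s) &&
    s.ValueKind == JsonValueKind.Object &&
    s.TryGetProperty(metric, out var v) &&
    v.ValueKind == JsonValueKind.Number
        ? v.GetDouble()
        : (double?)null;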

The BenchmarkDotNet app (CLI)

I kept the benchmark app small and CLI‑first so the MCP server remains a thin orchestrator.

// SampleBenchmarks/Program.cs
using System.CommandLine;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Exporters.Csv;
using BenchmarkDotNet.Exporters.Json;
using BenchmarkDotNet.Loggers;
using BenchmarkDotNet.Running;

var artifacts = new Option<string>("--artifacts", () => "BenchmarkDotNet.Artifacts", "Artifacts output dir");
var exporters = new Option<string>("--exporters", () => "json,md", "json,fulljson,md,csv");
var filter    = new Option<string?>("--filter", "Glob filter: e.g. *Hash*");
var job       = new Option<string>("--job", () => "Medium", "Short|Medium|Long");

var root = new RootCommand("Sample Benchmarks");
root.AddOption(artifacts); root.AddOption(exporters); root.AddOption(filter); root.AddOption(job);

root.SetHandler(async (string art, string exp, string? f, string jobName) =>
{
    var cfg = ManualConfig.CreateEmpty()
        .AddLogger(ConsoleLogger.Default)
        .WithArtifactsPath(art);

    cfg = Jobs.ApplyPreset(cfg, jobName);

    foreach (var e in exp.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
    {
        switch (e.ToLowerInvariant())
        {
            case "json":     cfg.AddExporter(JsonExporter.Default); break;
            case "fulljson": cfg.AddExporter(JsonExporter.Full); break;
            case "md":       cfg.AddExporter(MarkdownExporter.GitHub); break;
            case "csv":      cfg.AddExporter(CsvExporter.Default); break;
        }
    }

    var switcher = new BenchmarkSwitcher(new[] { typeof(HashBench) });
    await Task.Run(() => switcher.Run(
        args: f is null ? Array.Empty<string>() : new[] { $"--filter={f}" }, config: cfg));
}, artifacts, exporters, filter, job);

return await root.InvokeAsync(args);

A tiny Jobs helper keeps presets readable:

// Jobs.cs
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Jobs;

public static class Jobs
{
    public static ManualConfig ApplyPreset(ManualConfig cfg, string jobName)
    {
        cfg.AddDiagnoser(MemoryDiagnoser.Default);

        var job = jobName switch
        {
            "Short"  => Job.ShortRun,
            "Long"   => Job.Default.WithIterationCount(20).WithWarmupCount(5),
            _        => Job.MediumRun
        };

        return cfg.AddJob(job);
    }
}

And a very visible, low‑noise benchmark to prove it works:

// HashBench.cs
using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class HashBench
{
    private readonly byte[] _data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog.");

    [Benchmark] public byte[] Md5()    => MD5.HashData(_data);
    [Benchmark] public byte[] Sha256() => SHA256.HashData(_data);
}

Build, run, and sanity‑check

# 1) Build everything
dotnet restore
dotnet build

# 2) Sanity test the benchmark app directly (no MCP)
dotnet run --project src/SampleBenchmarks/SampleBenchmarks.csproj -c Release -- \
  --artifacts runs/manual1 --exporters json,md --job Short --filter "*Hash*"

# 3) Start the MCP server
dotnet run --project src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj

If step 2 produces JSON/MD under runs/manual1, you’re golden.

Using MCP Inspector (standalone UI)

I use Inspector to iterate on tool shapes quickly.

  • Command: dotnet
  • Args: run --project src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj
  • Example run_bench payload: { "filter": "*Hash*", "job": "Short", "exporters": "json,md" }

Then call list_artifacts and get_results with the returned runId.

Windows npm gotcha: if npx @modelcontextprotocol/inspector fails with an ENOENT for %APPDATA%\npm, create that folder, run npm config set prefix "%APPDATA%\npm", add it to PATH, and retry.

Using it inside Visual Studio (Copilot Chat Tools)

Create a .mcp.json either in the solution root (preferred, since relative paths work from there) or in %USERPROFILE% with absolute paths.

Project‑local .mcp.json

{
  "servers": {
    "benchrunner": {
      "name": "Benchmark Runner",
      "type": "stdio",
      "command": "dotnet",
      "args": [
        "run",
        "--project",
        "src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj"
      ]
    }
  }
}

Then in GitHub Copilot Chat → switch to Tools/Agent → enable Benchmark Runner → invoke:

  • run_bench → copy runId
  • get_results / list_artifacts
  • compare_runs with two runIds to see mean/alloc deltas

Where to read logs: View → Output → GitHub Copilot.

Troubleshooting (things that bit me)

  • dotnet restore fails: ensure the nuget.org source is enabled, clear caches (dotnet nuget locals all --clear), allow prerelease packages if the MCP SDK package is still in preview, or temporarily switch the TFM to net8.0.
  • Relative path chaos: launch contexts differ; walk up to repo root (see FindRepoRoot()) and compute absolute paths from there.
  • Inspector ENOENT on Windows: create %APPDATA%\npm, set npm prefix, add to PATH.
  • Timeouts: the Long job preset can easily exceed your tool timeout; make TimeoutSec configurable and clamp it to something sensible.
  • Artifacts missing: BDN writes to BenchmarkDotNet.Artifacts by default – ensure you pass --artifacts and that the folder exists.

Design choices & lessons learned

  • stdio transport is enough. Fancy transports can wait; the Inspector/Copilot combo already speaks stdio nicely.
  • Artifacts as API makes debugging delightful. If a run fails, I can open the Markdown report or logs without touching the server code.
  • Small surface area: tools do one thing each. compare_runs doesn’t pretend to be a dashboard; it just returns rows that any client can render.
  • Deterministic paths beat cleverness. Avoid “current directory” assumptions.
  • DX helpers matter. Little tools like diag_env and diag_build remove guesswork when something fails on CI or a teammate’s machine.

FAQ: Building and Running the MCP Benchmark Server

Do I need .NET 9?

No. Targeting net8.0 works fine; I only used APIs available in both.

Why not have the MCP server run benchmarks in‑process?

Isolation and stability. A separate process keeps the server snappy and avoids choking on BDN’s heavy setup.

How do I add my own benchmarks?

Add your types to the BenchmarkSwitcher array and optionally extend the CLI with more switches (e.g., --disasm for DisassemblyDiagnoser).
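
For example, registering a second benchmark class is a one-line change in SampleBenchmarks/Program.cs (StringConcatBench is a hypothetical class of your own):

// add the new type next to HashBench
var switcher = new BenchmarkSwitcher(new[] { typeof(HashBench), typeof(StringConcatBench) });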

Can I compare runs across different filters?

Yes, but compare_runs only matches benchmark names common to both runs.

What about CI?

Wrap the server with a simple script that calls run_bench, uploads artifacts, and posts compare_runs deltas as a PR comment.

Is JSON parsing brittle?

I parse only a tiny subset (names, mean, alloc). If you need more, prefer the Full JSON exporter and evolve your DTOs.
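
If you do, a few typed records keep the parsing honest. The property names below are assumptions about the exporter output, so check them against a real report before relying on them:

// Hypothetical typed DTOs for the full JSON report - verify property names
// against an actual exporter file before use
public record BdnReport(BdnBenchmark[] Benchmarks);
public record BdnBenchmark(string FullName, BdnStatistics? Statistics, BdnMemory? Memory);
public record BdnStatistics(double Mean);
public record BdnMemory(double BytesAllocatedPerOperation);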

How big can runs get?

For local workflows, JSON/MD/CSV are fine. For massive suites, consider zipping artifacts per runId and streaming them.

Conclusion: Chat‑first performance checks for your .NET code

You don’t need a monolithic dashboard to track performance. A tiny MCP server, a tiny BenchmarkDotNet app, and a few JSON files give you a chat‑native perf workflow you’ll actually use. Start by copying the host, wiring up run_bench, and proving it with HashBench. Once that’s humming, add your real benchmarks and wire compare_runs into your daily chat rituals.

What tool would you add next – upload_project, disasm, or profile_hotpath? Tell me in the comments and I’ll build the most requested one.
