Ever wished you could say “run the perf suite” in chat and get a clean JSON diff of your .NET benchmarks a few seconds later? That’s exactly what I built: a tiny MCP server that orchestrates BenchmarkDotNet, produces artifacts (JSON/MD/CSV), and even compares runs. In this post I’ll show you how I wired it up end‑to‑end, what went wrong (and how I fixed it), and give you copy‑pasteable snippets to ship your own.
What we’re building (and why)
Goal: expose a Model Context Protocol (MCP) server over stdio with tools that:
- Run a benchmark executable (BenchmarkDotNet) with filters/jobs/exporters
- Return a runId and a brief JSON summary
- List the generated artifacts (JSON/MD/CSV/logs)
- Compare two runs and return perf deltas
Why bother? Because it turns performance checks into a chat‑first workflow. I can run, fetch results, and compare… without leaving Visual Studio, MCP Inspector, or Copilot Chat. It’s the developer equivalent of a one‑button coffee machine.
Tech stack: .NET 9 (works with .NET 8 too), C#, BenchmarkDotNet, System.CommandLine, the C# MCP server SDK, and plain old ProcessStartInfo.
High‑level architecture
Here’s the flow my server follows:
Copilot / MCP Inspector
│
▼
MCP Server (stdio)
├─ Tool: run_bench → spawns `dotnet run` on SampleBenchmarks
├─ Tool: get_results → parses JSON summary from artifacts
├─ Tool: list_artifacts → enumerates files for a runId
└─ Tool: compare_runs → computes deltas between two JSON summaries
│
▼
BenchmarkDotNet app (separate project)
└─ Writes artifacts (JSON/MD/CSV) per runId folder
Separation of concerns: the MCP server orchestrates; the benchmark app measures. Results are exchanged via files (JSON/MD/CSV) so the boundary is simple and debuggable.
Solution layout
Mcp.BenchmarkRunner.sln
│
├─ src/
│ ├─ Mcp.BenchmarkRunner.Server/ # MCP server (console)
│ │ ├─ Program.cs # host + stdio + tool discovery
│ │ └─ Tools/
│ │ ├─ BenchTools.cs # run_bench, get_results, list_artifacts
│ │ └─ CompareTools.cs # compare_runs
│ │
│ └─ SampleBenchmarks/ # BenchmarkDotNet console app
│ ├─ Program.cs # CLI (artifacts/exporters/job/filter)
│ ├─ Jobs.cs # Short/Medium/Long presets
│ └─ HashBench.cs # simple MD5 vs SHA256
│
└─ runs/ # per-runId artifacts (created at runtime)
Tip: I keep the benchmark app as a sibling project so the server can spawn it with dotnet run and not worry about shipping assemblies.
Bootstrapping the MCP server
The host is intentionally minimal. I let the SDK auto‑discover tools from the assembly so adding a new tool is just creating a static method with an attribute.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using ModelContextProtocol.Server;
var builder = Host.CreateApplicationBuilder(args);
// Keep stdout clean for the MCP stdio protocol: route all console logs to stderr,
// otherwise a stray log line would corrupt the JSON-RPC stream.
builder.Logging.AddConsole(o => o.LogToStandardErrorThreshold = LogLevel.Trace);
// Register MCP server with stdio (works great with MCP Inspector/Copilot)
builder.Services
.AddMcpServer()
.WithStdioServerTransport()
.WithToolsFromAssembly(); // scans for [McpServerToolType]/[McpServerTool]
var app = builder.Build();
await app.RunAsync();
That one line, WithToolsFromAssembly(), is the MVP move. It turns the server into a plugboard: drop in tools, rebuild, done.
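For reference, adding a brand-new tool really is just a static method with an attribute. PingTools below is a made-up example (not part of the repo) that the scanner would pick up as-is:
// Tools/PingTools.cs (hypothetical example of the discovery pattern)
using System.ComponentModel;
using ModelContextProtocol.Server;
[McpServerToolType]
public static class PingTools
{
    [McpServerTool, Description("Returns a pong so you can verify the server is wired up.")]
    public static string ping(string? message = null)
        => $"pong{(string.IsNullOrWhiteSpace(message) ? "" : $": {message}")}";
}
Rebuild, restart the server, and the tool shows up in Inspector with no extra registration code.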
The benchmark tool contract
I chose request/response types that map cleanly onto the BenchmarkDotNet CLI. You can adapt these to your domain.
// DTOs.cs
using System.ComponentModel; // [Description] lives here
public record RunBenchRequest(
[Description("BenchmarkDotNet glob filter, e.g. *Hash*")] string? Filter = null,
[Description("Job preset: Short|Medium|Long")] string Job = "Medium",
[Description("Exporters: json,fulljson,md,csv")] string Exporters = "json,md",
[Description("Timeout in seconds")] int TimeoutSec = 600
);
public record RunBenchResponse(
string RunId,
string ArtifactsDir,
object? Summary
);
public record GetResultsResponse(
string RunId,
object? JsonSummary
);
Paths and run folders (where most bugs hide)
When the server is launched from different places (Inspector vs Visual Studio), AppContext.BaseDirectory can vary. I made path resolution deterministic by walking up to the repo root and building all paths from there.
// Paths.cs
public static class Paths
{
public static readonly string RepoRoot = FindRepoRoot();
public static readonly string BenchProj = Path.Combine(RepoRoot, "src", "SampleBenchmarks", "SampleBenchmarks.csproj");
public static readonly string RunsRoot = Path.Combine(RepoRoot, "runs");
private static string FindRepoRoot()
{
var dir = new DirectoryInfo(AppContext.BaseDirectory);
while (dir is not null)
{
if (File.Exists(Path.Combine(dir.FullName, "Mcp.BenchmarkRunner.sln")) ||
Directory.Exists(Path.Combine(dir.FullName, "src")))
return dir.FullName;
dir = dir.Parent!;
}
return AppContext.BaseDirectory; // fallback
}
}
Debug trick: expose a diag_env tool that returns these paths plus Environment.CurrentDirectory. It eliminates 80% of “works on my machine” issues.
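A minimal version of that tool might look like this (just a sketch; shape the payload however you like):
// Tools/DiagTools.cs (sketch of the diag_env debugging helper)
using System.ComponentModel;
using ModelContextProtocol.Server;
[McpServerToolType]
public static class DiagTools
{
    [McpServerTool, Description("Return resolved paths and the current directory to debug launch contexts.")]
    public static object diag_env() => new
    {
        Paths.RepoRoot,
        Paths.BenchProj,
        Paths.RunsRoot,
        BaseDirectory = AppContext.BaseDirectory,
        CurrentDirectory = Environment.CurrentDirectory
    };
}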
Implementing run_bench
The tool generates a runId, spawns the benchmark process, waits (with a timeout), parses the JSON exporter output, and returns a compact summary.
// Tools/BenchTools.cs (excerpt)
using System.ComponentModel;        // [Description]
using System.Diagnostics;
using System.Text.Json;
using ModelContextProtocol.Server;  // [McpServerToolType]/[McpServerTool]
using static Paths;                 // RepoRoot, BenchProj, RunsRoot
[McpServerToolType]
public static class BenchTools
{
[McpServerTool, Description("Run BenchmarkDotNet with optional filter and job preset; returns runId and brief summary.")]
public static async Task<RunBenchResponse> run_bench(RunBenchRequest req)
{
Directory.CreateDirectory(RunsRoot);
var runId = DateTimeOffset.UtcNow.ToString("yyyyMMdd_HHmmss_fff");
var artifactsDir = Path.Combine(RunsRoot, runId);
Directory.CreateDirectory(artifactsDir);
var psi = new ProcessStartInfo
{
FileName = "dotnet",
WorkingDirectory = RepoRoot,
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false,
// dotnet run --project SampleBenchmarks -- --artifacts <dir> --exporters <...> --job <...> [--filter <...>]
ArgumentList = { "run", "--project", SampleProj, "-c", "Release", "--",
"--artifacts", artifactsDir,
"--exporters", req.Exporters,
"--job", req.Job }
};
if (!string.IsNullOrWhiteSpace(req.Filter))
{
psi.ArgumentList.Add("--filter");
psi.ArgumentList.Add(req.Filter!);
}
using var proc = Process.Start(psi)!;
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(req.TimeoutSec));
var stdoutTask = proc.StandardOutput.ReadToEndAsync(cts.Token);
var stderrTask = proc.StandardError.ReadToEndAsync(cts.Token);
if (!proc.WaitForExit((int)TimeSpan.FromSeconds(req.TimeoutSec).TotalMilliseconds))
{
try { proc.Kill(entireProcessTree: true); } catch { }
throw new TimeoutException($"Benchmark run timed out after {req.TimeoutSec}s");
}
var stdout = await stdoutTask;
var stderr = await stderrTask;
if (proc.ExitCode != 0)
throw new ApplicationException($"Benchmark process failed:\n{stderr}\n{stdout}");
var json = Directory.EnumerateFiles(artifactsDir, "*.json", SearchOption.AllDirectories)
.OrderByDescending(File.GetLastWriteTimeUtc)
.Select(File.ReadAllText)
.FirstOrDefault();
object? summary = null;
try { summary = json is null ? null : JsonSerializer.Deserialize<object>(json); } catch { }
return new RunBenchResponse(runId, artifactsDir, summary);
}
}
Two important choices here:
- Artifacts as the contract: the JSON/MD/CSV files are the API between my benchmark app and the MCP server. It’s robust and easy to inspect.
- Short summary shape: MCP tools should return compact payloads; clients can always fetch the full artifact if needed.
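If you want that summary even leaner than a blob of deserialized JSON, a tiny projection helper does the trick. This is just a sketch (not in the repo) and assumes the exporter JSON's usual shape, a Benchmarks array whose entries carry FullName and Statistics.Mean:
// Hypothetical helper for BenchTools: trims the exporter JSON down to name + mean.
// Assumes a root object like { "Benchmarks": [ { "FullName": ..., "Statistics": { "Mean": ... } } ] }.
private static object? TrimSummary(string json)
{
    using var doc = JsonDocument.Parse(json);
    if (!doc.RootElement.TryGetProperty("Benchmarks", out var benchmarks))
        return null;
    return benchmarks.EnumerateArray()
        .Select(b => new
        {
            Name = b.TryGetProperty("FullName", out var n) ? n.GetString() : null,
            Mean = b.TryGetProperty("Statistics", out var s) && s.TryGetProperty("Mean", out var m)
                ? m.GetDouble() : (double?)null
        })
        .ToArray();
}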
Listing artifacts and getting results
These are straightforward wrappers around Directory.EnumerateFiles and a thin JSON parse that returns the most recent exporter file. Keeping them separate lets clients tailor the UI (e.g., open the Markdown report vs parse the JSON).
[McpServerTool, Description("List artifact files for a given runId (JSON/MD/CSV/Log).")]
public static ListArtifactsResponse list_artifacts(ListArtifactsRequest req)
{
var dir = Path.Combine(RunsRoot, req.RunId);
if (!Directory.Exists(dir)) throw new DirectoryNotFoundException(dir);
var files = Directory.EnumerateFiles(dir, "*", SearchOption.AllDirectories).ToArray();
return new ListArtifactsResponse(req.RunId, files);
}
[McpServerTool, Description("Return parsed JSON summary for a given runId.")]
public static GetResultsResponse get_results(GetResultsRequest req)
{
var dir = Path.Combine(RunsRoot, req.RunId);
if (!Directory.Exists(dir)) throw new DirectoryNotFoundException(dir);
var jsonPath = Directory.EnumerateFiles(dir, "*.json", SearchOption.AllDirectories)
.OrderByDescending(File.GetLastWriteTimeUtc)
.FirstOrDefault();
object? obj = jsonPath is null ? null : JsonSerializer.Deserialize<object>(File.ReadAllText(jsonPath));
return new GetResultsResponse(req.RunId, obj);
}
Comparing two runs
This is where the MCP demo really “pops” in chat: I can ask, “compare run A to run B,” and get percentages per benchmark.
public record CompareRequest(
[Description("Base runId")] string BaseRunId,
[Description("Head runId")] string HeadRunId);
public record DiffRow(string Benchmark, double? MeanBase, double? MeanHead, double? MeanDeltaPct,
double? AllocBase, double? AllocHead, double? AllocDeltaPct);
public record CompareResponse(string BaseRunId, string HeadRunId, DiffRow[] Rows);
[McpServerTool, Description("Compare two runs and return deltas for common benchmarks (Mean, AllocatedBytes/Op).")]
public static CompareResponse compare_runs(CompareRequest req)
{
var baseJson = LoadJson(req.BaseRunId);
var headJson = LoadJson(req.HeadRunId);
var baseMap = IndexByTitle(baseJson);
var headMap = IndexByTitle(headJson);
var keys = baseMap.Keys.Intersect(headMap.Keys).OrderBy(k => k);
var rows = new List<DiffRow>();
foreach (var k in keys)
{
var b = baseMap[k];
var h = headMap[k];
double? meanB = TryGet(b, "Statistics", "Mean");
double? meanH = TryGet(h, "Statistics", "Mean");
double? meanPct = (meanB.HasValue && meanH.HasValue && meanB != 0)
? (meanH / meanB - 1.0) * 100.0 : null;
double? allocB = TryGet(b, "Memory", "AllocatedBytes/Op");
double? allocH = TryGet(h, "Memory", "AllocatedBytes/Op");
double? allocPct = (allocB.HasValue && allocH.HasValue && allocB != 0)
? (allocH / allocB - 1.0) * 100.0 : null;
rows.Add(new DiffRow(k, meanB, meanH, meanPct, allocB, allocH, allocPct));
}
return new CompareResponse(req.BaseRunId, req.HeadRunId, rows.ToArray());
}
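The three helpers it leans on aren’t shown above. Here is one possible shape, a sketch that lives in the same class as compare_runs; the property names ("Benchmarks", "FullName") assume BenchmarkDotNet’s JSON exporter layout, so adjust them if your output differs:
// CompareTools.cs (sketch of the helpers used by compare_runs; assumes using System.Text.Json)
private static JsonElement LoadJson(string runId)
{
    var dir = Path.Combine(Paths.RunsRoot, runId);
    var path = Directory.EnumerateFiles(dir, "*.json", SearchOption.AllDirectories)
                        .OrderByDescending(File.GetLastWriteTimeUtc)
                        .FirstOrDefault()
               ?? throw new FileNotFoundException($"No JSON artifact found for run {runId}");
    using var doc = JsonDocument.Parse(File.ReadAllText(path));
    return doc.RootElement.Clone(); // Clone so the element survives disposing the document
}
private static Dictionary<string, JsonElement> IndexByTitle(JsonElement root)
{
    var map = new Dictionary<string, JsonElement>();
    if (root.TryGetProperty("Benchmarks", out var benchmarks))
        foreach (var b in benchmarks.EnumerateArray())
            if (b.TryGetProperty("FullName", out var name) && name.GetString() is { } key)
                map[key] = b;
    return map;
}
private static double? TryGet(JsonElement element, params string[] path)
{
    var current = element;
    foreach (var segment in path)
        if (!current.TryGetProperty(segment, out current))
            return null;
    return current.ValueKind == JsonValueKind.Number ? current.GetDouble() : null;
}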
The BenchmarkDotNet app (CLI)
I kept the benchmark app small and CLI‑first so the MCP server remains a thin orchestrator.
// SampleBenchmarks/Program.cs
using System.CommandLine;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Exporters.Csv;
using BenchmarkDotNet.Exporters.Json;
using BenchmarkDotNet.Loggers;
using BenchmarkDotNet.Running;
var artifacts = new Option<string>("--artifacts", () => "BenchmarkDotNet.Artifacts", "Artifacts output dir");
var exporters = new Option<string>("--exporters", () => "json,md", "json,fulljson,md,csv");
var filter = new Option<string?>("--filter", "Glob filter: e.g. *Hash*");
var job = new Option<string>("--job", () => "Medium", "Short|Medium|Long");
var root = new RootCommand("Sample Benchmarks");
root.AddOption(artifacts); root.AddOption(exporters); root.AddOption(filter); root.AddOption(job);
root.SetHandler(async (string art, string exp, string? f, string jobName) =>
{
var cfg = ManualConfig.CreateEmpty()
.AddLogger(ConsoleLogger.Default)
.WithArtifactsPath(art);
cfg = Jobs.ApplyPreset(cfg, jobName);
foreach (var e in exp.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
{
switch (e.ToLowerInvariant())
{
case "json": cfg.AddExporter(JsonExporter.Default); break;
case "fulljson": cfg.AddExporter(JsonExporter.Full); break;
case "md": cfg.AddExporter(MarkdownExporter.GitHub); break;
case "csv": cfg.AddExporter(CsvExporter.Default); break;
}
}
var switcher = new BenchmarkSwitcher(new[] { typeof(HashBench) });
await Task.Run(() => switcher.Run(
args: f is null ? Array.Empty<string>() : new[] { $"--filter={f}" }, config: cfg));
}, artifacts, exporters, filter, job);
return await root.InvokeAsync(args);
A tiny Jobs helper keeps presets readable:
// Jobs.cs
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Jobs;
public static class Jobs
{
public static ManualConfig ApplyPreset(ManualConfig cfg, string jobName)
{
cfg.AddDiagnoser(MemoryDiagnoser.Default);
var job = jobName switch
{
"Short" => Job.ShortRun,
"Long" => Job.Default.WithIterationCount(20).WithWarmupCount(5),
_ => Job.MediumRun
};
return cfg.AddJob(job);
}
}
And a very visible, low‑noise benchmark to prove it works:
// HashBench.cs
using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
public class HashBench
{
private readonly byte[] _data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog.");
[Benchmark] public byte[] Md5() => MD5.HashData(_data);
[Benchmark] public byte[] Sha256() => SHA256.HashData(_data);
}
Build, run, and sanity‑check
# 1) Build everything
dotnet restore
dotnet build
# 2) Sanity test the benchmark app directly (no MCP)
dotnet run --project src/SampleBenchmarks/SampleBenchmarks.csproj -c Release -- \
  --artifacts runs/manual1 --exporters json,md --job Short --filter "*Hash*"
# 3) Start the MCP server
dotnet run --project src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj
If step 2 produces JSON/MD under runs/manual1, you’re golden.
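For orientation, BenchmarkDotNet nests its reports in a results/ subfolder under the artifacts path, so the run folder ends up looking roughly like this (exact report file names depend on your benchmark class and exporters):
runs/manual1/
├─ results/
│  ├─ <BenchmarkClass>-report*.json
│  └─ <BenchmarkClass>-report-github.md
└─ <BenchmarkDotNet log file>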
Using MCP Inspector (standalone UI)
I use Inspector to iterate on tool shapes quickly.
- Command: dotnet
- Args: run --project src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj
- Example run_bench payload: { "filter": "*Hash*", "job": "Short", "exporters": "json,md" }
Then call list_artifacts and get_results with the returned runId.
Windows npm gotcha: if npx @modelcontextprotocol/inspector fails with an ENOENT for %APPDATA%\npm, create the folder, run npm config set prefix "%APPDATA%\npm", add it to PATH, and retry.
Using it inside Visual Studio (Copilot Chat Tools)
Create a .mcp.json either in the solution root (preferable, so relative paths work) or at %USERPROFILE%\.mcp.json with absolute paths.
Project‑local .mcp.json
{
"servers": {
"benchrunner": {
"name": "Benchmark Runner",
"type": "stdio",
"command": "dotnet",
"args": [
"run",
"--project",
"src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj"
]
}
}
}
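The %USERPROFILE%\.mcp.json variant is the same shape, just with an absolute project path (the path below is a placeholder; point it at your clone):
{
  "servers": {
    "benchrunner": {
      "name": "Benchmark Runner",
      "type": "stdio",
      "command": "dotnet",
      "args": [
        "run",
        "--project",
        "C:/src/Mcp.BenchmarkRunner/src/Mcp.BenchmarkRunner.Server/Mcp.BenchmarkRunner.Server.csproj"
      ]
    }
  }
}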
Then in GitHub Copilot Chat → switch to Tools/Agent → enable Benchmark Runner → invoke:
- run_bench → copy the returned runId
- get_results / list_artifacts with that runId
- compare_runs with two runIds to see mean/alloc deltas
Where to read logs: View → Output → GitHub Copilot.
Troubleshooting (things that bit me)
- dotnet restore fails: ensure the nuget.org source is enabled, clear caches (dotnet nuget locals all --clear), allow preview packages if your MCP SDK is prerelease, or temporarily switch the TFM to net8.0.
- Relative path chaos: launch contexts differ; walk up to the repo root (see FindRepoRoot()) and compute absolute paths from there.
- Inspector ENOENT on Windows: create %APPDATA%\npm, set the npm prefix, add it to PATH.
- Timeouts: Long jobs can exceed your tool timeout; make timeoutSec configurable and clamp it within reason.
- Artifacts missing: BDN writes to BenchmarkDotNet.Artifacts by default; ensure you pass --artifacts and that the folder exists.
Design choices & lessons learned
- stdio transport is enough. Fancy transports can wait; the Inspector/Copilot combo already speaks stdio nicely.
- Artifacts as API makes debugging delightful. If a run fails, I can open the Markdown report or logs without touching the server code.
- Small surface area: tools do one thing each. compare_runs doesn’t pretend to be a dashboard; it just returns rows that any client can render.
- Deterministic paths beat cleverness. Avoid “current directory” assumptions.
- DX helpers matter. Little tools like diag_env and diag_build remove guesswork when something fails on CI or a teammate’s machine.
FAQ: Building and Running the MCP Benchmark Server
Do I need .NET 9?
No. Targeting net8.0 works fine; I only used APIs available in both.
Why run the benchmarks in a separate process instead of inside the server?
Isolation and stability. A separate process keeps the server snappy and avoids choking on BDN’s heavy setup.
How do I add my own benchmarks?
Add your types to the BenchmarkSwitcher array and optionally extend the CLI with more switches (e.g., --disasm for DisassemblyDiagnoser).
Can I compare runs that used different filters?
Yes, but compare_runs only matches benchmark names common to both runs.
How would this fit into CI?
Wrap the server with a simple script that calls run_bench, uploads artifacts, and posts compare_runs deltas as a PR comment.
How much of the BenchmarkDotNet JSON do you parse?
I parse only a tiny subset (names, mean, alloc). If you need more, prefer the Full JSON exporter and evolve your DTOs.
Are JSON/MD/CSV artifacts enough for big suites?
For local workflows, JSON/MD/CSV are fine. For massive suites, consider zipping artifacts per runId and streaming them.
Conclusion: Chat‑first performance checks for your .NET code
You don’t need a monolithic dashboard to track performance. A tiny MCP server, a tiny BenchmarkDotNet app, and a few JSON files give you a chat‑native perf workflow you’ll actually use. Start by copying the host, wire up run_bench, and prove it with HashBench. Once that’s humming, add your real benchmarks and wire compare_runs into your daily chat rituals.
What tool would you add next: upload_project, disasm, or profile_hotpath? Tell me in the comments and I’ll build the most requested one.
Useful links & repository
- Repository (public): Mcp.BenchmarkRunner
- BenchmarkDotNet docs: Overview • Exporters • GitHub
- Model Context Protocol (MCP): Official Documentation • Specification • Inspector (npm) • Sample servers
- ModelContextProtocol (NuGet): NuGet • GitHub