Still rewriting the same AI plumbing for every project? In 2025, the .NET AI ecosystem matured fast – pick the right OSS building blocks and you can ship real features this week, not next quarter.
I’ve spent the last year wiring chat, RAG, speech, and on‑device inference into production .NET apps. Below is a curated, hands‑on list of the top open‑source projects I keep reaching for. Each entry explains why it matters, when to use it, and gives you a drop‑in code snippet you can paste into a new repo right now.
TL;DR: Stop yak‑shaving SDK differences. Standardize on Microsoft.Extensions.AI for the façade, plug in providers (OpenAI/Azure/Ollama/ONNX), and layer orchestration with Semantic Kernel. For local inference, use LLamaSharp or ONNX Runtime GenAI. For speech, Whisper.net. For memory, VectorData + a vector DB client.
How I picked (and what I optimized for)
- Active in 2024–2025 with real commits and releases.
- Developer ergonomics: clean APIs, DI‑friendly, async, streaming.
- Production viability: permissive license, docs, examples, issue velocity.
- Performance options: CPU/GPU/NPU paths, batching, streaming, or quantized models.
- Ecosystem fit: plays well with Microsoft.Extensions.AI, Semantic Kernel, and vector DBs.
Microsoft.Extensions.AI – the unifying façade for LLMs
When you’re juggling OpenAI for prod, Ollama for local dev, and Azure OpenAI for enterprise, the abstractions in Microsoft.Extensions.AI let you swap providers behind a stable IChatClient/IEmbeddingGenerator surface. Middleware (telemetry, caching, function/tool invocation) is DI‑friendly.
Use when: you want one code path that can talk to many models/providers.
# core abstractions + helpers
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.Extensions.AI.Ollama
Minimal chat (switch OpenAI ⇄ Ollama by one line):
using Microsoft.Extensions.AI;
using OpenAI; // official OpenAI client
// Option A: OpenAI (cloud)
IChatClient client =
    new OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient();
// Option B: Ollama (local)
// IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434/"), "llama3.1");
var response = await client.GetResponseAsync(
    "Summarize the main risk in our Q3 roadmap as a bullet list.");
Console.WriteLine(response.Text);
Tip: Treat IChatClient as an interface boundary between your app and the model world. You can inject different providers per environment and keep business code pristine.
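To make the middleware story concrete, here is a minimal sketch of a decorated pipeline. The placeholders for the inner client and service provider are illustrative; the cache and logger are resolved from whatever DI container you pass to Build:
using Microsoft.Extensions.AI;
IChatClient innerClient = /* any provider-specific IChatClient */;
IServiceProvider services = /* your DI container (supplies IDistributedCache, ILoggerFactory) */;
IChatClient pipeline = new ChatClientBuilder(innerClient)
    .UseDistributedCache()   // cache identical requests
    .UseFunctionInvocation() // auto-execute tool calls the model requests
    .UseLogging()            // log requests and responses
    .Build(services);
Every layer is just another IChatClient wrapping the one below it, so you can reorder or drop middleware without touching business code.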
Semantic Kernel – orchestration, agents, plugins, and planning
I treat Semantic Kernel (SK) as my orchestration layer: tools/plugins, memory, multi‑agent patterns, and great samples. In 2025 it feels production‑ready and embraces both cloud and local models (OpenAI, Azure OpenAI, Ollama, ONNX, etc.).
Use when: you need tool‑calling, multi‑agent flows, or want a sane place to compose prompts + functions.
NuGet:
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Agents.Core
Quickstart agent:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.ChatCompletion;
var builder = Kernel.CreateBuilder();
// Swap in AddOpenAIChatCompletion or AddOllamaChatCompletion as needed
builder.AddAzureOpenAIChatCompletion(
Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT"),
Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT"),
Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"));
var kernel = builder.Build();
var agent = new ChatCompletionAgent
{
Name = "SupportBot",
Instructions = "You are a concise assistant. Prefer bullet points.",
Kernel = kernel
};
await foreach (var item in agent.InvokeAsync("Draft a 3‑step incident response playbook."))
Console.WriteLine(item.Message);
Where it shines:
- “Bring your own vector store” via VectorData abstractions.
- Seamless tool definition (C# methods → callable functions) – see the sketch after this list.
- Multi‑agent threads for complex workflows.
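For instance, tool definition is just attributes on a plain C# method – a minimal sketch continuing the quickstart above (the plugin class and name are illustrative):
using System.ComponentModel;
using Microsoft.SemanticKernel;
// Register the plugin on the kernel; pair it with a function-calling
// execution setting (e.g., FunctionChoiceBehavior.Auto()) so the agent can invoke it
kernel.Plugins.AddFromType<TimePlugin>("time");
public sealed class TimePlugin
{
    [KernelFunction, Description("Returns the current UTC time in ISO 8601 format.")]
    public string GetUtcNow() => DateTime.UtcNow.ToString("O");
}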
Official OpenAI .NET – first‑party, batteries included
If you talk to the OpenAI API, use the official OpenAI package. It’s open‑source, actively maintained, and maps the full surface (Chat, Responses, Embeddings, Audio, Images, Assistants, Vector Stores, Realtime…).
NuGet:
dotnet add package OpenAI
Straight‑to‑the‑point chat:
using OpenAI.Chat;
var client = new ChatClient(model: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
// Declare as ChatCompletion to unwrap the ClientResult<ChatCompletion> implicitly
ChatCompletion completion = await client.CompleteChatAsync("Return a 15‑word project status update.");
Console.WriteLine(completion.Content[0].Text);
Why I like it: DI‑friendly, streaming APIs, tool/structured‑output support, and fewer surprises when endpoints change.
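Streaming is equally terse – a minimal sketch with the same client:
using OpenAI.Chat;
var client = new ChatClient(model: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
// Tokens arrive as they are generated
await foreach (StreamingChatCompletionUpdate update
    in client.CompleteChatStreamingAsync("Stream a two‑line haiku about CI builds."))
{
    foreach (ChatMessageContentPart part in update.ContentUpdate)
        Console.Write(part.Text);
}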
LLamaSharp – llama.cpp power from pure C#
LLamaSharp lets you run GGUF models locally (Llama, Mistral, Phi, etc.) with CPU/GPU acceleration, and it includes higher‑level chat/RAG helpers. Perfect for offline or cost‑sensitive scenarios.
NuGet:
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu # or a GPU backend (e.g., LLamaSharp.Backend.Cuda12) depending on your machine
Load a local model and chat:
using LLama;
using LLama.Common;
using LLama.Sampling;
var @params = new ModelParams("./models/llama-3.1-8b-instruct.Q4_K_M.gguf")
{
    ContextSize = 4096,
    GpuLayerCount = 0 // set >0 if you have GPU support enabled
};
using var model = LLamaWeights.LoadFromFile(@params);
using var ctx = model.CreateContext(@params);
var executor = new InteractiveExecutor(ctx);
var prompt = "You are a helpful assistant. Reply in 2 bullet points: reasons to add caching to RAG";
var inference = new InferenceParams
{
    MaxTokens = 256,
    // Recent LLamaSharp versions configure sampling via a pipeline
    SamplingPipeline = new DefaultSamplingPipeline { Temperature = 0.7f }
};
await foreach (var token in executor.InferAsync(prompt, inference))
    Console.Write(token);
When to reach for it: local development parity, privacy requirements, or when you need deterministic cost control.
ONNX Runtime + GenAI (C#) – on‑device LLM inference
ONNX Runtime has rock‑solid C# bindings for general ML inference and a GenAI API that wraps the full generate loop (tokenization, sampling, KV cache). This is my go‑to for running small/medium LLMs (Phi, TinyLlama, etc.) on Windows/Linux/macOS with GPU/DirectML where available.
NuGet:
dotnet add package Microsoft.ML.OnnxRuntime
dotnet add package Microsoft.ML.OnnxRuntimeGenAI
Minimal text generation with an ONNX LLM:
using Microsoft.ML.OnnxRuntimeGenAI;
using var model = new Model("./models/phi3-mini"); // folder containing the exported ONNX model + genai_config.json
using var tokenizer = new Tokenizer(model);
var gen = new GeneratorParams(model);
gen.SetSearchOption("max_length", 128);
var prompt = tokenizer.Encode("Explain RAG to a junior developer in 3 bullets.");
gen.SetInputSequences(prompt);
var seq = model.Generate(gen);
Console.WriteLine(tokenizer.Decode(seq[0]));
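If you’d rather stream tokens than wait for the whole sequence, the same API exposes the generate loop directly. A sketch continuing the snippet above – method names follow the 0.4‑era GenAI C# examples, and the surface has shifted between releases, so check your version:
using Microsoft.ML.OnnxRuntimeGenAI;
using var tokenizerStream = tokenizer.CreateStream();
using var generator = new Generator(model, gen);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
    var seq = generator.GetSequence(0);
    Console.Write(tokenizerStream.Decode(seq[seq.Length - 1])); // decode only the newest token
}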
Why it’s great: zero Python at runtime, fast startup, and a clean path to desktop apps or native AOT services.
Whisper.net – speech‑to‑text that just works in .NET
Whisper (via whisper.cpp) wrapped in a friendly C# API. I use this for meeting notes, voice chat, and multilingual support in MAUI/desktop apps.
NuGet:
dotnet add package Whisper.net.AllRuntimes
Transcribe a 16 kHz WAV file:
using Whisper.net;
// Ensure you have a GGML model, e.g., ggml-base.bin in the working dir;
// input audio must be 16 kHz, 16‑bit PCM WAV (convert MP3s first)
using var factory = WhisperFactory.FromPath("ggml-base.bin");
await using var processor = factory.CreateBuilder().WithLanguage("auto").Build();
await foreach (var result in processor.ProcessAsync(File.OpenRead("meeting.wav")))
    Console.WriteLine($"{result.Start}->{result.End}: {result.Text}");
Nice extras: multiple runtimes (CPU, CUDA, CoreML, OpenVINO), streaming, and samples for MAUI/Blazor.
TorchSharp – PyTorch’s muscles for .NET
If you prefer tensor‑level control in C# (research, custom layers, or tiny models), TorchSharp mirrors libtorch APIs. I’ve used it for small classifiers and post‑processing rerankers that live next to my ASP.NET API.
NuGet:
dotnet add package TorchSharp
Tiny MLP (hello world):
using TorchSharp;
using static TorchSharp.torch;
var model = nn.Sequential(
    ("fc1", nn.Linear(16, 32)),
    ("relu", nn.ReLU()),
    ("fc2", nn.Linear(32, 2))
);
var optimizer = optim.Adam(model.parameters(), lr: 0.01);
var x = randn(64, 16);                     // 64 samples, 16 features
var y = randint(0, 2, new long[] { 64 });  // binary class labels
for (int epoch = 0; epoch < 200; epoch++)
{
    optimizer.zero_grad();
    var logits = model.forward(x);
    var loss = nn.functional.cross_entropy(logits, y);
    loss.backward();
    optimizer.step();
}
Where it helps: when you need custom math inside a .NET service without a Python sidecar.
ML.NET (+ Tokenizers) – classic ML meets modern AI
ML.NET remains a great fit for traditional ML (classification/regression/NER with Model Builder) and for utilities like Microsoft.ML.Tokenizers, which saves you from ad‑hoc ports of GPT/BERT tokenizers.
NuGet:
dotnet add package Microsoft.ML
dotnet add package Microsoft.ML.Tokenizers
Tokenize with a GPT‑style tokenizer:
using Microsoft.ML.Tokenizers;
var tok = TiktokenTokenizer.CreateForModel("gpt-4o");
var ids = tok.EncodeToIds("Hello from .NET AI!");
Console.WriteLine(string.Join(",", ids));
When to use: quick tabular ML, light NLP, and tokenization or pre/post‑processing around your LLM flows.
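And for the “quick tabular ML” side, a minimal text‑classification sketch – the Review/ReviewPrediction types and inline data are illustrative; real projects load from files or a database:
using Microsoft.ML;
using Microsoft.ML.Data;
var ml = new MLContext(seed: 0);
// Toy inline training data
var data = ml.Data.LoadFromEnumerable(new[]
{
    new Review { Text = "great library, solid docs", Label = true },
    new Review { Text = "crashes constantly", Label = false },
});
// Featurize the text column, then train a linear binary classifier
var pipeline = ml.Transforms.Text.FeaturizeText("Features", nameof(Review.Text))
    .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression());
var model = pipeline.Fit(data);
var engine = ml.Model.CreatePredictionEngine<Review, ReviewPrediction>(model);
Console.WriteLine(engine.Predict(new Review { Text = "works well" }).Probability);
public class Review
{
    public string Text { get; set; } = "";
    public bool Label { get; set; }
}
public class ReviewPrediction
{
    [ColumnName("PredictedLabel")] public bool Label { get; set; }
    public float Probability { get; set; }
}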
Model Context Protocol (C# SDK) – first‑class tool integration
2025 is the year tool‑using AI stopped being vendor‑specific. MCP gives you a standard way to expose tools/resources to models. The official C# SDK makes it trivial to stand up a server or connect a client from your .NET app.
NuGet (one option):
dotnet add package ModelContextProtocol --prerelease
Minimal MCP client that can use tools in chat:
using Microsoft.Extensions.AI;
using ModelContextProtocol.Client;
// Your preferred LLM via Extensions.AI (OpenAI/Azure/Ollama, etc.);
// wrap it with .UseFunctionInvocation() so tool calls are executed automatically
IChatClient client = /* build an IChatClient */;
// Start & connect to an MCP server (stdio transport shown)
var mcp = await McpClientFactory.CreateAsync(
new StdioClientTransport(new()
{
Command = "dotnet run",
Arguments = ["--project", "./tools/MyMcpServer"],
Name = "My MCP Server"
}));
// Fetch tool list and use them within chat
var tools = await mcp.ListToolsAsync();
await foreach (var update in client.GetStreamingResponseAsync(
new [] { new ChatMessage(ChatRole.User, "Use available tools to fetch my calendar") },
new() { Tools = [.. tools] }))
{
Console.Write(update);
}
Why care: standardized tool calling across providers, fewer bespoke integrations, and better portability.
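The server side is nearly as small. A minimal stdio server, adapted from the csharp‑sdk README – the tool class and project layout are illustrative, and you also need the Microsoft.Extensions.Hosting package:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using ModelContextProtocol.Server;
using System.ComponentModel;
var builder = Host.CreateApplicationBuilder(args);
builder.Services
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithToolsFromAssembly(); // discovers [McpServerTool] methods in this assembly
await builder.Build().RunAsync();
[McpServerToolType]
public static class EchoTool
{
    [McpServerTool, Description("Echoes the message back to the client.")]
    public static string Echo(string message) => $"Echo: {message}";
}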
Vector memory layer: Microsoft.Extensions.VectorData + vector DB clients
For RAG, you need both a clean abstraction and a battle‑tested store. I recommend combining Microsoft.Extensions.VectorData (the abstraction SK understands) with a native client for your store of choice:
- Qdrant .NET client (QdrantClient) – great local/dev story and blazing‑fast filters.
- Milvus C# SDK – solid at scale, cloud or on‑prem.
- Weaviate .NET (community) – feature‑rich vector DB with hybrid search.
Skeleton: store & query embeddings (Qdrant example):
using Microsoft.Extensions.AI;
using Qdrant.Client;
using Qdrant.Client.Grpc;
// 1) Get embeddings from your model
IEmbeddingGenerator<string, Embedding<float>> embedder = /* Extensions.AI OpenAI/Ollama embedder */;
var vector = (await embedder.GenerateAsync(["RAG quickstart for .NET"]))[0].Vector.ToArray();
// 2) Upsert into Qdrant (size must match your embedding model; 1536 = text-embedding-3-small)
var qdrant = new QdrantClient("localhost");
await qdrant.CreateCollectionAsync("docs",
    new VectorParams { Size = 1536, Distance = Distance.Cosine });
await qdrant.UpsertAsync("docs", new List<PointStruct>
{
    new()
    {
        Id = 1,
        Vectors = vector,
        Payload = { ["title"] = "RAG quickstart" }
    }
});
// 3) Query
var hits = await qdrant.SearchAsync("docs", vector, limit: 3);
foreach (var hit in hits)
    Console.WriteLine($"{hit.Id}: {hit.Score} -> {hit.Payload["title"]}");
OllamaSharp (or generated SDKs) – controlling local models via HTTP
If you prefer running local models via Ollama, OllamaSharp gives you a tidy .NET wrapper (streaming, embeddings, model pull/create). It’s a perfect dev companion and works great behind Microsoft.Extensions.AI with an Ollama provider.
NuGet:
dotnet add package OllamaSharp
Stream tokens from a local Llama:
using OllamaSharp;
var ollama = new OllamaApiClient(new Uri("http://localhost:11434/"));
ollama.SelectedModel = "llama3.2"; // GenerateAsync uses the selected model
await foreach (var chunk in ollama.GenerateAsync("Give me 3 performance tips for RAG"))
    Console.Write(chunk?.Response);
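Because recent OllamaSharp releases implement the Extensions.AI abstractions directly, the same client can slot straight into an IChatClient pipeline – a minimal sketch:
using Microsoft.Extensions.AI;
using OllamaSharp;
// OllamaApiClient implements IChatClient, so it composes with Extensions.AI middleware
IChatClient chat = new OllamaApiClient(new Uri("http://localhost:11434/"), "llama3.2");
var reply = await chat.GetResponseAsync("One‑line definition of vector search?");
Console.WriteLine(reply.Text);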
Honorable mentions
- LangChain.NET – composable LLM pipeline patterns if you like the LangChain style.
- Azure SDK for .NET (Azure.AI.OpenAI + Azure.AI.Inference) – production‑hardened clients under MIT, great auth; pairs well with Extensions.AI.
- Microsoft.SemanticKernel connectors & samples – tons of glue code you don’t want to write twice.
Architecture: how these pieces fit
┌────────────────────────────────────────────────────────┐
│ Your .NET Application │
│ (Web API, Worker, MAUI, Blazor, Function, Service) │
└───────────────┬─────────────────────────────┬──────────┘
│ │
Microsoft.Extensions.AI Semantic Kernel
IChatClient/IEmbedding (agents, tools,
+ middleware (logging, memory, planning)
caching, tool calling) │
│ │
┌──────────────┼──────────────┐ VectorData Abstractions
│ │ │ │
OpenAI Azure OpenAI Ollama Vector DB clients
(cloud) (cloud) (local) (Qdrant/Milvus/Weaviate)
│ │ │ │
└──────────────┴──────────────┴───────┬──────┘
│
Whisper.net (STT)
│
ONNX Runtime GenAI
/ LLamaSharp (LLM)
Key idea: keep providers swappable behind IChatClient, and centralize orchestration (tools/memory) in SK. For local/offline, slide in LLamaSharp or ONNX GenAI. For RAG, add a vector store client through VectorData or native SDKs.
Real‑world starter combos
- Cloud‑first chatbot, local‑dev friendly: Microsoft.Extensions.AI + OpenAI in prod, Ollama in dev; SK for tools; Qdrant for memory.
- Offline desktop assistant: LLamaSharp or ONNX GenAI + Whisper.net; local Qdrant; no external calls.
- Enterprise “agentic” workflow: SK multi‑agent + MCP tools; Azure OpenAI; Milvus/Weaviate for RAG; telemetry middleware.
FAQ: Picking & using the 2025 .NET AI stack
Q: LLamaSharp or ONNX Runtime GenAI for local inference?
LLamaSharp (via llama.cpp) is super convenient for GGUF models and quick local chats. ONNX GenAI shines when you want ONNX‑exported models, better GPU/NPU acceleration across platforms, or tighter control over the generation loop.
Q: Do I need Semantic Kernel, or is Microsoft.Extensions.AI enough?
For a single chat endpoint, Microsoft.Extensions.AI alone might be enough. Reach for SK when you add tool calling, RAG memory, planning, or multi‑agent flows.
Q: Are these libraries trimming/Native AOT friendly?
Many are trending that way (Extensions.AI, VectorData, some community SDKs, and parts of ONNX/TorchSharp). Test your exact combo – reflection and dynamic JSON often need trimming hints.
Q: Which tokenizer library should I use?
Prefer Microsoft.ML.Tokenizers in 2025. It’s actively supported and covers GPT/BPE/WordPiece scenarios without “mystery” assets.
Q: Which vector database should I pick?
Qdrant has a great .NET story and simple local dev. Milvus scales extremely well. Weaviate is flexible with hybrid search. All have OSS + cloud options – benchmark your data.
Q: Is MCP worth adopting now?
If you have multiple tools/data sources and expect to swap LLM providers, yes. MCP standardizes tool access so your app isn’t married to one vendor’s plugins.
Q: Do I need LangChain.NET?
Not usually. If your team already thinks in LangChain graphs, it’s fine, but SK + Extensions.AI covers most app patterns cleanly.
Conclusion: Your 2025 .NET AI stack, distilled
If you remember one thing, remember this: compose, don’t couple. Put Microsoft.Extensions.AI at the boundary, orchestrate with Semantic Kernel, pick your inference (OpenAI cloud vs. LLamaSharp/ONNX local), and wire memory with VectorData + a solid vector DB client. That blueprint keeps you shippable today and portable tomorrow.
Which combo are you using (or planning) this quarter – and why? Drop your stack in the comments; I’ll suggest optimizations.
Links & further reading (official repos/docs)
- Microsoft.Extensions.AI: https://learn.microsoft.com/dotnet/ai/microsoft-extensions-ai
- Semantic Kernel (C#): https://github.com/microsoft/semantic-kernel
- OpenAI .NET SDK: https://github.com/openai/openai-dotnet
- LLamaSharp: https://github.com/SciSharp/LLamaSharp
- ONNX Runtime GenAI C# API: https://onnxruntime.ai/docs/genai/api/csharp.html
- Whisper.net: https://github.com/sandrohanea/whisper.net
- Model Context Protocol (C# SDK): https://github.com/modelcontextprotocol/csharp-sdk
- Vector DB clients:
- Qdrant https://github.com/qdrant/qdrant-dotnet
- Milvus https://github.com/milvus-io/milvus-sdk-csharp
- Weaviate (community) https://github.com/tryAGI/Weaviate
- Microsoft.ML.Tokenizers: https://www.nuget.org/packages/Microsoft.ML.Tokenizers
- OllamaSharp: https://github.com/awaescher/OllamaSharp