Are you still juggling three different SDKs to talk to one LLM? Stop. With Semantic Kernel you can swap OpenAI, Azure AI, or a local model with (almost) a single line.
In real projects I often need to prototype with OpenAI, deploy to Azure AI for compliance, and keep a local fallback for offline demos or cost control. Years ago that meant different clients, different request/response shapes, and a stack of glue code. Today, Semantic Kernel (SK) gives you one abstraction over multiple providers, so your app code stays the same while the backend changes.
This post shows you, step by step, how to:
- Wire up OpenAI, Azure AI (Azure OpenAI Service) and local OpenAI‑compatible runtimes (Ollama/LM Studio, etc.).
- Use one prompt and one code path for all three.
- Stream tokens, call functions/tools, and generate embeddings with a provider switch.
- Hide secrets, DI‑register services, and troubleshoot the usual gotchas.
We’ll create a small .NET console app that selects a chat & embeddings provider from configuration/environment and just works.
What we’ll build (architecture in 30 seconds)
+----------------------------------------------+
|  .NET app (services, prompts, plugins)       |
|    └── IChatCompletionService (SK)           |
|    └── ITextEmbeddingGenerationService (SK)  |
+----------------------+-----------------------+
                       |
         Provider switch (config/env)
                       |
    +------------------+-------------------+
    |                  |                   |
  OpenAI           Azure AI              Local (OpenAI API
(api.openai.com)  (Azure OpenAI deployment)  compatible endpoint)
The only thing that changes is registration at startup.
Prerequisites
- .NET 8 SDK or newer
- A package feed that can restore NuGet packages
- (Optional) OpenAI API key, Azure OpenAI resource with a deployed model, and/or a local runtime
NuGet packages
# Core + OpenAI connectors
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Connectors.OpenAI
If you plan to use Azure Identity (Managed Identity/AAD) add:
dotnet add package Azure.Identity
Configuration first: one appsettings for all providers
appsettings.json
{
  "AI": {
    "Provider": "OpenAI", // OpenAI | AzureOpenAI | Local
    "OpenAI": {
      "Model": "gpt-4o-mini",
      "ApiKey": "${OPENAI_API_KEY}" // prefer env var substitution
    },
    "AzureOpenAI": {
      "Endpoint": "${AZURE_OPENAI_ENDPOINT}",
      "Deployment": "gpt-4o-mini",
      "ApiKey": "${AZURE_OPENAI_API_KEY}",
      "UseManagedIdentity": false
    },
    "Local": {
      "Model": "llama3.1:8b-instruct",
      "Endpoint": "http://localhost:1234/v1", // LM Studio default; Ollama often 11434
      "ApiKey": "not-needed"
    }
  }
}
I keep secrets in environment variables and map them in configuration (the ${VAR} placeholders are for illustration – use your own approach or a Secret Manager). The important part is AI:Provider.
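If you prefer strongly typed access over raw string lookups, you can bind the section to an options class; because AddEnvironmentVariables() is registered after the JSON file, variables like AI__OpenAI__ApiKey override the file values. A minimal sketch – the AiOptions types below are my own names, not part of SK:
// Illustrative options types mirroring the "AI" section above (names are my own).
public sealed class AiOptions
{
    public string Provider { get; set; } = "OpenAI"; // OpenAI | AzureOpenAI | Local
    public OpenAiOptions OpenAI { get; set; } = new();
    public AzureOpenAiOptions AzureOpenAI { get; set; } = new();
    public LocalOptions Local { get; set; } = new();
}

public sealed class OpenAiOptions
{
    public string Model { get; set; } = "";
    public string ApiKey { get; set; } = "";
}

public sealed class AzureOpenAiOptions
{
    public string Endpoint { get; set; } = "";
    public string Deployment { get; set; } = "";
    public string ApiKey { get; set; } = "";
    public bool UseManagedIdentity { get; set; }
}

public sealed class LocalOptions
{
    public string Model { get; set; } = "";
    public string Endpoint { get; set; } = "";
    public string ApiKey { get; set; } = "";
}

// In ConfigureServices (next section): bind once, then inject IOptions<AiOptions> wherever needed.
services.Configure<AiOptions>(ctx.Configuration.GetSection("AI"));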
Bootstrapping the Kernel with DI
Create a console app (or add to your existing host) and register SK services based on configuration.
using Azure.Identity; // optional
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Connectors.OpenAI;
var host = Host.CreateDefaultBuilder(args)
    .ConfigureAppConfiguration(cfg =>
    {
        cfg.AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
           .AddEnvironmentVariables();
    })
    .ConfigureServices((ctx, services) =>
    {
        var ai = ctx.Configuration.GetSection("AI");
        var provider = ai["Provider"]; // OpenAI | AzureOpenAI | Local

        var builder = Kernel.CreateBuilder();

        switch (provider)
        {
            case "OpenAI":
            {
                var model = ai.GetSection("OpenAI")["Model"]!;
                var key = ai.GetSection("OpenAI")["ApiKey"]!;

                builder.AddOpenAIChatCompletion(modelId: model, apiKey: key);
                // Embedding connectors are marked experimental in current SK releases;
                // you may need to suppress the SKEXP diagnostics in your project file.
                builder.AddOpenAITextEmbeddingGeneration(modelId: "text-embedding-3-small", apiKey: key);
                break;
            }
            case "AzureOpenAI":
            {
                var section = ai.GetSection("AzureOpenAI");
                var endpoint = section["Endpoint"]!; // e.g. https://<name>.openai.azure.com/
                var deployment = section["Deployment"]!;
                var useMI = bool.TryParse(section["UseManagedIdentity"], out var b) && b;

                if (useMI)
                {
                    var credential = new DefaultAzureCredential();
                    builder.AddAzureOpenAIChatCompletion(
                        deploymentName: deployment,
                        endpoint: endpoint,
                        credentials: credential);
                    builder.AddAzureOpenAITextEmbeddingGeneration(
                        deploymentName: "text-embedding-3-small",
                        endpoint: endpoint,
                        credentials: credential);
                }
                else
                {
                    var apiKey = section["ApiKey"]!;
                    builder.AddAzureOpenAIChatCompletion(
                        deploymentName: deployment,
                        endpoint: endpoint,
                        apiKey: apiKey);
                    builder.AddAzureOpenAITextEmbeddingGeneration(
                        deploymentName: "text-embedding-3-small",
                        endpoint: endpoint,
                        apiKey: apiKey);
                }
                break;
            }
            case "Local":
            default:
            {
                // Any OpenAI-compatible runtime (LM Studio, vLLM, llama.cpp servers, Ollama w/ OpenAI adapter)
                var section = ai.GetSection("Local");
                var model = section["Model"]!;
                var endpoint = new Uri(section["Endpoint"]!);
                var apiKey = section["ApiKey"]!; // many runtimes ignore it; some require a placeholder

                builder.AddOpenAIChatCompletion(modelId: model, endpoint: endpoint, apiKey: apiKey);
                // Use an embedding model id your local server actually hosts.
                builder.AddOpenAITextEmbeddingGeneration(modelId: "text-embedding-3-small", endpoint: endpoint, apiKey: apiKey);
                break;
            }
        }

        services.AddSingleton(sp => builder.Build());
        services.AddSingleton(sp => sp.GetRequiredService<Kernel>().GetRequiredService<IChatCompletionService>());
        services.AddSingleton(sp => sp.GetRequiredService<Kernel>().GetRequiredService<ITextEmbeddingGenerationService>());
        services.AddHostedService<Demo>();
    })
    .Build();

await host.RunAsync();
// ---
class Demo : IHostedService
{
    private readonly Kernel _kernel;
    private readonly IChatCompletionService _chat;

    public Demo(Kernel kernel, IChatCompletionService chat)
    {
        _kernel = kernel; _chat = chat;
    }

    public async Task StartAsync(CancellationToken ct)
    {
        var chat = new ChatHistory();
        chat.AddSystemMessage("You are a concise assistant.");
        chat.AddUserMessage("Explain the Repository pattern in one paragraph.");

        var result = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel, cancellationToken: ct);
        Console.WriteLine(result[^1].Content);
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
Why this shape? I lean on the generic Host so I can DI anything (HTTP clients, telemetry, vector stores) without refactoring later.
Tip: In production, favor Managed Identity on Azure instead of API keys. Keep both paths ready – your CI/CD can flip UseManagedIdentity per environment.
One prompt, three providers (no code changes)
If you change AI:Provider from OpenAI to AzureOpenAI to Local, the same Demo code runs against a different backend. That’s the magic: your business logic doesn’t care where the tokens come from.
Try it:
# PowerShell – $env: affects the current session only (setx would only apply to new shells)
# OpenAI (default provider in appsettings.json)
$env:OPENAI_API_KEY = "sk-..."
dotnet run

# Azure AI
$env:AZURE_OPENAI_ENDPOINT = "https://<your>.openai.azure.com/"
$env:AZURE_OPENAI_API_KEY = "..."
$env:AI__Provider = "AzureOpenAI"
dotnet run

# Local (LM Studio example)
# Start the LM Studio server (UI: enable 'OpenAI Compatible Server' on port 1234)
# Load a model: llama3.1 8B instruct (or any you like)
$env:AI__Provider = "Local"
dotnet run
Ollama? Start with ollama serve and make sure it exposes an OpenAI-compatible /v1 endpoint (some setups require a small adapter). Then point Local:Endpoint to it (often http://localhost:11434/v1).
Streaming tokens (because UX matters)
Plain responses are fine, but streaming feels immediate. SK exposes a streaming method; the exact signature may vary by minor versions, but a canonical pattern looks like this:
var history = new ChatHistory();
history.AddSystemMessage("You are terse.");
history.AddUserMessage("List the 5 SOLID principles; return bullets only.");

await foreach (var update in _chat.GetStreamingChatMessageContentsAsync(history, kernel: _kernel))
{
    Console.Write(update.Content);
}
Console.WriteLine();
This works regardless of provider – if the backend supports streaming, you’ll see tokens arrive as they’re generated.
Function calling / tools with SK plugins
Let’s wire a tiny weather tool to show SK’s function calling. We’ll let the model call GetWeatherAsync(city) when needed.
using System.ComponentModel; // for [Description]
using Microsoft.SemanticKernel;

public class WeatherPlugin
{
    [KernelFunction, Description("Gets the current temperature for a city (mocked).")]
    public Task<string> GetWeatherAsync(
        [Description("City name, e.g., Sofia")] string city)
    {
        // In real life call your API here
        var rnd = new Random(city.GetHashCode());
        var temp = 18 + rnd.Next(-5, 10);
        return Task.FromResult($"{city}: {temp}°C and sunny");
    }
}

// Registration (at startup, before builder.Build())
builder.Plugins.AddFromObject(new WeatherPlugin(), pluginName: "weather");
Then, in your demo code, keep a short system prompt, enable automatic function calling in the execution settings, and ask a question that requires the tool. SK will expose the function to the model using the appropriate tool-calling schema for the provider.
var chat = new ChatHistory();
chat.AddSystemMessage("You can call the 'weather.GetWeatherAsync' function when the user asks about weather.");
chat.AddUserMessage("Should I take a jacket in Sofia today?");
// Auto function calling: SK advertises kernel functions to the model and invokes them for you.
var settings = new OpenAIPromptExecutionSettings { ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions };
var response = await _chat.GetChatMessageContentsAsync(chat, settings, _kernel);
Console.WriteLine(response[^1].Content);
Why it’s powerful: you keep the same plugin and same call site regardless of OpenAI/Azure/local. The tool‑call plumbing is handled by SK.
Embeddings with a provider switch
Embeddings back your RAG and semantic search. Register them alongside chat and keep the same interface.
public class EmbedDemo : IHostedService
{
    private readonly ITextEmbeddingGenerationService _emb;
    public EmbedDemo(ITextEmbeddingGenerationService emb) => _emb = emb;

    public async Task StartAsync(CancellationToken ct)
    {
        var vectors = await _emb.GenerateEmbeddingsAsync(new[]
        {
            "Unit testing is a design activity",
            "Repository pattern hides persistence concerns",
            "CQRS separates commands from queries"
        }, cancellationToken: ct);

        Console.WriteLine($"Generated {vectors.Count} vectors, dim={vectors[0].Length}");
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
Heads‑up: model names differ between providers. For OpenAI you might use text-embedding-3-small; for Azure you deploy an embeddings model and reference the deployment name. For local servers, pick any hosted embeddings model and ensure it speaks the OpenAI embeddings API.
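To show what those vectors are for, here is a tiny cosine-similarity helper for ranking texts by semantic closeness; the CosineSimilarity function is my own illustration, not an SK API:
// Minimal cosine similarity between two embedding vectors (illustrative helper, not an SK API).
static double CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var x = a.Span; var y = b.Span;
    double dot = 0, normX = 0, normY = 0;
    for (var i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }
    return dot / (Math.Sqrt(normX) * Math.Sqrt(normY));
}

// Usage: embed a query, then rank stored vectors by similarity – highest score = closest meaning.
// var query = (await _emb.GenerateEmbeddingsAsync(new[] { "How do I hide data access?" }))[0];
// var best = vectors.OrderByDescending(v => CosineSimilarity(query, v)).First();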
Prompt files & semantic functions (optional but handy)
I like to keep prompts versioned as files. SK lets you create “semantic functions” from prompt text and reuse them with any provider.
prompts/summarize.skprompt
{{$input}}
---
Summarize the above in 3 bullet points for a senior .NET developer.
Style: concise, technical.
Usage:
var summarize = _kernel.CreateFunctionFromPrompt(
    File.ReadAllText("prompts/summarize.skprompt"),
    new OpenAIPromptExecutionSettings { Temperature = 0.2 });

var output = await _kernel.InvokeAsync<string>(summarize, new() { ["input"] = longText });
The same prompt runs on OpenAI, Azure, or local because SK converts your intent into the provider’s format.
Local models: notes & pitfalls
- Endpoints vary. LM Studio usually listens on http://localhost:1234/v1. Ollama’s native API is different, but community adapters and recent builds can expose an OpenAI-compatible /v1 – double-check your route (a quick connectivity check is sketched right after this list).
- Model names are arbitrary. For local servers, the model id is often whatever you loaded (e.g., llama3:8b). Use exactly that string in modelId.
- Streaming quirks. Some local backends buffer or chunk differently. If your stream looks odd, test with curl first to isolate whether the issue is SK or the server.
- CORS/Firewall. If you’re calling a local server from a Blazor WASM or SPA, you’ll need CORS. For server-side .NET apps, ensure the port is reachable.
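Before blaming SK, confirm the local server actually answers on the OpenAI-style route. A minimal sketch, assuming the runtime exposes the standard /v1/models listing (most OpenAI-compatible servers do; the port and key below are placeholders):
// Quick sanity check against an OpenAI-compatible local server: list its models.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:1234/v1/") };
// Some runtimes ignore the key but still expect the Authorization header to be present.
http.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", "not-needed");

var modelsResponse = await http.GetAsync("models"); // -> http://localhost:1234/v1/models
Console.WriteLine($"{(int)modelsResponse.StatusCode} {modelsResponse.StatusCode}");
Console.WriteLine(await modelsResponse.Content.ReadAsStringAsync());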
Azure AI specifics (a quick checklist)
- Deployments, not models. In Azure OpenAI you reference a deployment name (your alias), not the raw model id. Keep the two straight.
- Throttling (429). Respect rate limits; add retries with jitter (a minimal retry sketch follows this list). Azure quotas differ per region/deployment.
- Identity. Prefer Managed Identity in App Service/AKS/Functions. In local dev, fall back to an API key.
- Network rules. If your resource is behind a private endpoint, run your app inside the same network or use a proxy.
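For the 429 case, here is a minimal retry-with-jitter sketch you can wrap around any chat call while you evaluate a full policy library such as Polly – the RetryAsync helper, its parameters, and the status-code filter are my own illustration:
// Illustrative retry helper: exponential backoff with jitter for transient 429/5xx failures.
// SK surfaces provider HTTP failures as HttpOperationException in current versions.
static async Task<T> RetryAsync<T>(Func<Task<T>> action, int maxAttempts = 4)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (HttpOperationException ex) when (attempt < maxAttempts &&
            ((int?)ex.StatusCode == 429 || (int?)ex.StatusCode >= 500))
        {
            // Exponential backoff (1s, 2s, 4s, ...) plus up to 500 ms of jitter.
            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt - 1))
                      + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
            await Task.Delay(delay);
        }
    }
}

// Usage:
// var result = await RetryAsync(() => _chat.GetChatMessageContentsAsync(chat, kernel: _kernel));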
Error handling & observability
Wrap calls with structured logging and enrich exceptions with provider context. Example:
try
{
    var result = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel);
    Console.WriteLine(result[^1].Content);
}
catch (HttpOperationException ex) // SK wraps provider HTTP errors
{
    // Surface provider, endpoint, model, request id if available
    Console.Error.WriteLine($"{ex.StatusCode} from {_chat.GetType().Name}: {ex.Message}");
    throw;
}
Consider logging tokens in/out, latency, and cache hits (if you add a response cache). In production I emit custom metrics per provider to spot regressions when switching backends.
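As a starting point, a timing wrapper around the call is enough to spot slow providers; the snippet below is illustrative and just writes to the console – swap in your own logger or metrics sink:
// Illustrative timing wrapper around a chat call; replace Console with your logger/metrics sink.
var sw = System.Diagnostics.Stopwatch.StartNew();
var reply = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel);
sw.Stop();

// Token usage typically arrives in the message Metadata dictionary; key names vary by connector.
Console.WriteLine($"provider={_chat.GetType().Name} latencyMs={sw.ElapsedMilliseconds} metadataKeys={reply[^1].Metadata?.Count ?? 0}");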
From sample to real app: a minimal service layer
Abstract SK behind your own interface so the rest of your app never learns about providers:
public interface IAssistant
{
    Task<string> AskAsync(string userMessage, CancellationToken ct = default);
}

public class SkAssistant(Kernel kernel, IChatCompletionService chat) : IAssistant
{
    public async Task<string> AskAsync(string userMessage, CancellationToken ct = default)
    {
        var history = new ChatHistory();
        history.AddSystemMessage("You are a helpful .NET architect.");
        history.AddUserMessage(userMessage);

        var messages = await chat.GetChatMessageContentsAsync(history, kernel: kernel, cancellationToken: ct);
        return messages[^1].Content ?? string.Empty;
    }
}
Now your web API/controller/UI only depends on IAssistant. At runtime, config decides whether that means OpenAI, Azure, or a local box under your desk.
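Wiring it up is one line in the same ConfigureServices block shown earlier (plain DI, nothing SK-specific):
// Plain DI: the Kernel and IChatCompletionService registered earlier are injected automatically.
services.AddSingleton<IAssistant, SkAssistant>();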
Troubleshooting cheatsheet
- 401/403: Wrong key/identity or hitting the wrong endpoint. For Azure, confirm the resource endpoint https://<name>.openai.azure.com/ and the correct deployment name.
- 404: Local server route mismatch (/v1/chat/completions vs /chat).
- 429: Add retries with exponential backoff. Reduce concurrency or the requested max tokens.
- Model not found: The model id (or Azure deployment) doesn’t exist or is in another region/resource.
- Garbled streaming: Test with curl to confirm the server produces SSE or chunked responses as expected.
FAQ: Practical questions developers ask
Can I mix providers – say, chat on Azure and embeddings from OpenAI?
Absolutely. Register each service from a different provider. SK resolves by interface, so you can build a hybrid stack.
What if a provider doesn’t support a feature?
SK tries to degrade gracefully, but you should feature-detect. Keep optional paths and clear error messages.
How do I cache responses to control cost?
Wrap your calls with an app-level cache keyed on (prompt, settings, model). For RAG, cache embeddings per document hash. A minimal sketch follows.
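A sketch of that idea as a decorator over IAssistant, using IMemoryCache – the class name and key format are my own illustration:
// Illustrative response cache keyed on the prompt (extend the key with model/settings/provider).
// Requires services.AddMemoryCache() and registering CachedAssistant as the IAssistant implementation.
using Microsoft.Extensions.Caching.Memory;

public class CachedAssistant(IAssistant inner, IMemoryCache cache) : IAssistant
{
    public async Task<string> AskAsync(string userMessage, CancellationToken ct = default)
    {
        var key = $"chat:{userMessage.GetHashCode():X8}";
        if (cache.TryGetValue(key, out string? cached) && cached is not null)
            return cached;

        var answer = await inner.AskAsync(userMessage, ct);
        cache.Set(key, answer, TimeSpan.FromMinutes(30));
        return answer;
    }
}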
How do I manage model versions?
Pin them via config. For Azure, prefer deployment aliases like gpt-4o-mini and rotate the underlying model without code changes.
Are my prompts and responses logged?
Not by default. Treat them as PII/secret-adjacent. Scrub payloads or hash sensitive fields; follow your org’s compliance rules.
How do I test code that calls an LLM?
Abstract IAssistant and inject a fake (a minimal fake is sketched below). For integration tests, spin up a local server (LM Studio) in CI and point AI:Provider at it.
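The fake can be as simple as a canned reply (my own illustration):
// Canned-reply fake for unit tests: deterministic, no network, no keys.
public class FakeAssistant : IAssistant
{
    public Task<string> AskAsync(string userMessage, CancellationToken ct = default)
        => Task.FromResult($"FAKE: {userMessage}");
}

// In the test host: services.AddSingleton<IAssistant, FakeAssistant>(); then assert on your own logic.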
Conclusion: One code path, any model
You don’t need three client libraries for three LLMs. With Semantic Kernel you write your prompts and plugins once, then decide at deploy time whether to use OpenAI, Azure AI, or a local runtime. The code above is production‑shaped – DI, config, and clean seams – so you can drop it into real services, stream tokens, call tools, and embed content without scattershot changes.
Your turn: which provider are you switching to first – OpenAI for speed, Azure for compliance, or local for cost control? Tell me how you plan to flip the switch in your project.