Semantic Kernel: One Client for OpenAI, Azure & Local LLMs

Connecting to OpenAI, Azure AI & Local Models with Semantic Kernel

Are you still juggling three different SDKs to talk to one LLM? Stop. With Semantic Kernel you can swap OpenAI, Azure AI, or a local model with (almost) a single line.

In real projects I often need to prototype with OpenAI, deploy to Azure AI for compliance, and keep a local fallback for offline demos or cost control. Years ago that meant different clients, different request/response shapes, and a stack of glue code. Today, Semantic Kernel (SK) gives you one abstraction over multiple providers, so your app code stays the same while the backend changes.

This post shows you, step by step, how to:

  • Wire up OpenAI, Azure AI (Azure OpenAI Service) and local OpenAI‑compatible runtimes (Ollama/LM Studio, etc.).
  • Use one prompt and one code path for all three.
  • Stream tokens, call functions/tools, and generate embeddings with a provider switch.
  • Hide secrets, DI‑register services, and troubleshoot the usual gotchas.

We’ll create a small .NET console app that selects a chat & embeddings provider from configuration/environment and just works.

What we’ll build (architecture in 30 seconds)

+-------------------------------------------+
| .NET app (services, prompts, plugins)     |
|   └── IChatCompletionService (SK)         |
|   └── ITextEmbeddingGenerationService (SK)|
+-------------------+-----------------------+
                    |
            Provider switch (config/env)
                    |
   +----------------+-------------+-------------------+
   |                              |                   |
OpenAI                         Azure AI           Local (OpenAI API
(api.openai.com)     (Azure OpenAI deployment)    compatible endpoint)

The only thing that changes is registration at startup.

Prerequisites

  • .NET 8 SDK or newer
  • A package feed that can restore NuGet packages
  • (Optional) OpenAI API key, Azure OpenAI resource with a deployed model, and/or a local runtime

NuGet packages

# Core + OpenAI connectors
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Connectors.OpenAI

Depending on your SK version, the Azure OpenAI connector may ship as a separate package – if the AddAzureOpenAI* extensions aren't found, also add Microsoft.SemanticKernel.Connectors.AzureOpenAI.

If you plan to use Azure Identity (Managed Identity/AAD) add:

dotnet add package Azure.Identity

Configuration first: one appsettings for all providers

appsettings.json

{
  "AI": {
    "Provider": "OpenAI", // OpenAI | AzureOpenAI | Local
    "OpenAI": {
      "Model": "gpt-4o-mini",
      "ApiKey": "${OPENAI_API_KEY}" // prefer env var substitution
    },
    "AzureOpenAI": {
      "Endpoint": "${AZURE_OPENAI_ENDPOINT}",
      "Deployment": "gpt-4o-mini",
      "ApiKey": "${AZURE_OPENAI_API_KEY}",
      "UseManagedIdentity": false
    },
    "Local": {
      "Model": "llama3.1:8b-instruct",
      "Endpoint": "http://localhost:1234/v1", // LM Studio default; Ollama often 11434
      "ApiKey": "not-needed"
    }
  }
}

I keep secrets in environment variables and map them in configuration (the ${VAR} placeholders are for illustration – use your own approach or a Secret Manager). The important part is AI.Provider.
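If you prefer strongly typed access over raw GetSection lookups, you can bind the AI section once and inject it wherever you need it. A minimal sketch – the AiOptions/OpenAIOptions/AzureOpenAIOptions/LocalOptions names are mine, not part of SK:

// Hypothetical options classes mirroring the "AI" section above.
public sealed class AiOptions
{
    public string Provider { get; set; } = "OpenAI"; // OpenAI | AzureOpenAI | Local
    public OpenAIOptions OpenAI { get; set; } = new();
    public AzureOpenAIOptions AzureOpenAI { get; set; } = new();
    public LocalOptions Local { get; set; } = new();
}

public sealed class OpenAIOptions      { public string Model { get; set; } = ""; public string ApiKey { get; set; } = ""; }
public sealed class AzureOpenAIOptions { public string Endpoint { get; set; } = ""; public string Deployment { get; set; } = ""; public string ApiKey { get; set; } = ""; public bool UseManagedIdentity { get; set; } }
public sealed class LocalOptions       { public string Model { get; set; } = ""; public string Endpoint { get; set; } = ""; public string ApiKey { get; set; } = ""; }

// In ConfigureServices (see the bootstrap code below):
// var aiOptions = ctx.Configuration.GetSection("AI").Get<AiOptions>()!;

The switch statement in the next section sticks to raw configuration strings to keep the focus on SK, but the binding approach scales better once the config grows.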

Bootstrapping the Kernel with DI

Create a console app (or add to your existing host) and register SK services based on configuration.

using Azure.Identity;     // optional, for Managed Identity / Entra ID auth
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var host = Host.CreateDefaultBuilder(args)
    .ConfigureAppConfiguration(cfg =>
    {
        cfg.AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
           .AddEnvironmentVariables();
    })
    .ConfigureServices((ctx, services) =>
    {
        var ai = ctx.Configuration.GetSection("AI");
        var provider = ai["Provider"]; // OpenAI | AzureOpenAI | Local

        var builder = Kernel.CreateBuilder();

        switch (provider)
        {
            case "OpenAI":
            {
                var model = ai.GetSection("OpenAI")["Model"]!;
                var key = ai.GetSection("OpenAI")["ApiKey"]!;
                builder.AddOpenAIChatCompletion(modelId: model, apiKey: key);
                // Embedding generation is experimental in some SK versions (SKEXP0010) – you may need to suppress that diagnostic.
                builder.AddOpenAITextEmbeddingGeneration(modelId: "text-embedding-3-small", apiKey: key);
                break;
            }
            case "AzureOpenAI":
            {
                var section = ai.GetSection("AzureOpenAI");
                var endpoint = section["Endpoint"]!; // the Azure connector takes the endpoint as a string, e.g. https://<name>.openai.azure.com/
                var deployment = section["Deployment"]!;
                var useMI = bool.TryParse(section["UseManagedIdentity"], out var b) && b;

                if (useMI)
                {
                    var credential = new DefaultAzureCredential();
                    builder.AddAzureOpenAIChatCompletion(
                        deploymentName: deployment,
                        endpoint: endpoint,
                        credentials: credential);
                    builder.AddAzureOpenAITextEmbeddingGeneration(
                        deploymentName: "text-embedding-3-small",
                        endpoint: endpoint,
                        credentials: credential);
                }
                else
                {
                    var apiKey = section["ApiKey"]!;
                    builder.AddAzureOpenAIChatCompletion(
                        deploymentName: deployment,
                        endpoint: endpoint,
                        apiKey: apiKey);
                    builder.AddAzureOpenAITextEmbeddingGeneration(
                        deploymentName: "text-embedding-3-small",
                        endpoint: endpoint,
                        apiKey: apiKey);
                }
                break;
            }
            case "Local":
            default:
            {
                // Any OpenAI-compatible runtime (LM Studio, vLLM, llama.cpp servers, Ollama w/ OpenAI adapter)
                var section = ai.GetSection("Local");
                var model = section["Model"]!;
                var endpoint = new Uri(section["Endpoint"]!);
                var apiKey = section["ApiKey"]!; // many runtimes ignore it; some require a placeholder

                // The custom-endpoint overloads below are experimental in some SK versions (SKEXP0010).
                builder.AddOpenAIChatCompletion(modelId: model, endpoint: endpoint, apiKey: apiKey);
                // Pick an embedding model your local server actually hosts – the id below is only a placeholder.
                builder.AddOpenAITextEmbeddingGeneration(modelId: "text-embedding-3-small", endpoint: endpoint, apiKey: apiKey);
                break;
            }
        }

        services.AddSingleton(sp => builder.Build());
        services.AddSingleton(sp => sp.GetRequiredService<Kernel>().GetRequiredService<IChatCompletionService>());
        services.AddSingleton(sp => sp.GetRequiredService<Kernel>().GetRequiredService<ITextEmbeddingGenerationService>());
        services.AddHostedService<Demo>();
    })
    .Build();

await host.RunAsync();

// ---

class Demo : IHostedService
{
    private readonly Kernel _kernel;
    private readonly IChatCompletionService _chat;

    public Demo(Kernel kernel, IChatCompletionService chat)
    {
        _kernel = kernel; _chat = chat;
    }

    public async Task StartAsync(CancellationToken ct)
    {
        var chat = new ChatHistory();
        chat.AddSystemMessage("You are a concise assistant.");
        chat.AddUserMessage("Explain the Repository pattern in one paragraph.");

        var result = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel, cancellationToken: ct);
        Console.WriteLine(result[^1].Content);
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}

Why this shape? I lean on Host so I can DI anything (HTTP clients, telemetry, vector stores) without refactoring later.

Tip: In production, favor Managed Identity on Azure instead of API keys. Keep both paths ready – your CI/CD can flip UseManagedIdentity per environment.

One prompt, three providers (no code changes)

If you change AI:Provider from OpenAI to AzureOpenAI to Local, the same Demo code runs against a different backend. That’s the magic: your business logic doesn’t care where the tokens come from.

Try it:

# OpenAI
setx OPENAI_API_KEY "sk-..."
dotnet run

# Azure AI
setx AZURE_OPENAI_ENDPOINT "https://<your>.openai.azure.com/"
setx AZURE_OPENAI_API_KEY "..."
powershell -c "[Environment]::SetEnvironmentVariable('AI__Provider','AzureOpenAI','Process')"
dotnet run

# Local (LM Studio example)
# Start LM Studio server (UI: enable 'OpenAI Compatible Server' on port 1234)
# Load a model: llama3.1 8B instruct (or any you like)
powershell -c "[Environment]::SetEnvironmentVariable('AI__Provider','Local','Process')"
dotnet run

Ollama? Start it with ollama serve – recent versions expose an OpenAI‑compatible /v1 endpoint natively (older setups needed a small adapter). Then point Local:Endpoint at it (typically http://localhost:11434/v1).

Streaming tokens (because UX matters)

Plain responses are fine, but streaming feels immediate. SK exposes a streaming method; the exact signature may vary by minor versions, but a canonical pattern looks like this:

var history = new ChatHistory();
history.AddSystemMessage("You are terse.");
history.AddUserMessage("List the 5 SOLID principles; return bullets only.");

await foreach (var update in _chat.GetStreamingChatMessageContentsAsync(history, kernel: _kernel))
{
    Console.Write(update.Content);
}

Console.WriteLine();

This works regardless of provider – if the backend supports streaming, you’ll see tokens arrive as they’re generated.

Function calling / tools with SK plugins

Let’s wire a tiny weather tool to show SK’s function calling. We’ll let the model call GetWeatherAsync(city) when needed.

using System.ComponentModel; // for [Description]
using Microsoft.SemanticKernel;

public class WeatherPlugin
{
    [KernelFunction, Description("Gets the current temperature for a city (mocked).")]
    public Task<string> GetWeatherAsync(
        [Description("City name, e.g., Sofia")] string city)
    {
        // In real life call your API here
        var rnd = new Random(city.GetHashCode());
        var temp = 18 + rnd.Next(-5, 10);
        return Task.FromResult($"{city}: {temp}°C and sunny");
    }
}

// Registration
builder.Plugins.AddFromObject(new WeatherPlugin(), pluginName: "weather");

Then, in your demo code, enable automatic function invocation via the execution settings and ask a question that requires the tool. SK exposes the function to the model using the appropriate tool-calling schema for the provider.

var chat = new ChatHistory();
chat.AddSystemMessage("You can call the 'weather' plugin when the user asks about weather.");
chat.AddUserMessage("Should I take a jacket in Sofia today?");

var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var response = await _chat.GetChatMessageContentsAsync(chat, settings, _kernel);
Console.WriteLine(response[^1].Content);

Why it’s powerful: you keep the same plugin and same call site regardless of OpenAI/Azure/local. The tool‑call plumbing is handled by SK.

Embeddings with a provider switch

Embeddings back your RAG and semantic search. Register them alongside chat and keep the same interface.

public class EmbedDemo : IHostedService
{
    private readonly ITextEmbeddingGenerationService _emb;
    public EmbedDemo(ITextEmbeddingGenerationService emb) => _emb = emb;

    public async Task StartAsync(CancellationToken ct)
    {
        var vectors = await _emb.GenerateEmbeddingsAsync(new[]
        {
            "Unit testing is a design activity",
            "Repository pattern hides persistence concerns",
            "CQRS separates commands from queries"
        }, cancellationToken: ct);

        Console.WriteLine($"Generated {vectors.Count} vectors, dim={vectors[0].Length}");
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}

Heads‑up: model names differ between providers. For OpenAI you might use text-embedding-3-small; for Azure you deploy an embeddings model and reference the deployment name. For local servers, pick any hosted embeddings model and ensure it speaks the OpenAI embeddings API.
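Once you have vectors, semantic search is just a similarity comparison. A minimal cosine-similarity helper – plain C#, not an SK API – that you could drop next to EmbedDemo:

// Cosine similarity over two embedding vectors; higher means more semantically similar.
static float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var x = a.Span;
    var y = b.Span;
    float dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < x.Length; i++)
    {
        dot   += x[i] * y[i];
        normA += x[i] * x[i];
        normB += y[i] * y[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Usage sketch: embed a query, then rank the document vectors from EmbedDemo by similarity.
// var query = (await _emb.GenerateEmbeddingsAsync(new[] { "how do I hide persistence?" }, cancellationToken: ct))[0];
// var best  = vectors.Select((v, i) => (Index: i, Score: CosineSimilarity(query, v)))
//                    .OrderByDescending(t => t.Score)
//                    .First();

For anything beyond a demo, push the vectors into a proper vector store instead of comparing in memory.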

Prompt files & semantic functions (optional but handy)

I like to keep prompts versioned as files. SK lets you create “semantic functions” from prompt text and reuse them with any provider.

prompts/summarize.skprompt

{{$input}}
---
Summarize the above in 3 bullet points for a senior .NET developer.
Style: concise, technical.

Usage:

var summarize = _kernel.CreateFunctionFromPrompt(
    File.ReadAllText("prompts/summarize.skprompt"),
    new OpenAIPromptExecutionSettings { Temperature = 0.2 });

var output = await _kernel.InvokeAsync<string>(summarize, new() { ["input"] = longText });

The same prompt runs on OpenAI, Azure, or local because SK converts your intent into the provider’s format.

Local models: notes & pitfalls

  • Endpoints vary. LM Studio usually listens on http://localhost:1234/v1. Ollama’s native API is different, but community adapters and recent builds can expose an OpenAI‑compatible /v1 – double‑check your route.
  • Model names are arbitrary. For local servers, the model id is often whatever you loaded (e.g., llama3:8b). Use exactly that string in modelId.
  • Streaming quirks. Some local backends buffer or chunk differently. If your stream looks odd, test the endpoint directly (curl or the C# smoke test sketched after this list) to isolate whether the issue is SK or the server.
  • CORS/Firewall. If you’re calling a local server from a Blazor WASM or SPA, you’ll need CORS. For server‑side .NET apps, ensure the port is reachable.

Azure AI specifics (a quick checklist)

  • Deployments, not models. In Azure OpenAI you reference a deployment name (your alias), not the raw model id. Keep the two straight.
  • Throttling (429). Respect rate limits; add retries with jitter – a minimal helper is sketched after this list. Azure quotas differ per region/deployment.
  • Identity. Prefer Managed Identity in App Service/AKS/Functions. In local dev, fall back to an API key.
  • Network rules. If your resource is behind a private endpoint, run your app inside the same network or use a proxy.
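A minimal retry-with-jitter helper if you want to stay dependency-free (a library like Polly does this more thoroughly); note that, depending on your SK version, the failure may surface as a different exception type than the one caught here:

// Retries a call with exponential backoff plus random jitter – a bare-bones sketch, not production-hardened.
static async Task<T> WithRetriesAsync<T>(Func<Task<T>> action, int maxAttempts = 4, CancellationToken ct = default)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (HttpRequestException) when (attempt < maxAttempts)
        {
            // Exponential backoff (250ms base) plus up to 250ms of jitter.
            var delay = TimeSpan.FromMilliseconds(Math.Pow(2, attempt) * 250 + Random.Shared.Next(0, 250));
            await Task.Delay(delay, ct);
        }
    }
}

// Usage:
// var result = await WithRetriesAsync(() => _chat.GetChatMessageContentsAsync(chat, kernel: _kernel, cancellationToken: ct), ct: ct);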

Error handling & observability

Wrap calls with structured logging and enrich exceptions with provider context. Example:

try
{
    var result = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel);
}
catch (HttpRequestException ex)
{
    // Surface provider, endpoint, model, request id if available
    Console.Error.WriteLine($"{ex.StatusCode} from {_chat.GetType().Name}: {ex.Message}");
    throw;
}

Consider logging tokens in/out, latency, and cache hits (if you add a response cache). In production I emit custom metrics per provider to spot regressions when switching backends.
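The cheapest way to get those numbers is to time the call at the seam you already own. A rough Stopwatch sketch – the log format is just an example:

using System.Diagnostics;

// Time a chat call and log latency with a provider label so regressions show up when you switch backends.
var sw = Stopwatch.StartNew();
var reply = await _chat.GetChatMessageContentsAsync(chat, kernel: _kernel);
sw.Stop();

Console.WriteLine($"provider={_chat.GetType().Name} latency_ms={sw.ElapsedMilliseconds} chars_out={reply[^1].Content?.Length ?? 0}");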

From sample to real app: a minimal service layer

Abstract SK behind your own interface so the rest of your app never learns about providers:

public interface IAssistant
{
    Task<string> AskAsync(string userMessage, CancellationToken ct = default);
}

public class SkAssistant(Kernel kernel, IChatCompletionService chat) : IAssistant
{
    public async Task<string> AskAsync(string userMessage, CancellationToken ct = default)
    {
        var history = new ChatHistory();
        history.AddSystemMessage("You are a helpful .NET architect.");
        history.AddUserMessage(userMessage);
        var messages = await chat.GetChatMessageContentsAsync(history, kernel: kernel, cancellationToken: ct);
        return messages[^1].Content ?? string.Empty;
    }
}

Now your web API/controller/UI only depends on IAssistant. At runtime, config decides whether that means OpenAI, Azure, or a local box under your desk.
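Wiring it up is one registration plus whatever front door you already have. A sketch assuming an ASP.NET Core minimal API host – the /ask route and AskRequest record are examples, not part of SK:

// Register the assistant next to the Kernel setup shown earlier.
builder.Services.AddSingleton<IAssistant, SkAssistant>();

// Expose it through an endpoint that never mentions a provider.
app.MapPost("/ask", async (AskRequest request, IAssistant assistant, CancellationToken ct) =>
    Results.Ok(await assistant.AskAsync(request.Question, ct)));

public record AskRequest(string Question);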

Troubleshooting cheatsheet

  • 401/403: Wrong key/identity or hitting the wrong endpoint. For Azure, confirm resource endpoint https://<name>.openai.azure.com/ and the correct deployment name.
  • 404: Local server route mismatch – e.g., the configured endpoint is missing the /v1 prefix, so requests hit /chat/completions instead of /v1/chat/completions.
  • 429: Add retries with exponential backoff. Reduce concurrency or request max tokens.
  • Model not found: The model id (or Azure deployment) doesn’t exist or is in another region/resource.
  • Garbled streaming: Test with curl to confirm the server produces SSE or chunked responses as expected.

FAQ: Practical questions developers ask

Can I mix providers – e.g., OpenAI for chat but local for embeddings?

Absolutely. Register each service from a different provider. SK resolves by interface, so you can build a hybrid stack.
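For example, a hybrid registration could look like this – it reuses the custom-endpoint overloads from the Local case above, and the model ids and local endpoint are assumptions you should adjust:

// Chat from OpenAI, embeddings from a local OpenAI-compatible server – SK resolves each by interface.
var builder = Kernel.CreateBuilder();

builder.AddOpenAIChatCompletion(
    modelId: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

builder.AddOpenAITextEmbeddingGeneration(
    modelId: "nomic-embed-text",                    // whatever embedding model your local server hosts
    endpoint: new Uri("http://localhost:1234/v1"),
    apiKey: "not-needed");

var kernel = builder.Build();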

What if a provider doesn’t support a feature (e.g., tools, vision)?

SK tries to degrade gracefully, but you should feature‑detect. Keep optional paths and clear error messages.

How do I cache to save cost/latency?

Wrap your calls with an app‑level cache keyed on (prompt, settings, model). For RAG, cache embeddings per document hash.
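A rough sketch of such a cache as a decorator over the IAssistant interface from earlier – in-memory and unbounded here, so real code needs eviction and a richer key (settings, system prompt, model):

using System.Collections.Concurrent;

// Caches answers keyed on (model label, user message); purely illustrative.
public sealed class CachingAssistant(IAssistant inner, string modelLabel) : IAssistant
{
    private readonly ConcurrentDictionary<string, string> _cache = new();

    public async Task<string> AskAsync(string userMessage, CancellationToken ct = default)
    {
        var key = $"{modelLabel}::{userMessage}";
        if (_cache.TryGetValue(key, out var hit)) return hit;

        var answer = await inner.AskAsync(userMessage, ct);
        return _cache.GetOrAdd(key, answer);
    }
}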

Any guidance on model names?

Pin them via config. For Azure, prefer deployment aliases like gpt-4o-mini and rotate the underlying model without code changes.

Is it safe to log prompts/responses?

Not by default. Treat them as PII/secret‑adjacent. Scrub payloads or hash sensitive fields; follow your org’s compliance rules.

How do I unit test this?

Abstract IAssistant and inject a fake. For integration tests, spin up a local server (LM Studio) in CI and point AI:Provider at it.
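The fake itself is a few lines – the FakeAssistant name (and the consumer in the comment) are mine:

// A canned fake: service-layer tests never touch a real model.
public sealed class FakeAssistant : IAssistant
{
    public List<string> Questions { get; } = new();

    public Task<string> AskAsync(string userMessage, CancellationToken ct = default)
    {
        Questions.Add(userMessage);
        return Task.FromResult("canned answer");
    }
}

// In a test: new YourService(new FakeAssistant()) ... then assert on the canned output and the recorded Questions.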

Conclusion: One code path, any model

You don’t need three client libraries for three LLMs. With Semantic Kernel you write your prompts and plugins once, then decide at deploy time whether to use OpenAI, Azure AI, or a local runtime. The code above is production‑shaped – DI, config, and clean seams – so you can drop it into real services, stream tokens, call tools, and embed content without scattershot changes.

Your turn: which provider are you switching to first – OpenAI for speed, Azure for compliance, or local for cost control? Tell me how you plan to flip the switch in your project.
