Blazor Speech-to-Text: Real-Time Voice Input Guide

Learn how to add real-time speech-to-text to your Blazor app with JavaScript interop, Web Speech API, and a simple reusable component.

.NET Development · By amarozka · November 18, 2025

Have you ever stared at a long text field in your app and thought: “Why am I still typing this like it’s 2005 when I could just talk?” In one of my Blazor projects, a client said exactly this. The result: we added speech-to-text, users loved it, and the support mailbox went quiet.

In this article you will build a real-time speech-to-text input for Blazor using JavaScript interop and the browser’s Web Speech API. No external services, no API keys, just the browser and a bit of glue code.

You will get:

  • A reusable Blazor component for voice input.
  • JavaScript helper that wraps the Web Speech API.
  • Real-time transcription with interim and final results.
  • Error handling and browser support checks.

By the end, you will be able to drop a <SpeechToText /> component into any form and let users fill fields by voice.

What we are going to build

We will build a simple component:

  • A text area that shows the recognized text.
  • A “Start” button to begin listening.
  • A “Stop” button to stop listening.
  • Live preview of interim results (words that may still change).
  • Status line: listening / idle / error message.

All the magic will happen in JavaScript, but Blazor will control it via JS interop, and you will get typed callbacks on the .NET side.

Why Web Speech API + JavaScript interop?

There are many options for speech-to-text:

  • Cloud APIs (Azure Cognitive Services, Google, etc.).
  • Native libraries wrapped with gRPC or REST.
  • Browser Web Speech API.

For a Blazor app that runs in the browser, the Web Speech API is the easiest place to start:

  • No server round trips for audio.
  • No external billing.
  • Works directly in the user’s browser.

The downside: not all browsers support it (mainly Chromium-based ones do: Chrome, Edge, some versions of Opera). You will add a simple IsSupported check and show a friendly message when support is missing.

Create or reuse a Blazor project

You can use either Blazor Server or Blazor WebAssembly. The code below works with both. (If your SDK no longer lists the blazorserver template, newer .NET versions ship a combined blazor template instead; in that case create a Blazor Web App with dotnet new blazor and enable Server interactivity.)

Create a new project if you want to start from scratch:

dotnet new blazorserver -n BlazorSpeechDemo
cd BlazorSpeechDemo

or

dotnet new blazorwasm -n BlazorSpeechDemo
cd BlazorSpeechDemo

Run it once to be sure everything works:

dotnet run

You should see the default Blazor template page.

Add JavaScript helper for speech recognition

Create a JS file in wwwroot (for example wwwroot/js/speech-to-text.js).

Add this code:

// wwwroot/js/speech-to-text.js

window.speechToText = (function () {
    let recognition = null;
    let dotNetRef = null;

    function ensureRecognition() {
        const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

        if (!SpeechRecognition) {
            console.warn("Web Speech API is not supported in this browser.");
            return null;
        }

        if (!recognition) {
            recognition = new SpeechRecognition();
            recognition.continuous = true;
            recognition.interimResults = true;
            recognition.lang = "en-US";

            recognition.onresult = function (event) {
                let finalTranscript = "";
                let interimTranscript = "";

                for (let i = event.resultIndex; i < event.results.length; i++) {
                    const result = event.results[i];
                    const text = result[0].transcript;

                    if (result.isFinal) {
                        finalTranscript += text;
                    } else {
                        interimTranscript += text;
                    }
                }

                if (dotNetRef) {
                    dotNetRef.invokeMethodAsync("OnSpeechResult", finalTranscript, interimTranscript);
                }
            };

            recognition.onerror = function (event) {
                if (dotNetRef) {
                    dotNetRef.invokeMethodAsync("OnSpeechError", event.error || "unknown_error");
                }
            };

            recognition.onstart = function () {
                if (dotNetRef) {
                    dotNetRef.invokeMethodAsync("OnSpeechStatusChanged", true);
                }
            };

            recognition.onend = function () {
                if (dotNetRef) {
                    dotNetRef.invokeMethodAsync("OnSpeechStatusChanged", false);
                }
            };
        }

        return recognition;
    }

    return {
        init: function (ref) {
            dotNetRef = ref;
            ensureRecognition();
        },
        start: function () {
            const r = ensureRecognition();
            if (r) {
                try {
                    r.start();
                } catch (e) {
                    // Most browsers throw if start is called twice
                    console.warn("Failed to start recognition", e);
                }
            }
        },
        stop: function () {
            if (recognition) {
                recognition.stop();
            }
        },
        isSupported: function () {
            return !!(window.SpeechRecognition || window.webkitSpeechRecognition);
        }
    };
})();

What this script does:

  • Wraps SpeechRecognition / webkitSpeechRecognition into a single speechToText object.
  • Keeps a reference to a .NET object so it can call back into Blazor.
  • Sends three kinds of callbacks:
      • OnSpeechResult(finalTranscript, interimTranscript)
      • OnSpeechError(errorCode)
      • OnSpeechStatusChanged(isListening)

You will implement these methods on the Blazor side.
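
To make the contract concrete, here is a preview of the matching .NET signatures you will implement in the component (shown in full in the next sections):

[JSInvokable]
public void OnSpeechResult(string finalTranscript, string interimTranscript) { /* append final text, show interim text */ }

[JSInvokable]
public void OnSpeechError(string error) { /* surface the error code to the user */ }

[JSInvokable]
public void OnSpeechStatusChanged(bool isListening) { /* update the listening state */ }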

Register the script in your layout

Blazor Server: open Pages/_Host.cshtml and include the script before the closing </body> tag:

<script src="~/js/speech-to-text.js"></script>

Blazor WebAssembly: open wwwroot/index.html and add the same line before </body>:

<script src="js/speech-to-text.js"></script>

Now the window.speechToText object is available to your Blazor app.

Create a reusable Blazor speech-to-text component

Create a new Razor component Shared/SpeechToText.razor.

@using Microsoft.JSInterop
@implements IAsyncDisposable

<div class="card p-3 mb-3">
    <div class="d-flex gap-2 mb-2">
        <button class="btn btn-primary"
                @onclick="StartAsync"
                disabled="@(!IsSupported || IsListening)">
            🎙️ Start
        </button>

        <button class="btn btn-secondary"
                @onclick="StopAsync"
                disabled="@(!IsSupported || !IsListening)">
            ⏹ Stop
        </button>

        <span class="ms-2 align-self-center">
            @StatusMessage
        </span>
    </div>

    <div class="mb-2">
        <label class="form-label">Final text</label>
        <textarea class="form-control" rows="4" readonly>@FinalText</textarea>
    </div>

    <div>
        <label class="form-label">Interim text</label>
        <textarea class="form-control" rows="3" readonly>@InterimText</textarea>
    </div>
</div>

@code {
    [Inject] private IJSRuntime JS { get; set; } = default!;

    private DotNetObjectReference<SpeechToText>? _dotNetRef;

    public string FinalText { get; set; } = string.Empty;
    public string InterimText { get; set; } = string.Empty;

    public bool IsListening { get; set; }
    public bool IsSupported { get; set; }
    public string StatusMessage { get; set; } = "Initializing speech engine...";

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (!firstRender)
            return;

        _dotNetRef = DotNetObjectReference.Create(this);

        await JS.InvokeVoidAsync("speechToText.init", _dotNetRef);

        IsSupported = await JS.InvokeAsync<bool>("speechToText.isSupported");

        StatusMessage = IsSupported
            ? "Ready for voice input."
            : "Speech recognition is not supported in this browser.";

        StateHasChanged();
    }

    private async Task StartAsync()
    {
        if (!IsSupported)
            return;

        StatusMessage = "Listening...";
        await JS.InvokeVoidAsync("speechToText.start");
    }

    private async Task StopAsync()
    {
        if (!IsSupported)
            return;

        StatusMessage = "Stopping...";
        await JS.InvokeVoidAsync("speechToText.stop");
    }

    [JSInvokable]
    public void OnSpeechResult(string finalTranscript, string interimTranscript)
    {
        if (!string.IsNullOrWhiteSpace(finalTranscript))
        {
            if (!string.IsNullOrEmpty(FinalText))
            {
                FinalText += " ";
            }

            FinalText += finalTranscript.Trim();
        }

        InterimText = interimTranscript;
        StatusMessage = IsListening ? "Listening..." : "Idle";

        InvokeAsync(StateHasChanged);
    }

    [JSInvokable]
    public void OnSpeechError(string error)
    {
        StatusMessage = $"Error: {error}";
        IsListening = false;
        InvokeAsync(StateHasChanged);
    }

    [JSInvokable]
    public void OnSpeechStatusChanged(bool isListening)
    {
        IsListening = isListening;
        StatusMessage = isListening ? "Listening..." : "Idle";
        InvokeAsync(StateHasChanged);
    }

    public async ValueTask DisposeAsync()
    {
        try
        {
            await JS.InvokeVoidAsync("speechToText.stop");
        }
        catch
        {
            // ignore
        }

        _dotNetRef?.Dispose();
    }
}

Now you have a standalone Blazor component that:

  • Initializes the JavaScript helper.
  • Checks browser support.
  • Starts / stops recognition on button clicks.
  • Receives text and status events from JS.

Note: the markup uses Bootstrap classes (btn, card, etc.) because they are already in the default Blazor template. You can swap them for your own CSS if you want.

Expose the transcription to the parent component

Right now the text is only stored inside the component. In real forms you want to bind it to a model so that voice and typing work together.

Let’s update the component to:

  • Accept a Value parameter.
  • Expose a ValueChanged callback.
  • Allow the parent to use bind-Value syntax.

Update the @code block in SpeechToText.razor like this:

@code {
    [Inject] private IJSRuntime JS { get; set; } = default!;

    private DotNetObjectReference<SpeechToText>? _dotNetRef;

    [Parameter] public string Value { get; set; } = string.Empty;
    [Parameter] public EventCallback<string> ValueChanged { get; set; }

    public string FinalText { get; set; } = string.Empty;
    public string InterimText { get; set; } = string.Empty;

    public bool IsListening { get; set; }
    public bool IsSupported { get; set; }
    public string StatusMessage { get; set; } = "Initializing speech engine...";

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (!firstRender)
            return;

        _dotNetRef = DotNetObjectReference.Create(this);

        await JS.InvokeVoidAsync("speechToText.init", _dotNetRef);

        IsSupported = await JS.InvokeAsync<bool>("speechToText.isSupported");

        StatusMessage = IsSupported
            ? "Ready for voice input."
            : "Speech recognition is not supported in this browser.";

        FinalText = Value;

        StateHasChanged();
    }

    private async Task StartAsync()
    {
        if (!IsSupported)
            return;

        StatusMessage = "Listening...";
        await JS.InvokeVoidAsync("speechToText.start");
    }

    private async Task StopAsync()
    {
        if (!IsSupported)
            return;

        StatusMessage = "Stopping...";
        await JS.InvokeVoidAsync("speechToText.stop");
    }

    [JSInvokable]
    public async Task OnSpeechResult(string finalTranscript, string interimTranscript)
    {
        if (!string.IsNullOrWhiteSpace(finalTranscript))
        {
            if (!string.IsNullOrEmpty(FinalText))
            {
                FinalText += " ";
            }

            FinalText += finalTranscript.Trim();
            Value = FinalText;
            await ValueChanged.InvokeAsync(Value);
        }

        InterimText = interimTranscript;
        StatusMessage = IsListening ? "Listening..." : "Idle";

        await InvokeAsync(StateHasChanged);
    }

    [JSInvokable]
    public Task OnSpeechError(string error)
    {
        StatusMessage = $"Error: {error}";
        IsListening = false;
        return InvokeAsync(StateHasChanged);
    }

    [JSInvokable]
    public Task OnSpeechStatusChanged(bool isListening)
    {
        IsListening = isListening;
        StatusMessage = isListening ? "Listening..." : "Idle";
        return InvokeAsync(StateHasChanged);
    }

    public async ValueTask DisposeAsync()
    {
        try
        {
            await JS.InvokeVoidAsync("speechToText.stop");
        }
        catch
        {
        }

        _dotNetRef?.Dispose();
    }
}

The key change is the Value / ValueChanged pair, which is kept in sync whenever the final text changes.

Use the speech-to-text component in a form

Now you can plug this into any form. Let’s use Pages/Index.razor as a demo.

Replace the default content with:

@page "/"

<h3>Voice enabled feedback form</h3>

<div class="mb-3">
    <label class="form-label" for="nameInput">Your name</label>
    <input id="nameInput" class="form-control" @bind="Model.Name" />
</div>

<div class="mb-3">
    <label class="form-label">Your feedback (type or talk)</label>

    <SpeechToText @bind-Value="Model.Feedback" />

    <textarea class="form-control mt-2" rows="4" @bind="Model.Feedback"></textarea>
</div>

<button class="btn btn-success" @onclick="SubmitAsync">Send</button>

@if (LastSubmitted is not null)
{
    <div class="alert alert-info mt-3">
        <strong>Last submission:</strong>
        <pre>@LastSubmitted</pre>
    </div>
}

@code {
    public FeedbackModel Model { get; set; } = new();
    public string? LastSubmitted { get; set; }

    private Task SubmitAsync()
    {
        LastSubmitted = $"Name: {Model.Name}\nFeedback: {Model.Feedback}";
        return Task.CompletedTask;
    }

    public class FeedbackModel
    {
        public string Name { get; set; } = string.Empty;
        public string Feedback { get; set; } = string.Empty;
    }
}

Now users can:

  • Click Start in SpeechToText.
  • Talk.
  • Watch the interim text and final text.
  • Edit the final text manually in the bound textarea if needed.

In one of my apps, this pattern helped call center agents log call notes by speaking instead of typing everything, and it shaved seconds off every call.

Support multiple languages

You are not limited to en-US. The Web Speech API supports language codes like:

  • en-US – English (United States)
  • en-GB – English (United Kingdom)
  • de-DE – German
  • fr-FR – French
  • es-ES – Spanish

Let’s pass the language from Blazor to JavaScript.

First, update the JS file to accept language in start:

start: function (lang) {
    const r = ensureRecognition();
    if (r) {
        if (lang) {
            r.lang = lang;
        }
        try {
            r.start();
        } catch (e) {
            console.warn("Failed to start recognition", e);
        }
    }
},

Then update the Blazor component:

@code {
    [Parameter] public string Language { get; set; } = "en-US";

    // ... other members stay the same

    private async Task StartAsync()
    {
        if (!IsSupported)
            return;

        StatusMessage = $"Listening ({Language})...";
        await JS.InvokeVoidAsync("speechToText.start", Language);
    }
}

Now you can use:

<SpeechToText Language="de-DE" @bind-Value="Model.Feedback" />

and the browser will try to recognize German.
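
If you want users to pick the language themselves, you can drive the Language parameter from a select. A minimal sketch for the feedback page (SelectedLanguage is a hypothetical local field, not part of the component):

<select class="form-select mb-2" @bind="SelectedLanguage">
    <option value="en-US">English (US)</option>
    <option value="de-DE">Deutsch</option>
    <option value="fr-FR">Français</option>
</select>

<SpeechToText Language="@SelectedLanguage" @bind-Value="Model.Feedback" />

@code {
    private string SelectedLanguage { get; set; } = "en-US";
}

Because the component only passes Language to speechToText.start, the new language takes effect the next time the user clicks Start.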

Handle common issues

In real projects, these questions usually appear during testing.

1. Browser support and fallbacks

Not all browsers support Web Speech API. Some tips:

  • Use IsSupported to show a clear message.
  • Hide or disable the Start button when support is missing.
  • Optionally show a hint: “Try Chrome or Edge for voice input” (see the markup sketch below).
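
Inside the component markup, that hint can be a small conditional block driven by the IsSupported flag. A sketch (wording and styling are up to you):

@if (!IsSupported)
{
    <div class="alert alert-warning">
        Voice input is not available in this browser. Try Chrome or Edge for voice input.
    </div>
}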

If you need support in all browsers, you will have to move to a cloud speech service and stream audio from the browser to your backend. The Blazor JS interop part would stay similar, but the JS part would capture audio and send it to your server.

2. HTTPS and microphone permissions

Most browsers require a secure context for microphone access:

  • Use HTTPS in production.
  • For local testing, localhost is treated as a secure context, so http://localhost works too.

The first time you call start(), the browser will ask for microphone permission. If users block it, the API will fail. You can:

  • Catch errors in onerror and show instructions (see the sketch after this list).
  • Add a short help text near the Start button.
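
For example, the error callback can translate the not-allowed error code (reported by the Web Speech API when microphone access is denied) into actionable instructions. A sketch based on the component above:

[JSInvokable]
public Task OnSpeechError(string error)
{
    StatusMessage = error == "not-allowed"
        ? "Microphone access is blocked. Allow it in your browser's site settings and click Start again."
        : $"Error: {error}";

    IsListening = false;
    return InvokeAsync(StateHasChanged);
}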

3. Handling pauses and auto stop

The Web Speech API may stop after a long pause. You will see onend fire and IsListening set to false.

If you want “push to talk” behavior only, the default is fine: the user clicks Start, says something, and recognition stops after a pause or when they click Stop.

If you want long dictation mode:

  • In OnSpeechStatusChanged, when isListening becomes false, you can automatically call start again (see the sketch after this list).
  • Or you can show a small hint: “Click Start again to continue dictation”.
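
A minimal auto-restart sketch, assuming you add a _keepListening field (not part of the component above) that StartAsync sets to true and StopAsync sets to false:

private bool _keepListening;

[JSInvokable]
public async Task OnSpeechStatusChanged(bool isListening)
{
    IsListening = isListening;
    StatusMessage = isListening ? "Listening..." : "Idle";

    // Restart only while the user has not clicked Stop.
    if (!isListening && _keepListening)
    {
        await JS.InvokeVoidAsync("speechToText.start", Language);
    }

    await InvokeAsync(StateHasChanged);
}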

4. Cleaning up resource leaks

Without proper disposal, your component might keep JS references alive. The DisposeAsync method already stops recognition and disposes the DotNetObjectReference. Keep that pattern when you move the code into a library.

5. Logging and diagnostics

In real apps I usually:

  • Log OnSpeechError with a structured logger.
  • Count how many users use voice input vs typing.
  • Track average speech session length.

This helps decide if you should invest more time into better models (for example, switching to a cloud service later) or if the built-in browser recognition is enough.
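
A sketch of the logging part, assuming you inject an ILogger into the component (Microsoft.Extensions.Logging is available in both hosting models):

[Inject] private ILogger<SpeechToText> Logger { get; set; } = default!;

[JSInvokable]
public Task OnSpeechError(string error)
{
    // Structured log entry: the error code becomes a named property.
    Logger.LogWarning("Speech recognition error: {Error}", error);

    StatusMessage = $"Error: {error}";
    IsListening = false;
    return InvokeAsync(StateHasChanged);
}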

FAQ: Speech-to-text in Blazor

Does this work in all browsers?

No. The Web Speech API is mainly supported in Chromium-based browsers (Chrome, Edge). In Firefox and Safari, support is limited or missing. That is why you added isSupported() and disabled the buttons when it returns false.

Can I send the audio to my server instead?

Yes, but that is a bigger setup:
– Use getUserMedia to capture audio.
– Stream it to your backend over WebSocket or gRPC.
– Use a speech-to-text service on the server.
The Blazor integration pattern would be similar, but the JS logic would handle audio streaming instead of calling SpeechRecognition.

Is speech recognition done locally or in the cloud?

It depends on the browser implementation. From the Blazor side you do not control this. The Web Speech API hides the details and just returns text events.

How can I mix typing and speech nicely?

Use @bind-Value like in the feedback form example. The user can:
– Start with voice input.
– Fix small errors by typing.
– Add extra notes later.
In the component you can treat the voice result as just another way to change the bound string.

Can I limit speech length or word count?

Yes. In OnSpeechResult you can check the length of FinalText and stop recognition when it reaches your limit. You can also show a note like “You reached the maximum text length”.
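
A sketch of such a limit inside OnSpeechResult, assuming a hypothetical MaxLength parameter (not part of the component above):

[Parameter] public int MaxLength { get; set; } = 2000;

// At the end of OnSpeechResult, after FinalText has been updated:
if (FinalText.Length >= MaxLength)
{
    StatusMessage = "You reached the maximum text length.";
    await JS.InvokeVoidAsync("speechToText.stop");
}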

Conclusion: Voice input for Blazor in a few steps

You have added real-time speech-to-text to a Blazor app using only:

  • A small JavaScript helper around SpeechRecognition.
  • A reusable Blazor component to drive it.
  • A simple binding pattern to plug it into any form.

From a user point of view, this is a small feature that makes your app feel more modern and friendly, especially on mobile or for long forms.

I recommend you pick one boring form in your current project (support ticket, feedback, long notes) and add this component there. Watch how your users react and adjust the UX based on real feedback.

Have you already tried voice input in a Blazor app, or is this your first attempt? Share your experience, questions, or code tweaks in the comments.
