Speech To Text

Prerequisites

Azure Subscription
Visual Studio 2019
Basic knowledge of WPF
Microphone capable of audio capture

Description

Speech to text, also known as speech recognition, is the real-time transcription of audio streams into text. Some use cases include converting the audio from videos into fully searchable text, voice-driven medical report generation, and transcribing meetings and conference calls.

In this post I will build a basic WPF desktop application that converts continuous microphone audio to text. It uses the Speech To Text service, which is part of Microsoft Cognitive Services on Azure. This service is powered by the same recognition technology used for Cortana and Office products. It seems to work really well and supports multiple languages. For a full list of available languages, see supported languages.

Create An Azure Speech Resource

Log in to the Azure portal and Create a New Resource. Type ‘Speech’ in the search box and hit ENTER.

Click Create. Fill out the form with the following details.

Give your speech resource a name
Select a subscription
Select a location
Select a pricing tier (the service is currently in public preview and has two pricing tiers, F0 and S0; choose F0 for the free plan)
Select a resource group

Click Create. This will take you to the deployment overview.
It takes a few moments to deploy your new Speech resource. Once deployed, select Go to resource.

For more details, please visit the official documentation - Create a Speech Resource in Azure

Speech Service Subscription Keys

In the left navigation pane select Keys to display your Speech service subscription keys. Each subscription has two keys; you can use either key in your application.

Integrate With The App

Open Visual Studio 2019 and create a new project using the template for WPF App (.NET Core).

NuGet Package

Install the NuGet package for Microsoft.CognitiveServices.Speech

<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.9.0" />

View Model Class

In your project, add a folder called Viewmodel and in it create a new file called MainVM.cs. Replace the existing class with the following, and remember to add using System.ComponentModel; at the top of the file. These are the properties we will bind to in the MainWindow.

public class MainVM : INotifyPropertyChanged
{
    /// <summary>
    /// Intermediate recognition result
    /// </summary>
    private string _speech = "";
    public string Speech
    {
        get { return _speech; }
        set
        {
            _speech = value;
            OnPropertyChanged("Speech");
        }
    }

    /// <summary>
    /// The transcribed audio
    /// </summary>
    private string _transcribed = "";
    public string Transcribed
    {
        get { return _transcribed; }
        set
        {
            _transcribed = value;
            OnPropertyChanged("Transcribed");
        }
    }

    /// <summary>
    /// Indicates if the session is started or stopped
    /// </summary>
    private string _startstop = "Start";
    public string StartStop
    {
        get { return _startstop; }
        set
        {
            _startstop = value;
            OnPropertyChanged("StartStop");
        }
    }

    /// <summary>
    /// Status or error messages
    /// </summary>
    private string _status = "";
    public string Status
    {
        get { return _status; }
        set
        {
            _status = value;
            OnPropertyChanged("Status");
        }
    }

    #region INotifyPropertyChanged Members  
    public event PropertyChangedEventHandler PropertyChanged;
    private void OnPropertyChanged(string propertyName)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }
    #endregion
}
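
Passing property names as string literals works, but a later rename can silently break a binding. As a minimal sketch of one common alternative (not what the sample project does), the [CallerMemberName] attribute lets the compiler supply the property name. ObservableObject, SetProperty and StatusVM are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical base class: raises PropertyChanged only when the value changes.
public abstract class ObservableObject : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    protected bool SetProperty<T>(ref T field, T value,
        [CallerMemberName] string propertyName = null)
    {
        // Skip the notification when nothing changed.
        if (EqualityComparer<T>.Default.Equals(field, value))
            return false;

        field = value;
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
        return true;
    }
}

// Hypothetical view model with a single bindable property.
public class StatusVM : ObservableObject
{
    private string _status = "";
    public string Status
    {
        get => _status;
        // The compiler fills in "Status" for propertyName.
        set => SetProperty(ref _status, value);
    }
}
```

The four properties in MainVM could follow the same shape, each reduced to a one-line getter and setter.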

MainWindow

Open the MainWindow.xaml file and replace the existing <Grid> with the following.

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="*"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
    </Grid.RowDefinitions>
    
    <TextBlock Grid.Row="0" Margin="10,30,0,0" TextWrapping="Wrap">Intermediate Result</TextBlock>
    <Border Grid.Row="1"  Margin="10,0,10,10" BorderThickness="1" BorderBrush="LightGray">
        <TextBlock   Text="{Binding Speech}"></TextBlock>
    </Border>

    <TextBlock Grid.Row="2" Margin="10,10,0,0" TextWrapping="Wrap">Final Result</TextBlock>
    <Border Grid.Row="3"  Margin="10,0,10,10" BorderThickness="1" BorderBrush="LightGray">
        <TextBlock     Background="#FFEDEDF4" FontSize="14" TextWrapping="Wrap"  Text="{Binding Transcribed}"></TextBlock>
    </Border>
    <DockPanel Grid.Row="4"  Margin="10,0,10,0">
        <TextBlock>Status</TextBlock>
        <TextBox Margin="10,0,0,0" Text="{Binding Status}" Background="#FFFCFFE9"></TextBox>
    </DockPanel>
    <StackPanel Grid.Row="5" Margin="0,10,0,0" Orientation="Horizontal" HorizontalAlignment="Center">
        <Button  Background="White" Margin="0,0,20,15" Width="100" Click="Button_Click" Content="{Binding StartStop, Mode=TwoWay}"></Button>
        <Button  Background="White" Margin="0,0,20,15" Width="100" Content="Clear" Click="ClearText"></Button>
    </StackPanel>
</Grid>

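Open the code-behind file, MainWindow.xaml.cs, and add the following components. Remember to add using Microsoft.CognitiveServices.Speech; and using System.Diagnostics; along with using System.Threading.Tasks; if it is not already present.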

Fields to hold the subscription key and service region from Azure

string subscriptionKey = "<replace with your speech service subscription key>";
string serviceRegion = "<replace with service region>";
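
Hardcoded values are convenient for a demo, but they are easy to commit to source control by accident. A minimal sketch of reading both values from environment variables instead; the variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings class are assumptions made for this example, not something the Speech SDK requires.

```csharp
using System;

// Hypothetical helper that loads the Speech resource settings at startup.
public static class SpeechSettings
{
    // Assumed variable names; pick whatever fits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    public static (string Key, string Region) Load()
    {
        string key = Environment.GetEnvironmentVariable(KeyVariable);
        string region = Environment.GetEnvironmentVariable(RegionVariable);

        // Fail early with a readable message instead of a service error later.
        if (string.IsNullOrWhiteSpace(key) || string.IsNullOrWhiteSpace(region))
        {
            throw new InvalidOperationException(
                $"Set {KeyVariable} and {RegionVariable} before starting the app.");
        }

        return (key, region);
    }
}
```

The two fields could then be assigned from SpeechSettings.Load() in the constructor rather than from string literals.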

Speech config and recognizer

SpeechConfig config;
SpeechRecognizer recognizer;

Reference to a new view model instance

MainVM _mainVM = new MainVM();

Stop recognition task

TaskCompletionSource<int> stopRecognition;

In the constructor we need to set our DataContext to the view model _mainVM, create instances of the SpeechRecognizer and SpeechConfig, and subscribe to the necessary events.

public MainWindow()
{
    InitializeComponent();
    DataContext = _mainVM;

    // Creates an instance of a speech config with the specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    config = SpeechConfig.FromSubscription(subscriptionKey, serviceRegion);

    recognizer = new SpeechRecognizer(config);

    //subscribe to events
    recognizer.Recognizing += Recognizer_Recognizing;
    recognizer.Recognized += Recognizer_Recognized;
    recognizer.Canceled += Recognizer_Canceled;
    recognizer.SessionStarted += Recognizer_SessionStarted;
    recognizer.SessionStopped += Recognizer_SessionStopped;

    Closing += MainWindow_Closing;
}
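
The description mentions that the service works with multiple languages. As a sketch of how a different language could be selected, the SpeechConfig.SpeechRecognitionLanguage property takes a locale string and has to be set before the recognizer is created. RecognizerFactory is a hypothetical helper, and "de-DE" is only an example value; check the supported languages list for the locales available to your resource.

```csharp
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper that builds a recognizer for a given locale.
public static class RecognizerFactory
{
    public static SpeechRecognizer Create(
        string subscriptionKey, string serviceRegion, string language = "de-DE")
    {
        SpeechConfig config = SpeechConfig.FromSubscription(subscriptionKey, serviceRegion);

        // Expects a locale string such as "en-US" or "de-DE".
        config.SpeechRecognitionLanguage = language;

        // With no AudioConfig argument the recognizer listens on the default microphone.
        return new SpeechRecognizer(config);
    }
}
```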

Recognizer event handlers

private void Recognizer_SessionStopped(object sender, SessionEventArgs e)
{
    _mainVM.Status = "Speech recognition session stopped";
    stopRecognition.TrySetResult(0);
}

private void Recognizer_SessionStarted(object sender, SessionEventArgs e)
{
    _mainVM.Status = "Speech recognition session started...";
}

private void Recognizer_Canceled(object sender, SpeechRecognitionCanceledEventArgs e)
{
    _mainVM.Status = ($"CANCELED: Reason={e.Reason}");
    if (e.Reason == CancellationReason.Error)
    {
        //Just writing to debug output here, could display this to user
        Debug.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Debug.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Debug.WriteLine($"CANCELED: Did you update the subscription info?");
    }
    stopRecognition.TrySetResult(0);
}

private void Recognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    if (string.IsNullOrWhiteSpace(e.Result.Text))
        _mainVM.Speech = string.Empty;

    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        // Append a trailing space so consecutive utterances do not run together.
        _mainVM.Transcribed += e.Result.Text + " ";
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        _mainVM.Status = "Unable to recognize speech";
    }
}

private void Recognizer_Recognizing(object sender, SpeechRecognitionEventArgs e)
{
    _mainVM.Speech = e.Result.Text;
}
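
The Recognizing and Recognized handlers play different roles: each intermediate hypothesis replaces the previous one, while a final result is appended to the transcript. A minimal sketch of that flow, written without the Speech SDK so it can be run on its own; TranscriptBuffer and its members are hypothetical names used only for this illustration.

```csharp
// Hypothetical stand-in for the Speech and Transcribed properties of the view model.
public class TranscriptBuffer
{
    public string Intermediate { get; private set; } = "";
    public string Final { get; private set; } = "";

    // Each intermediate hypothesis overwrites the previous one.
    public void OnRecognizing(string partialText)
    {
        Intermediate = partialText ?? "";
    }

    // A final result clears the hypothesis and is appended to the transcript.
    public void OnRecognized(string finalText)
    {
        Intermediate = "";

        // Ignore empty results, for example from silence.
        if (string.IsNullOrWhiteSpace(finalText))
            return;

        // Separate utterances with a single space.
        Final = Final.Length == 0 ? finalText : Final + " " + finalText;
    }
}
```

Feeding it "hel", then "hello wor", then the final "Hello world." leaves Intermediate empty and Final holding the finished sentence.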

Window Closing event handler (clean up resources before exiting)

private void MainWindow_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
    //unsubscribe from events (just in case it's not being done in Dispose)
    recognizer.Recognizing -= Recognizer_Recognizing;
    recognizer.Recognized -= Recognizer_Recognized;
    recognizer.Canceled -= Recognizer_Canceled;
    recognizer.SessionStarted -= Recognizer_SessionStarted;
    recognizer.SessionStopped -= Recognizer_SessionStopped;

    //dispose the recognizer
    recognizer.Dispose();
}

Start/Stop button click event handler

private async void Button_Click(object sender, RoutedEventArgs e)
{
    //recognizer is in a stopped state
    if (_mainVM.StartStop == "Start")
    {
        _mainVM.StartStop = "Stop";

        stopRecognition = new TaskCompletionSource<int>();

        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion without blocking a thread.
        // The task is completed by the SessionStopped and Canceled handlers.
        await stopRecognition.Task.ConfigureAwait(false);
    }
    else if (_mainVM.StartStop == "Stop")
    {
        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);

        _mainVM.StartStop = "Start";
        _mainVM.Speech = String.Empty;
    }
}
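
The stopRecognition field is a TaskCompletionSource<int> that the SessionStopped and Canceled handlers complete with TrySetResult. The sketch below isolates that pattern with a stand-in event source so it can run without the Speech SDK. FakeSession and StopSignal are hypothetical names, and awaiting the returned task is just one way to consume it.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a recognizer that raises an event when it stops.
public class FakeSession
{
    public event EventHandler Stopped;
    public void Stop() => Stopped?.Invoke(this, EventArgs.Empty);
}

public static class StopSignal
{
    // Returns a task that completes the first time the session reports a stop.
    public static Task<int> WaitForStopAsync(FakeSession session)
    {
        var tcs = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // TrySetResult is safe to call more than once; later calls are ignored.
        session.Stopped += (sender, e) => tcs.TrySetResult(0);

        return tcs.Task;
    }
}
```

A caller can write await StopSignal.WaitForStopAsync(session); and execution resumes only after Stop() has been called.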

Clear text event handler, used to clear both text areas when the Clear button is clicked.

private void ClearText(object sender, RoutedEventArgs e)
{
    _mainVM.Speech = string.Empty;
    _mainVM.Transcribed = string.Empty;
}

Run the App

Once all the components are added, run the app, click the Start button, and start speaking. If everything is working correctly, your microphone should pick up your speech and the app should transcribe it to text.

Sample Code

https://github.com/erotavlas/SpeechToText_WPF.

References

Azure Cognitive Services - Speech To Text

Speech Service Documentation
