Speech To Text
Prerequisites
- Azure Subscription
- Visual Studio 2019
- Basic knowledge of WPF
- Microphone capable of audio capture
Description
Speech to text, also known as speech recognition, is the real-time transcription of audio streams into text. Some use cases include converting the audio from videos into fully searchable text, voice-driven medical report generation, and transcribing meetings and conference calls.
In this post I will build a basic WPF desktop application that converts continuous microphone audio to text. It will use the Speech to Text service, part of Microsoft Cognitive Services on Azure. The service is powered by the same recognition technology used for Cortana and Office products, works well in practice, and can recognize multiple languages. For a full list of available languages, see supported languages.
Create An Azure Speech Resource
Log in to the Azure portal and select Create a resource. Type ‘Speech’ in the search box and press ENTER.
Click Create and fill out the form with the following details:
- Give your speech resource a name
- Select a subscription
- Select a location
- Select a pricing tier (the service is currently in public preview and offers two tiers, F0 and S0; choose F0 for the free plan)
- Select a resource group
Click Create. This will take you to the deployment overview.
It takes a few moments to deploy your new Speech resource. Once deployed, select Go to resource.
For more details, please visit the official documentation - Create a Speech Resource in Azure
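If you prefer the command line and have the Azure CLI installed, a Speech resource can also be created with a single command (a sketch; the name, resource group, and location below are placeholders you should replace):
az cognitiveservices account create --name <your-speech-resource-name> --resource-group <your-resource-group> --kind SpeechServices --sku F0 --location westus --yes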
Speech Service Subscription Keys
In the left navigation pane select Keys to display your Speech service subscription keys. Each subscription has two keys; you can use either key in your application.
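If you have the Azure CLI installed, the keys can also be retrieved from the command line (the resource and group names below are placeholders):
az cognitiveservices account keys list --name <your-speech-resource-name> --resource-group <your-resource-group>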
Integrate With The App
Open Visual Studio 2019 and create a new project using the WPF App (.NET Core) template.
NuGet Package
Install the NuGet package for Microsoft.CognitiveServices.Speech
<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.9.0" />
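Alternatively, the package can be installed from the command line with the dotnet CLI:
dotnet add package Microsoft.CognitiveServices.Speech --version 1.9.0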
View Model Class
In your project, add a folder called Viewmodel and in it create a new file called MainVM.cs. Replace the existing class with the following, and remember to add using System.ComponentModel; at the top of the file.
These are the properties we will bind to in the MainWindow.
public class MainVM : INotifyPropertyChanged
{
    /// <summary>
    /// Intermediate recognition result
    /// </summary>
    private string _speech = "";
    public string Speech
    {
        get { return _speech; }
        set
        {
            _speech = value;
            OnPropertyChanged("Speech");
        }
    }

    /// <summary>
    /// The transcribed audio
    /// </summary>
    private string _transcribed = "";
    public string Transcribed
    {
        get { return _transcribed; }
        set
        {
            _transcribed = value;
            OnPropertyChanged("Transcribed");
        }
    }

    /// <summary>
    /// Indicates if the session is started or stopped
    /// </summary>
    private string _startstop = "Start";
    public string StartStop
    {
        get { return _startstop; }
        set
        {
            _startstop = value;
            OnPropertyChanged("StartStop");
        }
    }

    /// <summary>
    /// Status or error messages
    /// </summary>
    private string _status = "";
    public string Status
    {
        get { return _status; }
        set
        {
            _status = value;
            OnPropertyChanged("Status");
        }
    }

    #region INotifyPropertyChanged Members

    public event PropertyChangedEventHandler PropertyChanged;

    private void OnPropertyChanged(string propertyName)
    {
        if (PropertyChanged != null)
        {
            PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }

    #endregion
}
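As a small aside, if your project targets C# 6 or later you can pass nameof(...) to OnPropertyChanged instead of a string literal, which protects against typos when properties are renamed (optional; the string literals above work just as well):
OnPropertyChanged(nameof(Speech));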
MainWindow
Open the MainWindow.xaml file and replace the existing <Grid> with the following.
<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="*"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
        <RowDefinition Height="Auto"></RowDefinition>
    </Grid.RowDefinitions>
    <TextBlock Grid.Row="0" Margin="10,30,0,0" TextWrapping="Wrap">Intermediate Result</TextBlock>
    <Border Grid.Row="1" Margin="10,0,10,10" BorderThickness="1" BorderBrush="LightGray">
        <TextBlock Text="{Binding Speech}"></TextBlock>
    </Border>
    <TextBlock Grid.Row="2" Margin="10,10,0,0" TextWrapping="Wrap">Final Result</TextBlock>
    <Border Grid.Row="3" Margin="10,0,10,10" BorderThickness="1" BorderBrush="LightGray">
        <TextBlock Background="#FFEDEDF4" FontSize="14" TextWrapping="Wrap" Text="{Binding Transcribed}"></TextBlock>
    </Border>
    <DockPanel Grid.Row="4" Margin="10,0,10,0">
        <TextBlock>Status</TextBlock>
        <TextBox Margin="10,0,0,0" Text="{Binding Status}" Background="#FFFCFFE9"></TextBox>
    </DockPanel>
    <StackPanel Grid.Row="5" Margin="0,10,0,0" Orientation="Horizontal" HorizontalAlignment="Center">
        <Button Background="White" Margin="0,0,20,15" Width="100" Click="Button_Click" Content="{Binding StartStop, Mode=TwoWay}"></Button>
        <Button Background="White" Margin="0,0,20,15" Width="100" Content="Clear" Click="ClearText"></Button>
    </StackPanel>
</Grid>
Open the code-behind file, MainWindow.xaml.cs, and add the following components. You will need using Microsoft.CognitiveServices.Speech;, using System.Diagnostics;, and using System.Threading.Tasks; at the top of the file.
- Fields to hold the subscription key and service region from Azure
string subscriptionKey = "<replace with your speech service subscription key>";
string serviceRegion = "<replace with service region>";
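Hardcoding the key is fine for a quick demo, but you may prefer to keep it out of source control. One option (a sketch, assuming environment variables named SPEECH_KEY and SPEECH_REGION that you define yourself) is to read the values at startup:
// Hypothetical environment variables; set them on your machine before running the app.
string subscriptionKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
string serviceRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");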
- Speech config and recognizer
SpeechConfig config;
SpeechRecognizer recognizer;
- Reference to a new view model instance
MainVM _mainVM = new MainVM();
- Stop recognition task, used to signal when recognition stops
TaskCompletionSource<int> stopRecognition;
- In the constructor we need to set our DataContext to the view model _mainVM, create instances of SpeechConfig and SpeechRecognizer, and subscribe to the necessary events.
public MainWindow()
{
    InitializeComponent();

    DataContext = _mainVM;

    // Creates an instance of a speech config with the specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    config = SpeechConfig.FromSubscription(subscriptionKey, serviceRegion);
    recognizer = new SpeechRecognizer(config);

    // Subscribe to events
    recognizer.Recognizing += Recognizer_Recognizing;
    recognizer.Recognized += Recognizer_Recognized;
    recognizer.Canceled += Recognizer_Canceled;
    recognizer.SessionStarted += Recognizer_SessionStarted;
    recognizer.SessionStopped += Recognizer_SessionStopped;

    Closing += MainWindow_Closing;
}
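By default the recognizer transcribes US English ("en-US"). The service supports many other languages; to transcribe a different one, set SpeechConfig.SpeechRecognitionLanguage before constructing the recognizer (a small sketch using Italian as an example; see the supported languages list for valid codes):
// Optional: choose the recognition language before creating the SpeechRecognizer.
config.SpeechRecognitionLanguage = "it-IT";
recognizer = new SpeechRecognizer(config);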
- Recognizer event handlers
private void Recognizer_SessionStopped(object sender, SessionEventArgs e)
{
    _mainVM.Status = "Speech recognition session stopped";
    stopRecognition.TrySetResult(0);
}

private void Recognizer_SessionStarted(object sender, SessionEventArgs e)
{
    _mainVM.Status = "Speech recognition session started...";
}

private void Recognizer_Canceled(object sender, SpeechRecognitionCanceledEventArgs e)
{
    _mainVM.Status = $"CANCELED: Reason={e.Reason}";
    if (e.Reason == CancellationReason.Error)
    {
        // Just writing to debug output here, could display this to the user
        Debug.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Debug.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Debug.WriteLine($"CANCELED: Did you update the subscription info?");
    }
    stopRecognition.TrySetResult(0);
}

private void Recognizer_Recognized(object sender, SpeechRecognitionEventArgs e)
{
    if (string.IsNullOrWhiteSpace(e.Result.Text))
        _mainVM.Speech = string.Empty;

    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        _mainVM.Transcribed += e.Result.Text;
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        _mainVM.Status = "Unable to recognize speech";
    }
}

private void Recognizer_Recognizing(object sender, SpeechRecognitionEventArgs e)
{
    _mainVM.Speech = e.Result.Text;
}
- Window Closing event handler (clean up resources before exiting)
private void MainWindow_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Unsubscribe from events (just in case it's not already done in Dispose)
    recognizer.Recognizing -= Recognizer_Recognizing;
    recognizer.Recognized -= Recognizer_Recognized;
    recognizer.Canceled -= Recognizer_Canceled;
    recognizer.SessionStarted -= Recognizer_SessionStarted;
    recognizer.SessionStopped -= Recognizer_SessionStopped;

    // Dispose the recognizer
    recognizer.Dispose();
}
- Start/Stop button click event handler
private async void Button_Click(object sender, RoutedEventArgs e)
{
    // Recognizer is in a stopped state
    if (_mainVM.StartStop == "Start")
    {
        _mainVM.StartStop = "Stop";
        stopRecognition = new TaskCompletionSource<int>();

        // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });
    }
    else if (_mainVM.StartStop == "Stop")
    {
        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        _mainVM.StartStop = "Start";
        _mainVM.Speech = String.Empty;
    }
}
- Clear text event handler, used to clear the text when the Clear button is clicked
private void ClearText(object sender, RoutedEventArgs e)
{
    _mainVM.Speech = string.Empty;
    _mainVM.Transcribed = string.Empty;
}
Run the App
Once all the components are in place, run the app, click the Start button, and start speaking. If everything is working correctly, the app should pick up your speech from the microphone and transcribe it to text.
Sample Code
https://github.com/erotavlas/SpeechToText_WPF.
Salvatore S. © 2020