This article explains speech recognition, speech to text, text to speech and speech synthesis in C#.

If the code doesn't work for you, some speech features may not be installed or enabled. If you don't have an English version of Windows, or your speech recognizer uses a language other than English, you can still use all the code from this article, but you need to change the phrases to the language of your speech recognizer.
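To check which recognizer languages are installed, a minimal sketch like the following may help. It assumes a console project that references System.Speech; the class name RecognizerCultures, the helper PickCulture and the preferred culture "en-US" are hypothetical and only serve as illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

static class RecognizerCultures
{
    // Hypothetical helper: returns the preferred culture name if it is installed,
    // otherwise the first installed one, or null if nothing is installed.
    public static string PickCulture(IEnumerable<string> installed, string preferred)
    {
        List<string> names = installed.ToList();
        string match = names.FirstOrDefault(
            n => string.Equals(n, preferred, StringComparison.OrdinalIgnoreCase));
        return match ?? names.FirstOrDefault();
    }

    static void Main()
    {
        // InstalledRecognizers() lists the recognizers registered on this machine.
        List<string> cultures = SpeechRecognitionEngine.InstalledRecognizers()
            .Select(r => r.Culture.Name)
            .ToList();

        foreach (string culture in cultures)
        {
            Console.WriteLine("Installed recognizer: " + culture);
        }

        string chosen = PickCulture(cultures, "en-US"); // "en-US" is an assumption
        if (chosen == null)
        {
            Console.WriteLine("No speech recognizer is installed.");
            return;
        }

        // Create an engine for the chosen language instead of the system default.
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo(chosen)))
        {
            Console.WriteLine("Using recognizer: " + recognizer.RecognizerInfo.Description);
        }
    }
}
```

If the list is empty, the examples in this article cannot work until a speech recognizer is installed. If it contains only a non-English culture, use phrases in that language.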



According to MSDN, the SpeechRecognitionEngine class is available in .NET 4.5, 4, 3.5, 3.0 and .NET 4 Client Profile, and the supported Windows versions are:

Windows 8

Windows Server 2012

Windows 7

Windows Vista SP2

Windows Server 2008 (Server Core Role not supported)

Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported).

Windows Vista SP1 or later

Windows Server 2008 (Server Core not supported)

Windows Server 2008 R2 (Server Core supported with SP1 or later)

Windows Server 2003 SP2

Windows XP SP2

Windows Server 2008 R2

Windows Server 2008

Windows Server 2003

Windows 98, Windows Server 2000 SP4

Windows CE

Windows Millennium Edition

Windows Mobile for Pocket PC

Windows Mobile for Smartphone

Windows XP Media Center Edition

Windows XP Professional x64 Edition

Windows XP SP2

Windows XP Starter Edition

Some of these platforms are only shown on the MSDN page if you change the .NET Framework version on the page (using the "Other Framework" link at the top of the MSDN page). Please note: the SpeechRecognitionEngine class is not available in .NET for Windows Store apps.

In this article, I show how to program speech recognition, speech to text, text to speech and speech synthesis in C# using the System.Speech library.

Speech recognition

To create a program with speech recognition in C#, you need to add a reference to the System.Speech library. Then, add these using directives at the top of your code file:

using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Then, create an instance of the SpeechRecognitionEngine:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();

Then, we need to load grammars into the SpeechRecognitionEngine. If you don't do that, the speech recognizer will not recognize any phrases. For example, add a grammar with the phrase "test" and give the grammar the name "testGrammar":

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });

Or:

Grammar gr = new Grammar(new GrammarBuilder("test"));
gr.Name = "testGrammar";
_recognizer.LoadGrammar(gr);

If you don't want to give a name to the grammar, do this:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));

A name is only useful if you want to find the grammar again later, for example to unload it. To load grammars asynchronously, use the method LoadGrammarAsync. If you want to load a grammar while the recognizer is running, call the RequestRecognizerUpdate method before loading the grammar, and load the grammar(s) in a RecognizerUpdateReached event handler.
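Loading a grammar into a recognizer that is already running could be sketched as follows. This is only an illustration under assumptions: it expects a working default microphone, and the class name LoadWhileRunning, the helper BuildNamedGrammar and the phrase "hello" are hypothetical.

```csharp
using System;
using System.Speech.Recognition;

static class LoadWhileRunning
{
    // Hypothetical helper: wraps a single phrase in a named grammar.
    public static Grammar BuildNamedGrammar(string phrase, string name)
    {
        return new Grammar(new GrammarBuilder(phrase)) { Name = name };
    }

    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(BuildNamedGrammar("test", "testGrammar"));
            recognizer.SpeechRecognized +=
                (sender, e) => Console.WriteLine("Recognized: " + e.Result.Text);

            // Raised once the running engine has paused at a safe point.
            recognizer.RecognizerUpdateReached += (sender, e) =>
            {
                recognizer.LoadGrammar(BuildNamedGrammar("hello", "helloGrammar"));
                Console.WriteLine("Loaded helloGrammar while running.");
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Ask the engine to pause; the handler above then loads the grammar.
            recognizer.RequestRecognizerUpdate();

            Console.WriteLine("Say \"test\" or \"hello\". Press Enter to quit.");
            Console.ReadLine();
            recognizer.RecognizeAsyncCancel();
        }
    }
}
```

The same pattern applies to unloading: call RequestRecognizerUpdate and do the UnloadGrammar call inside the RecognizerUpdateReached handler.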

Then, add this event handler:

_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;

If the speech is recognized, the method _recognizer_SpeechRecognized will be invoked. So, we need to create that method. For example, when the program recognizes the phrase "test", you can write "The test was successful!" to the console. To do that, use this:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test") // e.Result.Text contains the recognized text
    {
        Console.WriteLine("The test was successful!");
    }
}

As you can see, e.Result.Text contains the recognized text. That's useful if you have more than one grammar. But the speech recognizer hasn't been started yet. To do that, add this code after the _recognizer.SpeechRecognized += _recognizer_SpeechRecognized line:

_recognizer.SetInputToDefaultAudioDevice();
_recognizer.RecognizeAsync(RecognizeMode.Multiple);

Now, if we merge all methods, we get this:

static void Main(string[] args)
{
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice(); // use the default microphone
    _recognizer.RecognizeAsync(RecognizeMode.Multiple); // returns immediately
}

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test")
    {
        Console.WriteLine("The test was successful!");
    }
}

If you run that, it will not work: the program ends immediately. So, we must ensure that the program does not stop before the speech recognition is completed. We create a ManualResetEvent (System.Threading.ManualResetEvent) with the name _completed, and when the speech recognition is completed, we call its Set method, and then the program ends. I also loaded an "exit" grammar. If the user says "exit", we call the Set method. Because there are two threads, the Main thread and the speech recognition thread, we can block the Main thread until the speech recognition thread is completed. After the speech recognition is completed, we dispose the speech recognition engine (this can take 3 seconds at worst, 50 milliseconds at best):

static ManualResetEvent _completed = null;

static void Main(string[] args)
{
    _completed = new ManualResetEvent(false);
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" });
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple);
    _completed.WaitOne(); // block Main until "exit" is recognized
    _recognizer.Dispose();
}

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test")
    {
        Console.WriteLine("The test was successful!");
    }
    else if (e.Result.Text == "exit")
    {
        _completed.Set(); // release the Main thread
    }
}

If you're programming a Windows application, you don't need to create a ManualResetEvent, because the UI thread ends only when the user closes the form.

To unload a grammar, use the method UnloadGrammar of the speech recognition engine, and to unload all grammars, use the method UnloadAllGrammars. Don't forget to invoke the method RequestRecognizerUpdate and to unload the grammars in a RecognizerUpdateReached event handler if the recognizer is running.

Unloading the "test" grammar for example:

foreach (Grammar gr in _recognizer.Grammars)
{
    if (gr.Name == "testGrammar")
    {
        _recognizer.UnloadGrammar(gr);
        break;
    }
}

Or create a grammar and load it like this:

Grammar testGrammar = new Grammar(new GrammarBuilder("test"));
_recognizer.LoadGrammar(testGrammar);

Then, you can unload the grammar like this:

_recognizer.UnloadGrammar(testGrammar);

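If you unload a grammar the second way, you must ensure that the testGrammar variable is accessible at the place where you unload it (for example, by declaring it as a field with a suitable access modifier instead of as a local variable). The first way is the easiest, because it only needs the grammar's name, so access modifiers don't matter.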

If you add a SpeechRecognitionRejected event handler to the SpeechRecognitionEngine, you can show candidate phrases found by the speech recognition engine. First, add a SpeechRecognitionRejected event handler:

_recognizer.SpeechRecognitionRejected += _recognizer_SpeechRecognitionRejected;

Then, create the _recognizer_SpeechRecognitionRejected function:

static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    if (e.Result.Alternates.Count == 0)
    {
        Console.WriteLine("Speech rejected. No candidate phrases found.");
        return;
    }
    Console.WriteLine("Speech rejected. Did you mean:");
    foreach (RecognizedPhrase r in e.Result.Alternates)
    {
        Console.WriteLine("    " + r.Text);
    }
}

This function shows all candidate phrases found by the speech recognition engine if the speech recognition was rejected.

In the same library, there's a namespace System.Speech.Synthesis. In that namespace, you'll find a class SpeechSynthesizer, and in that class there's a Speak method. Add the namespace at the top of your code file, and then try this:

SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");

If you run the code, the computer says: "Now the computer is speaking to you." Knowing that, you can reuse the speech recognition code, but instead of the test grammar, load this grammar:

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer")));

And in the _recognizer_SpeechRecognized method, add this:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "hello computer")
    {
        SpeechSynthesizer synthesizer = new SpeechSynthesizer();
        synthesizer.Speak("hello user");
        synthesizer.Dispose();
    }
    _completed.Set(); // let Main continue after the first recognized phrase
}

Use SpeechSynthesizer.Dispose to dispose the SpeechSynthesizer. Now, if you say "hello computer", the computer responds "hello user".

It's also possible to emulate speech recognition with the SpeechRecognitionEngine. You can do that with the EmulateRecognize method, and to do it asynchronously, use the EmulateRecognizeAsync method:

RecognitionResult result = _recognizer.EmulateRecognize("test");
_recognizer.EmulateRecognizeAsync("test");

But a warning: you can't emulate speech recognition while the speech recognition engine is recognizing speech. So, invoke this method before the method RecognizeAsync is invoked, or after the engine has finished recognizing speech.
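As an illustration of emulation on an idle engine, here is a small sketch. It assumes an installed recognizer for the language of the phrases; the class name EmulationDemo and the helper Describe are hypothetical. The helper relies on EmulateRecognize returning null when the operation is not successful.

```csharp
using System;
using System.Speech.Recognition;

static class EmulationDemo
{
    // Hypothetical helper: turns a (possibly null) result into a readable line.
    public static string Describe(RecognitionResult result)
    {
        return result == null ? "No grammar matched." : "Matched: " + result.Text;
    }

    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")));

            // RecognizeAsync is never called here, so the engine is idle
            // and emulation is allowed.
            Console.WriteLine(Describe(recognizer.EmulateRecognize("test")));
            Console.WriteLine(Describe(recognizer.EmulateRecognize("something else")));
        }
    }
}
```

Emulation is handy for testing grammars without speaking into a microphone.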

In this article, I used the SpeechRecognitionEngine class. There's also a SpeechRecognizer class. So, what's the difference between the SpeechRecognizer class and the SpeechRecognitionEngine class? If you use the SpeechRecognizer class, you'll see the Windows Speech Recognizer.



If you use the SpeechRecognitionEngine class, you won't see the Windows Speech Recognizer; the SpeechRecognitionEngine is the engine of a SpeechRecognizer. Also, the SpeechRecognizer class doesn't contain the methods SetInputToDefaultAudioDevice and RecognizeAsync.

If you want to load more than one phrase, you can put them in a single grammar (here we load the phrases "dog", "cat" and "snake"):

_recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("dog", "cat", "snake"))) { Name = "animalGrammar" });

Advantages:

The code is easier to read.

The UnloadAllGrammars function is faster.

Disadvantages:

If you unload a single grammar, you unload more than one phrase.

You can also combine both ways to load grammars. For example, you can load phrases like "dog", "cat", "snake" in a single grammar using Choices, because these are animals. But if you want to unload a single phrase, build only grammars with a single phrase. Instead of passing all phrases as parameters, we can use the Add method:

Choices animalChoices = new Choices();
animalChoices.Add("dog");
animalChoices.Add("cat");
animalChoices.Add("snake");

Or:

Choices animalChoices = new Choices();
animalChoices.Add("dog", "cat", "snake");

It's possible that you want to load complete phrases like "I like dogs", "I dislike dogs", "I like cats", "I dislike cats", ... It's not a good idea to load all phrases separately. Using the GrammarBuilder.Append method, we can append Choices to the grammar builder:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append("I"); // add "I"
grammarBuilder.Append(new Choices("like", "dislike")); // load "like" & "dislike"
grammarBuilder.Append(new Choices("dogs", "cats", "birds", "snakes", "fishes", "tigers", "lions", "snails", "elephants")); // add animals
_recognizer.LoadGrammar(new Grammar(grammarBuilder)); // load grammar
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

If the user says "I like dogs", _recognizer_SpeechRecognized will be called. It will also be called if the user says "I like cats", "I like birds", "I dislike snails", ... Now, we can create the _recognizer_SpeechRecognized function. If the user says "I like cats", then "Do you really like cats?" is shown on the console, and if the user says "I dislike cats", then "Do you really dislike cats?" is shown on the console. e.Result.Words[0].Text is the first spoken word ("I"), so Words[1] is "like" or "dislike" and Words[2] is the animal:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Words[0] is "I", Words[1] is "like" or "dislike", Words[2] is the animal
    Console.WriteLine("Do you really " + e.Result.Words[1].Text + " " + e.Result.Words[2].Text + "?");
    _completed.Set();
}
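The word indexing in the handler can be tried without a microphone by moving it into a plain function. The sketch below is an assumption-laden illustration: ReplyBuilder and BuildReply are hypothetical names, and the length check is an extra safeguard that is not part of the handler above.

```csharp
using System;
using System.Collections.Generic;

static class ReplyBuilder
{
    // words holds the recognized words in spoken order, e.g. "I", "like", "cats".
    public static string BuildReply(IList<string> words)
    {
        if (words == null || words.Count < 3)
        {
            return "Sorry, I did not catch that.";
        }
        // words[1] is the verb, words[2] is the animal.
        return "Do you really " + words[1] + " " + words[2] + "?";
    }

    static void Main()
    {
        Console.WriteLine(BuildReply(new[] { "I", "like", "cats" }));    // Do you really like cats?
        Console.WriteLine(BuildReply(new[] { "I", "dislike", "dogs" })); // Do you really dislike dogs?
    }
}
```

In a SpeechRecognized handler, the word list could be obtained with e.Result.Words.Select(w => w.Text).ToList(), which needs a using System.Linq directive.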

If you use a DictationGrammar, your program will recognize all speech using the Windows Desktop Speech technology. You can add a DictationGrammar and an "exit" grammar:

SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
_recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")));
_recognizer.LoadGrammar(new DictationGrammar());
_recognizer.SpeechRecognized += _recognizer_SpeechRecognized;
_recognizer.SetInputToDefaultAudioDevice(); // set input to default audio device
_recognizer.RecognizeAsync(RecognizeMode.Multiple); // recognize speech

And the _recognizer_SpeechRecognized method:

static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "exit")
    {
        _completed.Set();
        return;
    }
    Console.WriteLine("You said: " + e.Result.Text);
}

new DictationGrammar() returns an instance of the standard dictation grammar provided by Windows Desktop Speech technology.

Using a System.Speech.Synthesis.PromptBuilder, you can build a prompt for the SpeechSynthesizer. You can add breaks, styles, sentences and more using the PromptBuilder.

Using the StartSentence and EndSentence methods, you can indicate the start and the end of a sentence:

PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the AppendBreak method, you can append a break:

PromptBuilder builder = new PromptBuilder();
builder.StartSentence();
builder.AppendText("This is a sentence.");
builder.EndSentence();
builder.AppendBreak(new TimeSpan(0, 0, 1)); // pause for one second
builder.StartSentence();
builder.AppendText("This is another sentence.");
builder.EndSentence();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartStyle and EndStyle methods, you can indicate the style in the PromptBuilder (for example: loud, fast):

PromptBuilder builder = new PromptBuilder();
builder.StartStyle(new PromptStyle(PromptRate.Fast));
builder.AppendText("This text is spoken fast.");
builder.EndStyle();
builder.StartStyle(new PromptStyle(PromptVolume.ExtraSoft));
builder.AppendText("This text is spoken extra soft.");
builder.EndStyle();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

Using the StartVoice and EndVoice methods, you can indicate the voice, if it is installed:

PromptBuilder builder = new PromptBuilder();
builder.StartVoice(VoiceGender.Male, VoiceAge.Child);
builder.AppendText("This is a male child voice, if installed.");
builder.EndVoice();
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak(builder);
synthesizer.Dispose();

On my computer, there's just one voice installed. So if I try another voice using the StartVoice method, then I don't get another voice.
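To find out which voices are installed on a machine, something like the following sketch could be used. The class name VoiceLister and the helper Format are hypothetical, and the hints passed to SelectVoiceByHints are only an example.

```csharp
using System;
using System.Speech.Synthesis;

static class VoiceLister
{
    // Hypothetical helper: formats the details of one voice as a single line.
    public static string Format(string name, string gender, string age, string culture)
    {
        return name + " (" + gender + ", " + age + ", " + culture + ")";
    }

    static void Main()
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                if (!voice.Enabled) continue; // skip voices that are disabled

                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine(Format(info.Name, info.Gender.ToString(),
                                         info.Age.ToString(), info.Culture.Name));
            }

            // Ask for a voice by hints, then show which voice is actually in use.
            synthesizer.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
            Console.WriteLine("Selected: " + synthesizer.Voice.Name);
        }
    }
}
```

If only one voice is listed, requesting a different gender or age will most likely leave that single voice selected.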

This question is asked frequently in comments: how to train your speech recognition engine? From code, it is impossible, unfortunately. But you can train it through Windows Speech Recognition:

Open Control Panel

Go to Ease of Access

Choose Speech Recognition

Then choose Train your computer to better understand you

Then the training form appears. Press Next and the training begins. Speak the sentences aloud as they are shown.

