Managed Speech API in Windows Vista / .NET 3.0
By Peter Bromberg
A short overview of some of the new Windows Vista Speech features and a demo of a managed-code program that uses Speech Recognition and a programmatically constructed grammar.
In Windows Vista, one of the first nice things you may notice is the new Microsoft Voice, Microsoft Anna, who replaces Microsoft Sam (that's the gravelly guy who sounds like he just drank a fifth of bourbon). Anna provides a more pleasant and natural-sounding voice; her voice was created from real voice recordings, unlike previous Microsoft voices.
The .NET 3.0 Framework includes a managed speech API, System.Speech. This allows you to easily create speech-enabled Windows applications for Windows Vista using Visual Studio 2005. SAPI 5.3 is only available on Windows Vista. As with previous versions of SAPI, your application can run against earlier versions, such as SAPI 5.1 on Windows XP. However, if your application uses any features specific to SAPI 5.3, you can expect errors there.
The two namespaces for .NET speech-enabled applications are System.Speech.Synthesis and System.Speech.Recognition.
System.Speech.Synthesis
The System.Speech.Synthesis namespace is used to access the SAPI synthesizer engine to render text into speech using an installed voice such as Microsoft Anna.
The SAPI 5.3 synthesizer supports the W3C standard Speech Synthesis Markup Language (SSML), a markup language that allows you to finely tune how the synthesizer will produce words, such as pronunciation, speed, volume and pitch of the produced phrase.
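As a rough illustration of the two paragraphs above, here is a minimal console sketch that speaks one plain string and one SSML document. It assumes a project reference to System.Speech.dll and a working audio output device; the class name SsmlDemo, the spoken wording, and the prosody attribute values are made up for the example.

```csharp
using System;
using System.Speech.Synthesis;

class SsmlDemo
{
    static void Main()
    {
        // An SSML document: the prosody element adjusts rate, volume and pitch
        // for the text it wraps. Wording and attribute values are illustrative.
        string ssml =
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
            "Your order is " +
            "<prosody rate=\"slow\" volume=\"loud\" pitch=\"high\">ready</prosody>." +
            "</speak>";

        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Plain text is rendered with the default voice settings.
            synth.Speak("Plain text goes through Speak.");

            // SSML markup is passed to SpeakSsml instead of Speak.
            synth.SpeakSsml(ssml);
        }
    }
}
```

Both calls are synchronous, so the program exits only after the second phrase has been spoken.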
System.Speech.Recognition
The System.Speech.Recognition namespace is used to recognize a user's voice and convert it into text. The SAPI 5.3 recognition engine supports the W3C standard Speech Recognition Grammar Specification (SRGS), a markup language that defines how and what words are recognized, and adds support for Semantic Interpretation. For example, you could have a grammar that defines yes or no as acceptable answers. With Semantic Interpretation, a user could say "no," "nope," or "not," and the semantic value for each of these phrases would be "no." Probably, "negatory" wouldn't make it through. The developer only needs to check the semantic result for "no."
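The yes/no idea can be sketched roughly as follows. This is only one way to do it, assuming a reference to System.Speech.dll and a default microphone; the class name YesNoDemo, the key name "answer", and the extra "yeah"/"yep" phrases are hypothetical choices for the example.

```csharp
using System;
using System.Speech.Recognition;

class YesNoDemo
{
    static Grammar CreateYesNoGrammar()
    {
        // Several spoken phrases collapse onto one semantic value each.
        SemanticResultValue yes = new SemanticResultValue(
            new GrammarBuilder(new Choices("yes", "yeah", "yep")), "yes");
        SemanticResultValue no = new SemanticResultValue(
            new GrammarBuilder(new Choices("no", "nope", "not")), "no");

        Choices answers = new Choices();
        answers.Add(new GrammarBuilder(yes));
        answers.Add(new GrammarBuilder(no));

        // "answer" is a hypothetical key under which the value is reported.
        SemanticResultKey key =
            new SemanticResultKey("answer", new GrammarBuilder(answers));
        return new Grammar(new GrammarBuilder(key));
    }

    static void Main()
    {
        using (SpeechRecognitionEngine reco = new SpeechRecognitionEngine())
        {
            reco.LoadGrammar(CreateYesNoGrammar());
            reco.SetInputToDefaultAudioDevice();
            reco.SpeechRecognized +=
                delegate(object sender, SpeechRecognizedEventArgs e)
                {
                    // The handler checks the semantic value, not the raw text.
                    string answer = (string)e.Result.Semantics["answer"].Value;
                    Console.WriteLine("Heard \"{0}\", meaning \"{1}\"",
                        e.Result.Text, answer);
                };
            reco.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Say yes or no; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

The handler never compares against "nope" or "not" directly; adding another synonym only means extending the Choices list.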
In addition, constructing grammars and rules programmatically has been made much, much easier. The idea is that you'll be able to do things like "I'd like a <size> <topping> pizza" -- like so:
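// _reco is assumed to be an existing SpeechRecognitionEngine instance.
// The released System.Speech API exposes these steps as Append overloads;
// the AppendPhrase and AppendChoices names are not part of it.
GrammarBuilder pizzaBuilder = new GrammarBuilder();
pizzaBuilder.Append("I'd like a");
pizzaBuilder.Append(new Choices("small", "regular", "large"));
pizzaBuilder.Append(new Choices("pepperoni", "cheese"));
pizzaBuilder.Append("pizza");
// load it into the recognizer
_reco.LoadGrammar(new Grammar(pizzaBuilder));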
In Windows Vista, everything related to speech and speech recognition has been vastly enhanced, and you have programmatic access to all of it through managed code. Speech as a tool for programmers has matured to the point where, in my opinion, the sky is the limit - all you need is to learn some basics, have a good concept, and go ahead and implement it. I've been doing TTS and speech recognition apps since classic VB with SAPI 3.0 and 4.0, and I can tell you that what you have at your disposal in Windows Vista is unmatched in the industry.
I put together a short Windows Forms app for Vista using both the Synthesis and the Recognition engines. It uses the example code from MSDN, which allows the user to set the background color of the form from a Choices grammar. You can say either "Set background to <color>" or "Make background <color>" and it will capture the color's ARGB integer value in the semantics["rgb"] key/value pair, provided it is in the list of 174 Known Colors. The program will then set the form's background color and use SpeechSynthesizer.Speak to repeat back your command, in Anna's soothing voice. The command is also displayed on a label in the form.
The code is extremely simple (all 86 lines of it) so I'll post it below:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Text;
using System.Windows.Forms;
using System.Speech.Recognition;
using System.Speech.Synthesis;
namespace REcoSample
{
    public partial class Form1 : Form
    {
        private System.Speech.Recognition.SpeechRecognitionEngine _recognizer =
            new SpeechRecognitionEngine();
        private SpeechSynthesizer synth = new SpeechSynthesizer();
        public Form1()
        {
            InitializeComponent();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            Grammar grammar = CreateGrammarBuilderRGBSemantics2(null);
            _recognizer.SetInputToDefaultAudioDevice();
            _recognizer.UnloadAllGrammars();
            grammar.Enabled = true;
            _recognizer.LoadGrammar(grammar);
            _recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(_recognizer_SpeechRecognized);
            //Keep listening after each recognized phrase
            _recognizer.RecognizeAsync(RecognizeMode.Multiple);
        }
        void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            SemanticValue semantics = e.Result.Semantics;
            string rawText = e.Result.Text;
            if (!semantics.ContainsKey("rgb"))
            {
                this.label1.Text = "No info provided.";
            }
            else
            {
                this.label1.Text = rawText;
                this.BackColor = Color.FromArgb((int)semantics["rgb"].Value);
                //Repeat the command back to the user
                synth.Speak(rawText);
            }
        }
        private Grammar CreateGrammarBuilderRGBSemantics2(params int[] info)
        {
            //Create a set of choices, each a lookup from a color name to rgb
            //Choices constructors do not take SemanticResultValue, so wrap each SemanticResultValue in a GrammarBuilder
            Choices colorChoice = new Choices();
            foreach (string colorName in System.Enum.GetNames(typeof(KnownColor)))
            {
                SemanticResultValue choiceResultValue =
                    new SemanticResultValue(colorName, Color.FromName(colorName).ToArgb());
                GrammarBuilder resultValueBuilder = new GrammarBuilder(choiceResultValue);
                colorChoice.Add(resultValueBuilder);
            }
            SemanticResultKey choiceResultKey = new SemanticResultKey("rgb", colorChoice);
            GrammarBuilder choiceBuilder = new GrammarBuilder(choiceResultKey);
            //Create intermediate grammars with introductory phrase and the color choice
            GrammarBuilder makeBackgroundBuilder = "Make background";
            makeBackgroundBuilder.Append(choiceBuilder);
            GrammarBuilder configureBackgroundBuilder = new GrammarBuilder("Set background to");
            configureBackgroundBuilder.Append(new SemanticResultKey("rgb", colorChoice));
            //Accept either of the two phrasings
            Choices bothChoices = new Choices(makeBackgroundBuilder, configureBackgroundBuilder);
            GrammarBuilder bothBuilder = new GrammarBuilder(bothChoices);
            Grammar grammar = new Grammar(bothBuilder);
            grammar.Name = "Make Background /Configure background as";
            return grammar;
        }
    }
}
You can download the Visual Studio 2005 Windows Forms solution here. This uses Windows Vista (.NET 3.0 / SAPI 5.3)-specific APIs, so don't expect it to work 100% on Windows XP.
Biography - Peter Bromberg
Peter Bromberg is a C# MVP, MCP, and .NET expert who has worked in banking, financial and telephony for over 20 years. Pete focuses exclusively on the .NET Platform, and currently develops SOA and other .NET applications for a Fortune 500 clientele. Peter enjoys producing digital photo collage with Maya, playing jazz flute, the beach, and fine wines. You can view Peter's UnBlog and IttyUrl sites.
Article Discussion: Managed Speech API in Windows Vista and .NET 3.0
Missing methods (AppendPhrase, AppendChoices)
Fabio Miguez replied
to Peter Bromberg at Sunday, May 06, 2007 4:29 PM
Hello Peter,
Nice article! I cannot seem to find methods such as GrammarBuilder.AppendPhrase or .AppendChoices, which you used in your pizza ordering example. Am I missing something?
Thanks,
In the article I explained some differences in .NET 3.0
Peter Bromberg replied
to Fabio Miguez at Sunday, May 06, 2007 4:29 PM
and Windows Vista. Do you have .NET 3.0 installed, and are you using Windows Vista, or Windows XP?
My configuration
Fabio Miguez replied
to Peter Bromberg at Sunday, May 06, 2007 4:29 PM
Hi Peter,
I am using .NET 3.5 and Windows Vista. 3.5 was installed as part of Visual C# Express 2008.
What puzzles me also is I did not find a reference to those methods in MSDN.
Thanks for the quick reply.
Recognition suppress (e.g. close, start listening) in other applications
Vista Ultimate SP1 VS2008 SP1 C#
The application should detect speech for a limited set of words (3).
Unfortunately, Vista detects words which are not in the grammar, like "close" or "stop listening".
So after detecting these "system words", Vista wants to close the application.
Vista also shows numbers on the screen for input detection.
How can I suppress all these Vista system messages? I only want it to detect my words and take no other action.
The application should also support 3 headsets, so that 3 users can speak into different microphones and
it can detect which user has spoken.
How can I realize this?
Thank you very much.
recognizer.SetInputToDefaultAudioDevice();
Grammar customGrammar = CreateCustomGrammar();
recognizer.UnloadAllGrammars();
recognizer.LoadGrammar(customGrammar);
recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
recognizer.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(recognizer_SpeechHypothesized);
private Grammar CreateCustomGrammar()
{
    GrammarBuilder grammarBuilder = new GrammarBuilder();
    grammarBuilder.Append(new Choices("Hello", "John", "Rider"));
    return new Grammar(grammarBuilder);
}
Can u convert above code in SAPI 5.1
Umaid Saleem replied
to Bernhard Bildstein at Sunday, May 06, 2007 4:29 PM
Hi, I am trying to run your code with SAPI 5.1, as I have XP installed, but I am facing a lot of difficulties in running the code, so please help.
jordan replied
to Peter Bromberg at Sunday, May 06, 2007 4:29 PM
Peter
I enjoyed your article.
I'm working in the past - do you have a copy of Sapi 3 beta?
Thanks
Jordan