Vista Speech Recognition

I saw this ExtremeTech article on the web a couple days ago. It talks about how speech recognition technology may finally be reaching the masses. It almost makes me want to install Vista on one of our computers so that I can give it a test drive.

By Jason Cross

If you have Windows Vista, even the Home Basic version, you already have one of the more powerful speech recognition systems available. Microsoft has invested many millions of dollars in research regarding speech recognition over the years. Some of what they study in the R&D labs is years away from being a product, but there's a lot of new fancy speech recognition technology built right into Vista.

To get started, all you need to do is click the start menu, type the word "speech," and click on "Windows Speech Recognition." If you do it that way, though, you'll just be stumbling around in the dark. To get the most out of speech recognition, you'll want to go to the control panel and run the wizards and tutorials.

We'll step you through some of the cooler features of speech recognition and give you some tips on how to use it. Before you know it, you'll be talking to your email. All you need is any version of the Vista operating system and a microphone. How good is it? Well, we only touched the keyboard a scant few times while making this entire article.

Your computer can't do what you say if it can't hear you, so the first step is to get your microphone set up. In our experience, it certainly helps to use a higher-quality microphone. A poor mic can cause problems with Vista understanding what you say.


After plugging in your mic (or headset), you'll want to make sure it's working well. Open the Control Panel, click Hardware and Sound, and then under the Sound heading, click Manage Audio Devices. The Recording tab should show your microphone, though this may vary from one sound device to the next. Double-click the microphone to open its properties. Again, the dialog may be somewhat different from one sound card or integrated audio device to the next, but the general principles are the same. A Custom tab might list a +20dB boost option, and a Levels tab should show recording volume. For now, leave these as default, but remember where they are: Tweaking these settings can really come in handy.

Starting Out With Speech
Now that your mic is set up, let's get working with speech recognition. In the Control Panel, type "speech" into the search box or click the Ease of Access category, then choose Speech Recognition Options.

You'll see five options:

Start Speech Recognition
Set up microphone
Take Speech Tutorial
Train your computer to better understand you
Open the Speech Reference Card

You'll eventually want to use most of these, and if you click Start Speech Recognition for the first time on your computer, Vista will automatically walk you through the microphone setup and tutorial.

The microphone setup is really straightforward. It's just a wizard that asks you which type of microphone you have (headset, desktop boom, or "other") and then asks you to read a sentence so that Vista can adjust input levels. If the wizard keeps reporting that it can't hear you clearly, you might want to return to the microphone setup we described above and play around with the mic boost and input level yourself.

Okay, so you have the microphone set up, and you're ready to start surfing the web with verbal commands and dictating some email. Not so fast. Sure, you can jump right into that, but you really want to run the Speech Tutorial. This is a slick, full-screen, step-by-step guide to the most common commands and conventions used by the built-in speech recognition. It'll show you everything you need to know to use this feature effectively, and it only takes about 10 or 15 minutes. You can do the entire thing by voice, with no clicking required. If you run through the tutorial, you really don't need the rest of this article.

Still, we know the hardcore do-it-yourselfers that read ExtremeTech aren't exactly the "guided tutorial" types, so we'll go ahead and describe how speech recognition is used. Besides, most of you probably don't have Vista yet and want to see what you're missing. By its very nature, it is hard to describe speech recognition in a way that makes sense: It's much easier to just show someone. We'll do our best, but to fully appreciate all this, you need to see and hear it for yourself.

With the little speech recognition panel fixed at the top of your screen, you make Windows wake up and listen by saying "Start Listening." After that, your PC will try to interpret everything that comes into the mic as either a command or dictation. If you find you have to take a break, or talk to someone else, just say "Stop Listening." Simple.

Vista can interpret a host of commands to enable basic navigation around the desktop and simple application use. Say "Start" and the start menu pops open. The blinking text cursor sits in the search box, as it always does when the start menu is opened, so whatever you say will be entered as a search term. Say "Vista" and you'll get all the start menu items, emails, songs, photos, and other search hits with "Vista" in them. Then just say the name of the one you want to click on.

If you want to launch an application, you can skip the start menu entirely by saying "Start [app name]." Vista is smart enough to know that not everyone uses the same nomenclature: You can say "Start calculator," "launch calculator," or "open calculator" interchangeably. This works for non-Microsoft programs as well; "launch Firefox" works just as well as "launch Internet Explorer." You'll see this flexibility throughout speech recognition, but there are times when you have to be careful what you say. If you're dictating, for example, there is a distinct difference between "clear" and "delete."
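For the curious, here is a rough Python sketch of how interchangeable launch verbs could be matched. It is purely illustrative: the function name and the verb list are our own assumptions based on the three verbs mentioned above, and it says nothing about how Vista actually implements its command grammar.

```python
# Hypothetical sketch: NOT Vista's real grammar or API.
# The verb list is an assumption taken from the three verbs named in the article.
LAUNCH_VERBS = ("start", "launch", "open")


def parse_launch_command(utterance):
    """Return the app name if the utterance looks like a launch command, else None."""
    words = utterance.strip().lower().split()
    # A launch command needs a verb followed by at least one word of app name.
    # A bare "Start" is left alone, since that opens the start menu instead.
    if len(words) >= 2 and words[0] in LAUNCH_VERBS:
        return " ".join(words[1:])
    return None


if __name__ == "__main__":
    for phrase in ("Start calculator", "launch Firefox", "open Internet Explorer", "Start"):
        print(repr(phrase), "->", parse_launch_command(phrase))
```

The point of the sketch is simply that several verbs map to one action, while anything that doesn't fit the pattern falls through to be handled as some other command or as dictation.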

Scrolling up and down windows is as easy as saying "scroll up" or "scroll down." You can get more elaborate, like "scroll down three" to move further down, or "move to the end of the document" to go all the way to the bottom of the email, web page, word doc, or whatever you have open.

Vista recognizes common menu commands for the application you're using, and will reference the tooltip text on interface buttons. You can simply say "File … print" to open the File drop-down menu and select Print. Dialog boxes that have options like "Save/Don't Save/Cancel" can easily be dealt with by simply saying the text on the button you want to press. This can take some getting used to. In Firefox, for instance, the button to refresh the page is labeled "Reload current page." If you say that, it'll be just as if you hit that button. If you say "refresh," the speech recognition will get confused.

Want to click on something? Say "click [item]." This works for commands like "double click" and "right click" as well. At the desktop, for instance, you simply say "right click recycle bin" to get a context-sensitive menu, and then "empty recycle bin" to perform that action.

If you're having trouble clicking on something, just say "show numbers." This will show a translucent box with a number over every clickable object in the current application—every button, link, and hyperlinked image. Say the number you want to click on, and Vista highlights it green to confirm that this is indeed where you want to click. Say "OK" and it clicks there.
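The "show numbers" flow is really a two-step confirm: pick a number, then say "OK." A minimal Python sketch of that idea follows. The class and method names are hypothetical, the item labels are made up for the example, and spoken numbers are represented as digit strings for simplicity; none of this reflects Vista's internals.

```python
# Hypothetical sketch of a "show numbers" style overlay: NOT Vista's implementation.
class NumberOverlay:
    def __init__(self, clickable_items):
        # Assign each clickable item a number, starting from 1.
        self.targets = {i: item for i, item in enumerate(clickable_items, start=1)}
        self.highlighted = None  # number currently awaiting confirmation
        self.clicked = []        # record of items that were "clicked"

    def hear(self, phrase):
        """Process one spoken phrase and describe what the overlay does."""
        phrase = phrase.strip().lower()
        if phrase.isdigit() and int(phrase) in self.targets:
            # Step 1: a valid number highlights its target for confirmation.
            self.highlighted = int(phrase)
            return f"highlight {self.targets[self.highlighted]}"
        if phrase == "ok" and self.highlighted is not None:
            # Step 2: "OK" clicks whatever is currently highlighted.
            item = self.targets[self.highlighted]
            self.clicked.append(item)
            self.highlighted = None
            return f"click {item}"
        return "ignored"


if __name__ == "__main__":
    overlay = NumberOverlay(["Back", "Reload current page", "Home"])
    print(overlay.hear("2"))
    print(overlay.hear("OK"))
```

Separating the highlight from the click is what gives you a chance to back out if the number was misheard.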

Last but not least, you can directly simulate keystrokes by saying "press [key]." If the speech recognition just can't understand your command to refresh, say "press F5." Some keys can be virtually pressed without saying the word "press," like Enter, Home, End, PageUp, and PageDown. Vista recognizes fancier commands like "press right arrow three times" and "press control and U."

Speech recognition is a fantastic feature for those with disabilities that make it hard to use a keyboard or mouse, but for those who aren't challenged in that way, it can still prove useful as a dictation tool. In virtually any application or dialog box you could type text into, Vista's speech recognition can take dictation from you.

You don't have to do anything special to enable dictation: If the cursor is in a text-entry box, just talk and it types. It works in most web forms, email programs, word processors, you name it. Dictating to your computer is a bit of an art, and an acquired taste. At first, the speech recognition is rarely as accurate as we would all like it to be. It gets better over time, but you can improve accuracy right from the start by running the training option in the speech control panel. You'll read a set of sentences, and Vista will listen to your voice and adapt its algorithms to improve its understanding of what you say - at least, that's the idea. It takes about 5 or 10 minutes and is well worth the effort.

When dictating, you'll want to enunciate and speak clearly, but not really slowly, and certainly not too quickly. The best advice comes from the built-in tutorial - speak like a newscaster. It helps to speak in complete sentences or at least entire phrases. Many words and pairs of words can sound like others, especially to a computer. By speaking in complete phrases, you give your PC a chance to interpret the context of what you say. It greatly improves accuracy.

Your most useful command will certainly be "delete that." Whatever you say, whether it's a single word or a whole phrase, saying "delete that" will nuke the last thing Vista typed for you. It's a simple do-over command that you'll use frequently.

Punctuation is a tricky spot when dictating to your computer. For instance, if I wanted to capitalize the word "computer" in the previous sentence, I would say "select computer" and then simply "capitalize." Ending sentences is as easy as saying "period" and adding commas is as simple as saying "comma." Select a word or phrase and say "italicize," and you're golden. Vista's speech recognition is smart enough to understand phrases like "select the previous two sentences" or "delete the next paragraph," and this makes editing easy. It's often best to go ahead and ignore mistakes until you're done dictating a paragraph or two, and then go back and fix them.

Other punctuation is a little harder, though. Say "select harder" and it will highlight the word "harder". If you want to put it in quotes, the natural thing to do now is to say "put that in quotes" or "quotes" or "quotation." Doing any of these will simply replace the word "harder" with those words, though. You have to get the cursor in front of the word "harder" by saying "move to harder" and then say "open quote," then move it to the end of harder with "move after harder" and then "close quote." Phrases like "begin quote" and "end quote" are not recognized as such - you'll end up dictating those words.

When Vista can't understand you or is unsure about what you're trying to say, it may pop up a corrections box with a list of guesses. Just say the number that corresponds to the correct word or phrase and say "OK." If you don't see the correct thing listed, just try saying it over; the list will automatically update. As a last resort, you can say "spell it" to go letter-by-letter. Every time you do this, Vista learns more about what you say and when you say it, so accuracy improves. This is one reason why it's more important to use the "correct [word/phrase]" phrase than to "select [word/phrase]" and then replace it. Using "correct [word/phrase]" lets Vista know it made a mistake; using the "select [word/phrase]" feature just means you changed your mind.

Despite the kinks, the dictation in Vista is actually quite good. It takes about an hour to get yourself trained in the right way to navigate around a document and for the computer to learn the way you speak. If you're willing to invest that kind of time, you can dictate emails, forum posts, and blog comments much more quickly than you can type them.

If you're going to use speech recognition to browse the web, it's probably best to use Internet Explorer. Other browsers like Firefox work, but most of the plain-language commands, like "go to address" to move to the address bar, are understood in IE but not in Firefox. What's more, we were unable to dictate in a couple of forum text-entry boxes in Firefox, while that feature worked in IE.

When browsing the web, you'll need a lot of the general window browsing commands used throughout Windows. "Scroll down" and "scroll up" are common. To click on a link, just say "click [text of link]." On the ExtremeTech home page, for example, simply saying "Click five free online photo editors" will click on that headline, then once in the article, say "click discuss this now" to go to the discussion thread. The web can be tricky, though. If you say "click opinion" on the ExtremeTech home page, it won't click on that opinion button on the top navigation menu, because that's actually a graphic. You'll need to use the "show numbers" function to click that, and many other parts of most web pages.

Other handy commands are "back," "refresh," "right click [link]," and "google [words]" (if Google is your search box provider).

To be frank, browsing the web using nothing but voice commands is quite frustrating at first. We found ourselves repeating the same phrases over and over, and awkwardly fumbling around with "spell it" corrections to web addresses. It gets better as the speech recognition becomes more accurate and as we learned to surf a bit differently, but it's still less than ideal. We'll stick to the mouse for web surfing, for now.

When All Else Fails
Sometimes, no matter what you try saying, you get stuck. You just can't get Vista to recognize your command, or to click on the thing you want it to. When this happens, you're not totally out of luck. The go-to command for help is "what can I say?" This will pop up a help menu about speech recognition, and you can drill down further in it with simple voice commands.

If you can't click on the thing you want, you can click anywhere on the screen with the mousegrid feature. Just say "mousegrid" and a 3x3 grid of numbered rectangles divides the screen. Say a number like "six" to zoom in, creating a new 3x3 grid in just that rectangle. Keep saying numbers until you get down to a small enough rectangle to click on the thing you want, then say "click [number]." With this, you can effectively click, double-click, or right-click on anything on the whole screen. It's a little slow, so it's sort of a last resort.

The speech recognition in Vista isn't perfect - far from it - but it really is impressive just how well it works. It's clear that Microsoft has invested heavily in researching this area, and it's also clear that there is still a long way to go. For all its foibles, Vista's speech recognition is good enough that anyone could find value in dictating a few emails now and then.
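A quick aside on the mousegrid feature described above: the zooming is just repeated subdivision of a rectangle into nine cells. Here is a small Python sketch of that arithmetic. The function names are hypothetical, and the numbering order (left to right, top to bottom) is an assumption made for the example rather than something confirmed here.

```python
# Hypothetical sketch of mousegrid-style zooming: NOT Vista's implementation.
# Assumption: cells are numbered 1-9, left to right, then top to bottom.

def mousegrid_cell(region, number):
    """Return the sub-rectangle for cell `number` of a 3x3 grid laid over `region`.

    `region` and the result are (left, top, width, height) tuples.
    """
    if not 1 <= number <= 9:
        raise ValueError("cell number must be between 1 and 9")
    left, top, width, height = region
    row, col = divmod(number - 1, 3)  # e.g. cell 6 -> row 1, col 2
    cell_w, cell_h = width / 3, height / 3
    return (left + col * cell_w, top + row * cell_h, cell_w, cell_h)


def mousegrid_target(screen, numbers):
    """Zoom in once per spoken number, then return the center of the final cell."""
    region = screen
    for number in numbers:
        region = mousegrid_cell(region, number)
    left, top, width, height = region
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    # Saying "six" and then "one" on a 900x900 screen.
    print(mousegrid_target((0, 0, 900, 900), [6, 1]))
```

Each spoken number shrinks the target area to one ninth of its previous size, which is why only a few numbers are needed to reach a small button, and also why it feels slow compared to a mouse.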

We wrote most of this article using voice, and while plenty of mistakes were made (and slowly corrected), it definitely went a lot smoother near the end. There were plenty of misunderstandings ("are" instead of "our") and wrong words, but the dictation started getting more accurate as both Vista learned my voice and I learned how to work with the peculiarities of speech recognition. You may notice the style of this article is a bit awkward—this is one of the dangers of dictating to your computer. The way that one is expected to write professionally is quite different from the way a person talks. It takes some practice to be mindful of "talking as you would write," because it doesn't sound natural in your head.

For this first attempt at a major article, I probably spent enough time correcting mistakes and fumbling with punctuation, all with voice commands, that it would have been faster and easier to just type it at my usual 75 words per minute. Then again, I have been typing articles for a living for nearly a decade, so I am certainly not the "average Joe" of typing vs. dictation. Personally, I'll stick to typing most of the time, but Vista's speech recognition is good enough that I wouldn't mind answering a few casual emails with it.

On some level, you get out of speech recognition what you are willing to put into it. You have to stick with it, letting Vista learn how you talk and making corrections, before it gets so accurate that it's a joy to use. For many users, that's simply asking too much. Still, how cool is it that this simply comes with every version of Vista - even the Home Basic edition? If you're a Vista user, you should set aside an afternoon one weekend to fiddle around with it. We think you'll be impressed; it's definitely one of those show-off features that impresses the "well what can Vista do that XP can't?" crowd. Hopefully, Microsoft will continue to improve speech recognition in updates and service packs, and continue to give away this helpful tool in all future Windows versions.


About

This page contains a single entry from the blog posted on April 5, 2007 6:55 PM.
