In a few weeks, thousands of people will be gathering at a New York City convention hall to wave banners, wear bright T-shirts, and discuss the strengths and weaknesses of two competitors who are trying to win the hearts and minds of America. The convention’s theme will be “Talk About Possibilities,” but there won’t be a single elephant, donkey, or any other inexplicable political symbol in sight.
SpeechTEK 2004 is all about furthering the message of speech technologies to a world that is longing to embrace it with open arms. OK, so it’s really not the whole world, just a bunch of application developers. But this convention could be a big deal, not only for a computer industry that has finally perfected the science of computerized speech recognition, but also for pedestrian consumers who are more and more unwilling to step through the latest phone menu by pressing “one for English, two for Spanish.” In fact, placing a phone call to just about any corporation these days is often an adventure in frustration. “Press one” has morphed into “Press or say one,” and almost no one elects to say “one,” primarily because they’d feel like an idiot talking out loud to absolutely no one. Even if you’re alone, you still opt to press the buttons because it’s quicker, less problematic, and ever so slightly more dignified.
But as voice recognition makes strides in accuracy, it seems to be gaining in popularity as well. SpeechTEK 2004 will feature more vendors and will draw more attendees than any of its predecessors by a long shot. Companies today are saving money by building voice-enabled applications that not only activate simple menu options, but actually recognize questions and statements spoken in natural language, helping us consumers avoid the whole menu labyrinth. Imagine calling your hard drive manufacturer because your brand new drive suddenly refuses to, well, drive. Instead of running into a series of “Press or say one” menus, you get a single voice prompt that says “Please say your order number and describe the problem.” The app then recognizes your order, approves a replacement, and explains that your new drive “will be shipped within 24 hours. Please return the faulty drive in the shipping carton free of charge.” All in under two minutes. Not bad.
In addition to replacing receptionists and tech support people, speech applications can also type letters for you, notify you when your database server has crashed (and reboot it for you), and give you directions to the closest Olive Garden. Even if you don’t have OnStar. The key to all of this lies in technological advances in speech recognition. About eight years ago, I wrote a less-than-stellar review for a dictation software package that was supposed to be the best on the market at the time. It actually did work. The only problem was that when I said “Gee, I’d like a new Ferrari,” the software package typed it as “Cheese I glide canoe for Ari.”
Mercifully, speech technology has come a long way since then, as evidenced by the growing number of corporations that are developing speech apps. Underneath such apps are two technologies that are competing to become the de facto standard in the voice recognition arena. One is VoiceXML, which is mainly for phone applications and which has been embraced by the likes of IBM. The other is SALT, which stands for Speech Application Language Tags, and which lets you write apps for the telephone as well as for “multi-modal” devices like cell phones and PDAs. Microsoft is the biggest proponent of SALT. The main intent of these two fairly new technologies is to “webify” voice applications. And while everyone pretends that the pie is big enough for VoiceXML and SALT to both eat and be filled, each of them would like to be a little more filled than the other. Win the minds of web app developers and you win their hearts. Win their hearts and you win their pocketbooks. Win their pocketbooks and you’ll be buying a summer house on the lake.
Hoping to attend SpeechTEK at Microsoft’s expense, I put in a call to my manager, Bob. He was out, but I got his message. “Hi, you’ve reached Bob’s phone. If you want to page me, press or say one. If you want to leave me a message, press or say two. If this is Circeo asking for a free trip, hang up and get back to work or you’ll suddenly have a lot of free time on your hands.”
I had no idea Bob was so up-to-date on speech technology.

Ken Circeo lives, writes, and scribbles cartoons in Mill Creek, Washington. He has looked askance at the computer industry for more than twenty years.