IT'S NICE when your comments are taken seriously. The NY Times is reporting that IBM will open up the source code of its speech recognition engine and donate that code, worth $10 million in development costs, to two open source groups, the Apache Software Foundation and the Eclipse project, with each receiving different chunks of code. At least one analyst, from Opus Research, was enthusiastic and is quoted in the NYT article as saying "that should drastically reduce the cost of building speech applications".
A bit of History
Speech recognition technology is nothing new for IBM. The company introduced its first software-only speech recognition product, VoiceType, back in 1996 by bundling it with the company's then-flagship 32-bit operating system, IBM OS/2 Warp version 4.0. It was even integrated with the operating system and the OS/2 port of Netscape Navigator, making OS/2 Warp the first 32-bit desktop OS to ship with voice recognition. Ahh, the opportunities missed...
Lots of water has gone under the bridge since. First Big Blue dumped VoiceType, which was a discrete-speech engine (meaning that you had to pause briefly between words to get your speech recognised), and created ViaVoice, the first "continuous speech" engine. By then, the product ran on Windows and faced strong competition from other Windows players like Lernout & Hauspie.
Five years ago, IBM released the "ViaVoice Toolkit for Linux", but only in binary form. That particular effort didn't fare well, and the project was quickly forgotten, ignored, and ultimately abandoned.
Linux to get a competitive edge?
IBM currently has an agreement with US-based firm ScanSoft to market Big Blue's ViaVoice product to Windows end-users on the retail market. The latest ViaVoice version for Windows is retailing for $160 when bundled with a USB microphone, and $69 without it. A version for Mac OS X is also available. In the end, this move will give Linux a competitive edge, if and when the speech recognition engine is bundled and integrated into popular Linux distros.
However, from what can be learned from this report, the source code contributed is just the speech recognition engine, that is, the text-mode "back-end", without any graphical user interface. And it is not quite clear whether the released source will be capable of taking continuous speech, or will just provide basic "navigation" (simple words and phrases). In the end, it will be the task of open source groups like Gnome and KDE to build the hooks between this voice recognition engine and the most popular Linux graphical desktops, allowing, for instance, direct dictation into applications. µ