IBM has announced the WebSphere Voice Application Access middleware product designed to simplify "building and managing voice portals and to more easily extend web-based portals to voice. Leveraging the scalability, personalization, and authentication features of IBM's WebSphere Portal, it enables mobile workers to more easily access information from multiple voice applications -- using a single telephone number. This new offering includes IBM's WebSphere Voice Server as well as ready-to-use email, personal information management (PIM) functions, and sample portlets. It also supports VoiceXML and Java -- including development tools based on Eclipse, the open-source, vendor-neutral platform for writing software -- and uses open-standard programming languages to create voice-enabled applications that will interoperate with a range of web servers and databases. Building on the VoiceXML standards allows IBM WebSphere Voice Application Access to work with third party browsers and their associated underlying speech recognition and text-to-speech technologies. As the VoiceXML 2.0 specification nears final approval, IBM WebSphere Voice Application Access will move quickly to support it."
Description from "Extending an Enterprise with IBM WebSphere Voice Application Access," by Eddie Epstein:
Voice access to computers has become a preferred interface. The availability of high quality automated speech recognition and speech synthesis technologies, combined with lower cost and higher performance hardware, makes automated voice access feasible for most applications. What is particularly important is that most applications can be multichannel in nature: providing voice access in addition to the traditional visual interface.
A major stumbling block for the voice interface has been the unnatural and difficult-to-understand nature of computer generated voices. Recent breakthroughs in the use of concatenative text-to-speech technology have eliminated this limitation and resulted in voice quality comparable to human speech. Speech recognition accuracy has also continued to improve, so that millions of people daily use their voice to 'dial' phone numbers by saying a person's name, manage their investment portfolios, and access weather information, sports scores and other information. In addition to technology improvements, the steady refinement of conversational dialogue design has resulted in a much more efficient and pleasant user experience than was provided by earlier voice activated systems...
The last critical piece to fall into place has been the availability of VoiceXML, an open standards-based voice application design protocol that is supported by all major speech technology suppliers. This standard was designed to allow voice applications to run on all enterprise-quality computer hardware and operating system platforms. Companies can be sure that their investment in a VoiceXML application infrastructure won't lock them into a single supplier for critical system components.
VoiceXML was introduced specifically to eliminate the need for proprietary IVR application design environments, to automatically provide the integration to middleware using the view-and-form based model of Web application design, and to create a standardized interface to speech recognition and speech synthesis technologies. VoiceXML enables WebSphere Voice Application Access to integrate voice interface capabilities in the same way WebSphere Portal Server applications are built on HTML and WML. These protocols provide a modular application design environment with common components sharable across all access modalities.
Most existing automated voice solutions have been created using proprietary voice application environments combined with custom interfaces to back-end business logic and data. These custom interfaces are difficult to integrate with traditional GUI Web access solutions. IBM WebSphere Voice Application Access combines the modular application design paradigm of IBM WebSphere Portal Server with VoiceXML to add voice access to the other modalities supported by WebSphere Portal Server. By building on VoiceXML, not only is the growing community of voice application developers able to directly leverage the voice application access platform, but platform customers should be able to choose between leading speech recognition and text-to-speech offerings.
Components of an application implementation using IBM WebSphere Voice Application Access: As with traditional visual interfaces, application logic and the generation of presentation markup are handled in the application server middleware. VoiceXML markup is delivered to a speech server stack including a VoiceXML browser and underlying automatic speech recognition (ASR) and text-to-speech (TTS) technologies. A media gateway such as IBM WebSphere Voice Response is required to provide connectivity with the telephone network... Individual portlets deliver VoiceXML markup to the Voice Aggregator, which creates complete VoiceXML documents including support for a global main menu. Markup is sent to a compliant VoiceXML browser using standard HTTP connectivity. The VoiceXML browser works with ASR and TTS engines to interpret spoken input and generate voice output. The browser can also accept DTMF (telephone keypad) as input and use prerecorded audio files for output.
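The aggregation step described above can be illustrated with a minimal sketch. It is not IBM's Voice Aggregator: the function name `aggregate`, the portlet names, and the choice of VoiceXML 2.0 markup are assumptions made for illustration only. Each hypothetical portlet contributes a prompt, and the sketch wraps the contributions in one complete VoiceXML document with a main menu that links to each of them.

```python
import xml.etree.ElementTree as ET

# Namespace of VoiceXML 2.0 documents (assumed target version for this sketch).
VXML_NS = "http://www.w3.org/2001/vxml"


def aggregate(fragments):
    """Build one VoiceXML document from per-portlet prompts.

    `fragments` maps a hypothetical portlet name (used as a form id)
    to the text that portlet wants spoken.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})

    # Main menu: one spoken choice per portlet.
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Main menu. Say one of: "
    ET.SubElement(prompt, "enumerate")  # reads the choices aloud
    for name in fragments:
        choice = ET.SubElement(menu, "choice", {"next": "#" + name})
        choice.text = name

    # One form per portlet; each returns to the main menu when done.
    for name, text in fragments.items():
        form = ET.SubElement(vxml, "form", {"id": name})
        block = ET.SubElement(form, "block")
        spoken = ET.SubElement(block, "prompt")
        spoken.text = text
        ET.SubElement(block, "goto", {"next": "#main"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(aggregate({
        "email": "You have two new messages.",
        "calendar": "Your next meeting is at noon.",
    }))
```

In a real deployment the resulting document would be served over HTTP to the VoiceXML browser, which handles recognition and speech output; the sketch covers only the document assembly.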
In order to interpret voice input, ASR engines use active vocabularies that identify recognizable words. These vocabularies also specify allowable word sequences; this combination of vocabulary and specific word ordering is called a speech recognition grammar. Each word in a grammar is represented by a spelling, but it is actually the word's pronunciation that is used by the ASR engine. Although both ASR and TTS speech technologies have large dictionaries of word pronunciations, applications will often use words or abbreviations outside the dictionary that require the definition of new pronunciations.
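As a rough illustration of the idea that a grammar combines a vocabulary with allowed word order, the sketch below models a grammar as a sequence of slots, each listing the words permitted at that position. The grammar contents, the phoneme strings, and the helper names `accepts` and `missing_pronunciations` are all hypothetical; real ASR engines use far richer grammar formats and their own pronunciation notation.

```python
# Each inner list is one position in the utterance; the words in it are the
# alternatives allowed at that position (hypothetical example grammar).
GRAMMAR = [
    ["read", "check"],
    ["my"],
    ["email", "calendar", "pim"],
]

# Stand-in for an engine's built-in pronunciation dictionary.
# The phoneme strings are illustrative only.
BASE_DICTIONARY = {
    "read": "R IY D",
    "check": "CH EH K",
    "my": "M AY",
    "email": "IY M EY L",
    "calendar": "K AE L AH N D ER",
}


def accepts(words, grammar=GRAMMAR):
    """Return True if the word sequence matches the grammar slot by slot."""
    words = [w.lower() for w in words]
    if len(words) != len(grammar):
        return False
    return all(word in slot for word, slot in zip(words, grammar))


def missing_pronunciations(grammar, dictionary):
    """List grammar words that the dictionary cannot pronounce."""
    vocabulary = {word for slot in grammar for word in slot}
    return sorted(word for word in vocabulary if word not in dictionary)


if __name__ == "__main__":
    print(accepts(["check", "my", "email"]))   # in vocabulary, right order
    print(accepts(["my", "check", "email"]))   # same words, wrong order
    print(missing_pronunciations(GRAMMAR, BASE_DICTIONARY))
```

The abbreviation "pim" is in the grammar but absent from the stand-in dictionary, which is the situation the text describes: the application would have to supply a pronunciation for it.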
Roadmap: Building on the VoiceXML standards allows IBM WebSphere Voice Application Access to work with third party browsers and their associated underlying speech recognition and text-to-speech technologies. As the VoiceXML 2.0 specification nears final approval, IBM WebSphere Voice Application Access will move quickly to support it... New wireless networks and devices capable of supporting both voice and data channels require multimodal applications that combine the power of voice and visual interfaces. IBM WebSphere Voice Application Access is designed to combine with other IBM portal offerings to offer a platform for multimodal applications using server based voice technology.
From the 2002-12-02 announcement:
Adding to its portfolio, IBM unveiled the new WebSphere Voice Application Access product: middleware that simplifies building and managing voice portals and more easily extends web-based portals to voice. Leveraging the scalability, personalization and authentication features of IBM's WebSphere Portal, it enables mobile workers to more easily access information from multiple voice applications -- using a single telephone number.
This new offering includes IBM's WebSphere Voice Server as well as ready-to-use email, personal information management (PIM) functions, and sample portlets. It also supports VoiceXML and Java -- including development tools based on Eclipse, the open-source, vendor-neutral platform for writing software -- and uses open-standard programming languages to create voice-enabled applications that will interoperate with a range of web servers and databases.
In keeping with IBM's strategy to provide solutions across multiple platforms, IBM will be working to make WebSphere Voice Application Access interoperable with offerings from third party VoiceXML vendors, such as Nuance and Cisco. In addition, IBM is also working with independent solutions vendors including V-Enable, Voxsurf and Viecore to extend their current solutions.
Principal references:
- Announcement 2002-12-02: "IBM Advances Pervasive Computing Strategy With New Software. Voice Portal Technology and Tools Extend IBM Momentum With Device Manufacturers, Service Providers, and Enterprises."
- Pervasive Computing Software: WebSphere Voice Application Access Version 4.1
- "Extending an Enterprise with IBM WebSphere Voice Application Access." By Eddie Epstein (WebSphere Voice System Architect). From IBM Pervasive Computing. November 2002. 12 pages.
- Voice Portal. A beta version of the WebSphere Voice Application Access product is available as an alphaWorks distribution.
- IBM WebSphere Portal for Multiplatforms
- IBM Pervasive Computing
- Pervasive Computing Library
- Email contact: pervasive@us.ibm.com
- W3C Voice Browser Activity
- "VoiceXML Forum" - Main reference page.