The W3C's Voice Extensible Markup Language (VoiceXML) Version 2.0 specification has been released as a Candidate Recommendation, together with an explicit call for implementation. "VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications." Comments on the CR specification are invited through 10-April-2003, when the VoiceXML specification is expected to enter the Proposed Recommendation phase.
See the minor update in the 20-February-2003 version.
Bibliographic information: Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Candidate Recommendation 28-January-2003. Edited by Scott McGlashan, Editor-in-Chief (PipeBeach), Daniel C. Burnett (Nuance Communications), Jerry Carter (SpeechWorks International), Peter Danielsen (Lucent), Jim Ferrans (Motorola), Andrew Hunt (SpeechWorks International), Bruce Lucas (IBM), Brad Porter (Tellme Networks), Ken Rehor (Invited Expert), and Steph Tryphonas (Tellme Networks). Version URL: http://www.w3.org/TR/2003/CR-voicexml20-20030128/. Latest Version URL: http://www.w3.org/TR/voicexml20/. Previous Version URLs: http://www.w3.org/TR/2002/WD-voicexml20-20020424/, http://www.w3.org/TR/2001/WD-voicexml20-20011023/.
VoiceXML 2.0 Candidate Recommendation Highlights
[from the text of the announcement]
"VoiceXML 2.0 has the power to change the way phone-based information and customer services are developed. No longer will we have to press 'one' for this or 'two' for that. Instead, we will be able to make selections and provide information by speech," explained Dave Raggett, W3C Voice Browser Activity Lead. "In addition, VoiceXML 2.0 creates opportunities for people with visual impairments or those needing Web access while keeping their hands and eyes free for other things, such as getting directions while driving."
In the W3C Speech Interface Framework, VoiceXML controls how the application interacts with the user, while the Speech Synthesis Markup Language (SSML) is used for spoken prompts and the Speech Recognition Grammar Specification (SRGS) for guiding the speech recognizers via grammars that describe the expected user responses. Other specifications in the Framework include Voice Browser Call Control (CCXML), which provides telephony call control support for VoiceXML or other dialog systems, and Semantic Interpretation for Speech Recognition, which defines the syntax and semantics of the contents of tags in SRGS.
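The relationship among these Framework pieces can be sketched with a small VoiceXML field: the prompt body carries SSML markup for the synthesizer, while the grammar element points the recognizer at an external SRGS grammar (the field name and grammar URI here are hypothetical, for illustration only):

```xml
<!-- Sketch: SSML inside a VoiceXML prompt, plus a reference
     to an external SRGS grammar. URIs/names are hypothetical. -->
<field name="city">
  <prompt>
    Which city? <emphasis>Please speak clearly.</emphasis>
    <break time="300ms"/>
  </prompt>
  <grammar src="http://example.com/cities.grxml"
           type="application/srgs+xml"/>
</field>
```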
There is also an extensive set of test suites publicly available with the VoiceXML 2.0 Candidate Recommendation. While the initial version contains over 300 tests, the final version is expected to have more than 500 tests. Updates to the test suite will be announced on the Voice Browser public mailing list.
This complements the test suite provided with the Speech Recognition Grammar Specification, which became a W3C Candidate Recommendation in June 2002. Test suites for the remaining specifications in the W3C Speech Interface Framework, including the Speech Synthesis Markup Language, which enters its Last Call phase today, are under development by the W3C Voice Browser Working Group and will be published over the next few months.
The W3C Voice Browser Working Group is among the largest and most active in W3C. Its participants include BeVocal Inc., Canon, Comverse, France Telecom, Genesys Telecommunications Laboratories, HP, HeyAnita, Hitachi, IBM, Intel, Loquendo, Microsoft, MITRE, Mitsubishi, Motorola, Nokia, Nortel Networks, Nuance, Philips, PipeBeach, SAP, ScanSoft, SnowShore Networks, SpeechWorks, Sun, Syntellect, Tellme Networks, Unisys, Verascape, VoiceGenie, Voxeo, and Voxpilot. Support for the continued work and commitments to product implementations are strong, as evidenced by the range of testimonials.
As the Working Group moves forward with its technical work across the range of voice-related specifications, patent disclosures inconsistent with the Voice Browser Working Group's Royalty-Free Licensing Mode are to be addressed by a Patent Advisory Group within the W3C, per the W3C's Current Patent Practice. With the vast majority of the W3C Voice Browser Working Group committed to producing an open specification, the Voice Browser Patent Advisory Group will work toward resolving the remaining issues.
Voice Browser Applications
Possible voice browser applications include:
- Accessing business information, including the corporate "front desk" asking callers who or what they want, automated telephone ordering services, support desks, order tracking, airline arrival and departure information, cinema and theater booking services, and home banking services.
- Accessing public information, including community information such as weather, traffic conditions, school closures, directions and events; local, national and international news; national and international stock market information; and business and e-commerce transactions.
- Accessing personal information, including calendars, address and telephone lists, to-do lists, shopping lists, and calorie counters.
- Helping the user communicate with other people by sending and receiving voice-mail and email messages.
"This document defines VoiceXML, the Voice Extensible Markup Language. Its background, basic concepts and use are presented in Section 1. The dialog constructs of form, menu and link, and the mechanism (Form Interpretation Algorithm) by which they are interpreted are then introduced in Section 2. User input using DTMF and speech grammars is covered in Section 3, while Section 4 covers system output using speech synthesis and recorded audio. Mechanisms for manipulating dialog control flow, including variables, events, and executable elements, are explained in Section 5. Environment features such as parameters and properties as well as resource handling are specified in Section 6. The appendices provide additional information including the VoiceXML Schema, a detailed specification of the Form Interpretation Algorithm and timing, audio file formats, and statements relating to conformance, internationalization, accessibility and privacy."
"VoiceXML's main goal is to bring the full power of web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user's session with other dialogs."
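The request/response cycle described above can be illustrated with a minimal VoiceXML document: one form dialog collects a value and submits it to the document server, which replies with the next document. This is a sketch only; the form id, field name, and server URL are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A single form dialog: prompt the user, collect one field,
       then submit the result back to the document server. -->
  <form id="order-status">
    <field name="ordernum" type="digits">
      <prompt>Please say or key in your order number.</prompt>
      <filled>
        <!-- The server replies with another VoiceXML document
             to continue the user's session. -->
        <submit next="http://example.com/status.vxml"
                namelist="ordernum"/>
      </filled>
    </field>
  </form>
</vxml>
```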
VoiceXML Architectural Model
"A document server (e.g. a web server) processes requests from a client application, the VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents in reply, which are processed by the VoiceXML Interpreter. The VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant, and another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics."
"The implementation platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call, while the VoiceXML interpreter conducts the dialog after answer. The implementation platform generates events in response to user actions (e.g. spoken or character input received, disconnect) and system events (e.g. timer expiration). Some of these events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document, while others are acted upon by the VoiceXML interpreter context." [Specification section 1.2.1 ]
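The division of labor described above, where the platform generates events and the interpreter reacts as the document specifies, can be sketched with catch handlers (a hypothetical fragment; grammar URI and field names are illustrative):

```xml
<!-- Sketch: the platform throws noinput/nomatch/disconnect events;
     the VoiceXML document tells the interpreter how to react. -->
<form id="main">
  <field name="choice">
    <prompt>Say sales or support.</prompt>
    <grammar src="menu.grxml" type="application/srgs+xml"/>
    <noinput>
      I didn't hear anything. <reprompt/>
    </noinput>
    <nomatch>
      Sorry, I didn't understand. <reprompt/>
    </nomatch>
    <catch event="connection.disconnect.hangup">
      <exit/>
    </catch>
  </field>
</form>
```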
- Announcement 2003-01-28: "World Wide Web Consortium Issues VoiceXML 2.0 as a W3C Candidate Recommendation. Cornerstone to the W3C Speech Interface Framework is Ready for Implementors." [source]
- Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Candidate Recommendation 20-February-2003. Updated version provides "a correction to the schemas to fix problems found with some schema tools..." Candidate Recommendation period will close 10-April-2003.
- Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Candidate Recommendation 28-January-2003.
- Testimonials for VoiceXML 2.0. From BeVocal, Inc., Comverse Technology, Genesys Telecommunications Laboratories, Alcatel, HeyAnita Inc., IBM, Loquendo, MTA SZTAKI, NMS Communications, Nuance, PipeBeach, Public Voice, ScanSoft, SnowShore Networks, SpeechWorks International, Unisys Corporation, Tellme Networks, VoiceXML Forum, and Voxpilot Ltd.
- VoiceXML Version 2.0 Implementation Report
- W3C Voice Browser home page
- Archive for public comment mailing list 'www-voice'
- Voice Browser Patent Statements
- Voice Browser Activity Statement
- Introduction and Overview of W3C Speech Interface Framework
- "VoiceXML Forum" - Main reference page.