Cover Pages: Speech Synthesis Markup Language (SSML) Version 1.0 Advances to W3C Recommendation.

The World Wide Web Consortium has published Speech Synthesis Markup Language (SSML) Version 1.0 as a W3C Recommendation. SSML 1.0 elevates the role of high-quality synthesized speech in Web interactions and represents a fundamental specification in the W3C Speech Interface Framework.

SSML Version 1.0 has been produced by members of the W3C Voice Browser Working Group as part of the Voice Browser Activity within W3C's Interaction Domain. W3C's Voice Browser WG seeks to "develop standards to enable access to the Web using spoken interaction. The Speech Synthesis Markup Language Specification is one of these standards and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms."
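As a rough illustration of what controlling volume, pitch, and rate looks like in markup, here is a minimal Python sketch (an illustration, not code from the specification) that builds a standalone SSML 1.0 document containing a `prosody` element. It assumes the SSML namespace `http://www.w3.org/2001/10/synthesis` and uses only the descriptive label values SSML 1.0 defines for `pitch`, `rate`, and `volume`. The helper names `prosody` and `speak` are hypothetical, and the numeric or relative values the specification also permits are deliberately rejected to keep the sketch short.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Descriptive label values SSML 1.0 allows for these <prosody> attributes.
# The spec also permits numeric/relative values; this sketch rejects them for brevity.
PROSODY_LABELS = {
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody(text: str, **attrs: str) -> ET.Element:
    """Hypothetical helper: wrap text in <prosody>, accepting only label values."""
    elem = ET.Element("prosody")
    for name, value in attrs.items():
        if name not in PROSODY_LABELS:
            raise ValueError(f"unsupported prosody attribute: {name}")
        if value not in PROSODY_LABELS[name]:
            raise ValueError(f"{name}={value!r} is not an SSML 1.0 label value")
        elem.set(name, value)
    elem.text = text
    return elem


def speak(*children: ET.Element, lang: str = "en-US") -> str:
    """Hypothetical helper: serialize children inside a standalone <speak> root."""
    root = ET.Element("speak", {
        "xmlns": SSML_NS,   # default namespace, inherited by the child elements
        "version": "1.0",   # required on <speak> in SSML 1.0
        "xml:lang": lang,   # language of the root document
    })
    root.extend(children)
    return ET.tostring(root, encoding="unicode")
```

For example, `speak(prosody("Your flight has been delayed.", rate="slow", volume="loud"))` yields a `speak` document whose single child asks the synthesizer to render that sentence slowly and loudly; how a given engine realizes "slow" or "loud" is left to the platform.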

The W3C announcement describes SSML 1.0 as a specification "built for integration with other Web technologies and to promote interoperability across different synthesis-capable platforms." Companion W3C Recommendations such as VoiceXML 2.0 and the Speech Recognition Grammar Specification (SRGS), published by the W3C Voice Browser Working Group, help define "a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications. Application designers for mobile phones, personal digital assistants (PDAs), and a host of emerging technologies use SSML 1.0 to achieve both coarse- and fine-grain control of important aspects of speech synthesis."

Specifications produced by the W3C Voice Browser Working Group "bring the advantages of Web-based development and content delivery to interactive voice response applications. Speech Synthesis Markup Language, Speech Recognition Grammar Specification, and Call Control XML are core technologies for describing speech synthesis, recognition grammars, and call control constructs respectively. VoiceXML is a dialog markup language that leverages the other specifications for creating dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key (touch tone) input, recording of spoken input, telephony, and mixed initiative conversations."

In the Voice Browser Working Group, W3C is working "to expand access to the Web to allow people to interact via key pads, spoken commands, listening to prerecorded speech, synthetic speech and music. This will allow any telephone to be used to access appropriately designed Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient."

Bibliographic Information

Speech Synthesis Markup Language (SSML) Version 1.0. W3C Recommendation. 07-September-2004. Edited by Daniel C. Burnett (Nuance Communications), Mark R. Walker (Intel), and Andrew Hunt (ScanSoft). Version URL: http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/. Latest version URL: http://www.w3.org/TR/speech-synthesis/. Previous version URL: http://www.w3.org/TR/2004/PR-speech-synthesis-20040715/.

SSML 1.0 Implementation Report. Version: 15-July-2004. Contributors: Laura Ricotti (Loquendo, Chief Editor), Paolo Baggia (Loquendo, co-editor), An Buyle (ScanSoft), Dave Burke (Voxpilot), Daniel Burnett (Nuance), Jerry Carter (Independent Consultant), Sasha Caskey (IBM), William Gardella (SAP), Frederic Gavignet (France Telecom), Edouard Hinard (France Telecom), Jeff Kusnitz (IBM), Paul Lamere (Sun), Rob Marchand (VoiceGenie), Sheyla Militello (Loquendo), and Luc Van Tichelen (ScanSoft).

From the W3C Announcement

"I am excited about the progress the Voice Browser Working Group has made in providing improved access to services over the telephone through the use of Web technologies," said W3C Director Tim Berners-Lee, who will be delivering a keynote address at the SpeechTEK Conference next week. He added, "Companies can now offer Web access to their customers via the telephone as well as from a personal computer."

Aimed at the world's estimated two billion fixed line and mobile phones, W3C's Speech Interface Framework — a collection of specifications for building voice applications for the Web — will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services via key pads, spoken commands, listening to pre-recorded speech, synthetic speech and music.

A Rich Vocabulary for High-Quality Speech: Pronunciation is one of the primary challenges SSML addresses in strengthening the voice of the Web. For example, how do you pronounce "1/2"? The SSML 1.0 specification uses this simple example to illustrate some of the challenges of turning general-purpose text into meaningful synthesized speech. Without additional context, one would not know whether to say "one half", "January second", "February first", or "one divided by two". SSML 1.0 constructs help eliminate this sort of ambiguity. The SSML vocabulary allows word-level, phoneme-level, and even waveform-level control of the output to satisfy a wide spectrum of application scenarios and authoring requirements.
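One SSML 1.0 construct that removes this kind of ambiguity is the `sub` element, whose required `alias` attribute supplies the text to be spoken in place of the written form. The Python sketch below (illustrative only; the helper `speak_token` and the `READINGS` table are hypothetical) wraps "1/2" in a `sub` element carrying an alias for each of the readings listed above. The `say-as` element is another option, but SSML 1.0 itself does not standardize the values of its `interpret-as` attribute, so this sketch sticks to `sub`.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical table of spoken forms for "1/2", matching the readings in the text.
READINGS = {
    "fraction": "one half",
    "month_day": "January second",
    "day_month": "February first",
    "division": "one divided by two",
}


def speak_token(token: str, reading: str, lang: str = "en-US") -> str:
    """Hypothetical helper: return an SSML 1.0 document that speaks `token`
    using the alias chosen from READINGS (raises KeyError for unknown readings)."""
    root = ET.Element("speak", {"xmlns": SSML_NS, "version": "1.0", "xml:lang": lang})
    # <sub alias="..."> keeps the written token but tells the synthesizer what to say.
    sub = ET.SubElement(root, "sub", {"alias": READINGS[reading]})
    sub.text = token
    return ET.tostring(root, encoding="unicode")
```

On Python 3.8 or later, `speak_token("1/2", "month_day")` returns `<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US"><sub alias="January second">1/2</sub></speak>`. The written text stays "1/2", so the same content remains meaningful if it is displayed rather than spoken.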

"SSML builds on the work of the pioneers in speech synthesis to provide application developers with a powerful and flexible means to deliver a high quality mix of synthetic and pre-recorded speech as part of interactive voice response services," said Dave Raggett, Activity Lead for W3C's work on voice browsers, and a W3C Fellow from Canon. He added, "SSML allows VoiceXML-based services to be accessed via textphones for people with speaking or hearing impairments. In addition, SSML has great promise beyond its use with VoiceXML, as we look forward to emerging standards for multimodal interaction."

Like XHTML, SSML is a markup language based on the widely deployed XML standard. SSML content can stand alone or be included in other XML content in order to improve rendering as synthesized speech. Naturally, SSML is particularly well-suited for use with a VoiceXML wrapper when building an interactive voice response application.

SSML 1.0 is built for Web integration in other ways as well. The Voice Browser Working Group worked closely with other W3C groups to ensure that the design of SSML 1.0 is consistent with principles of accessibility, internationalization, and general Web architecture. Indeed, one important application of SSML involves "text phones" that may be used by people with some hearing disabilities. The same content can also be output as speech through a common telephone. SSML 1.0 is also consistent with previous work at W3C on describing pronunciation with Cascading Style Sheets (CSS). W3C's CSS Working Group is developing a speech module in CSS3 for rendering XML documents with SSML-based speech engines.

W3C's Voice Browser Working Group has been particularly successful at ensuring adoption of its specifications before they reach Recommendation status. A test suite (discussed in the July 2004 SSML implementation report) has helped ensure consistent behavior and quality among the already numerous implementations of SSML 1.0. Vendors that have already implemented SSML 1.0 and that are participating in the Working Group include: Aspect Communications, France Telecom, Hewlett-Packard, IBM, Loquendo, Microsoft, MITRE, Nuance Communications, SAP, ScanSoft, Sun Microsystems, VoiceGenie Technologies, Voxeo, and Voxpilot.

The Working Group will now focus its energies on the remainder of the Speech Framework. "After VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS), SSML is the third language of the W3C Speech Interface Framework to become a full W3C Recommendation," said Jim Larson, manager, advanced human input/output, for Intel and also co-chair of W3C's Voice Browser Working Group. "We are working to complete work on other languages of the W3C Speech Interface Framework, including VoiceXML 2.1, Semantic Interpretation, and the Call Control eXtensible Markup Language (CCXML)."

The [Voice Browser] Working Group is among the largest and most active in W3C. Its participants include: Aspect Communications, BeVocal, Brooktrout Technology, Canon, Comverse Technology, Convedia, Electronic Data Systems, France Telecom, Genesys Telecommunications Laboratories, HeyAnita, Hitachi, Hewlett-Packard, IBM, Intel, IWA-HWG, Korea Association of Information and Telecommunication, Loquendo, Microsoft, MITRE, Mitsubishi Electric, Motorola, Nokia, Nuance Communications, Openstream, SAP, ScanSoft, Siemens, Sun Microsystems, Syntellect, Tellme Networks, Verascape, Vocalocity, VoiceGenie Technologies, Voxeo, and Voxpilot...

Principal references:

W3C announcement 2004-09-08: "World Wide Web Consortium Issues SSML 1.0 as a W3C Recommendation. High-Quality Synthesized Speech Bolsters Speech Interface Framework." Available also in French and Japanese.
Testimonials for W3C's Speech Synthesis Markup Language (SSML) 1.0 Recommendation. Provided by EDS (Electronic Data Systems), Intel, IWA/HWG (International Webmasters Association / HTML Writers Guild), RNIB (Royal National Institute of the Blind), Loquendo, ScanSoft, Sun Microsystems, UIUC (University of Illinois at Urbana-Champaign), Vocalocity, and Voxpilot.
W3C news item
Speech Synthesis Markup Language (SSML) Version 1.0. W3C Recommendation 7-September-2004.
SSML 1.0 Implementation Report
SSML 1.0 Recommendation Errata. Maintained by Dave Raggett.
SSML 1.0: Candidate Recommendation Disposition of Comments
W3C 'Voice Browser' Activity — Voice Enabling the Web!
W3C Speech Interface Framework:
- VoiceXML 2.0
- VoiceXML 2.1
- Speech Recognition Grammar (SRGS)
- Call Control (CCXML)
- Semantic Interpretation
- Speech Synthesis (SSML)
- Dialog Markup ("V3")
- See also W3C Multimodal Interaction Activity
Related: Synchronized Multimedia Integration Language (SMIL 2.0)
Related: CSS Aural style sheets
W3C Technical Report Maturity Levels for Recommendation-Track Specifications
Mail Archives for W3C public list 'www-voice@w3.org': "This [is] a list for discussion of the use and design of voice applications on the Web, and more specifically, for feedback on the W3C VoiceXML specifications."
W3C Contact: Max Froumentin (Voice Browser Working Group)
Earlier W3C Speech Interface Framework news:
"W3C Speech Synthesis Markup Language Specification" - Main reference page.
