Cover Pages: W3C Speech Synthesis Markup Language Specification

Last modified: September 08, 2004

Technology Reports

W3C Speech Synthesis Markup Language Specification

The W3C Speech Synthesis specification "defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation."
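
The capabilities listed above can be made concrete with a small sketch. It assumes Python 3 and only the standard-library xml.etree.ElementTree module, and it assembles a short SSML 1.0 document that mixes prerecorded audio with synthetic speech. It selects a voice by gender and age, sets rate, volume and pitch with <prosody>, marks emphasis, inserts a pause, and overrides a pronunciation with <phoneme>. The element and attribute names follow the SSML 1.0 Recommendation. The helper build_prompt, the file name welcome.wav and the particular attribute values are hypothetical and serve only as an illustration, not as a reference implementation.

```python
import xml.etree.ElementTree as ET

# SSML 1.0 namespace, plus the predefined XML namespace used for xml:lang.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements in the default namespace instead of "ns0:" prefixes.
ET.register_namespace("", SSML_NS)


def q(tag: str) -> str:
    """Qualify a local element name with the SSML namespace."""
    return f"{{{SSML_NS}}}{tag}"


def build_prompt(lead: str, key_word: str, recording: str) -> str:
    """Hypothetical helper: build a small SSML 1.0 prompt as a string."""
    speak = ET.Element(q("speak"), {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"})

    # Prerecorded speech; the element's text is rendered if the audio is unavailable.
    audio = ET.SubElement(speak, q("audio"), {"src": recording})
    audio.text = "Welcome."

    # Voice characteristics (gender, age) and prosody (rate, volume, pitch).
    voice = ET.SubElement(speak, q("voice"), {"gender": "female", "age": "30"})
    prosody = ET.SubElement(
        voice, q("prosody"), {"rate": "slow", "volume": "loud", "pitch": "high"}
    )
    prosody.text = lead + " "

    # Strong emphasis on one word inside the prosody span.
    emphasis = ET.SubElement(prosody, q("emphasis"), {"level": "strong"})
    emphasis.text = key_word

    # A half-second pause.
    ET.SubElement(speak, q("break"), {"time": "500ms"})

    # Override the engine's default pronunciation with an IPA transcription.
    phoneme = ET.SubElement(speak, q("phoneme"), {"alphabet": "ipa", "ph": "təmɑtoʊ"})
    phoneme.text = "tomato"

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt("Your account is", "overdue", "welcome.wav"))
```

Calling build_prompt("Your account is", "overdue", "welcome.wav") returns a single-line string. Its root <speak> element carries version="1.0", xml:lang="en-US" and the SSML namespace as its default namespace. Registering that namespace as the default keeps ElementTree from emitting generated ns0: prefixes, so the output looks like hand-written SSML.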

[September 08, 2004] Speech Synthesis Markup Language (SSML) Version 1.0 Advances to W3C Recommendation. The World Wide Web Consortium has published Speech Synthesis Markup Language (SSML) Version 1.0 as a W3C Recommendation. SSML 1.0 elevates the role of high-quality synthesized speech in Web interactions and represents a fundamental specification in the W3C Speech Interface Framework. SSML Version 1.0 has been produced by members of the W3C Voice Browser Working Group as part of the Voice Browser Activity within W3C's Interaction Domain. W3C's Voice Browser WG seeks to "develop standards to enable access to the Web using spoken interaction. The Speech Synthesis Markup Language Specification is one of these standards and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms." The W3C announcement describes SSML 1.0 as a specification "built for integration with other Web technologies and to promote interoperability across different synthesis-capable platforms. Companion W3C Recommendations like VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS) published by the W3C Voice Browser Working Group help define 'a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications.' Application designers for mobile phones, personal digital assistants (PDAs), and a host of emerging technologies use SSML 1.0 to achieve both coarse- and fine-grain control of important aspects of speech synthesis."
Specifications produced by the W3C Voice Browser Working Group "bring the advantages of Web-based development and content delivery to interactive voice response applications. Speech Synthesis Markup Language, Speech Recognition Grammar Specification, and Call Control XML are core technologies for describing speech synthesis, recognition grammars, and call control constructs respectively. VoiceXML is a dialog markup language that leverages the other specifications for creating dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key (touch tone) input, recording of spoken input, telephony, and mixed initiative conversations." In the Voice Browser Working Group, W3C is working "to expand access to the Web to allow people to interact via key pads, spoken commands, listening to prerecorded speech, synthetic speech and music. This will allow any telephone to be used to access appropriately designed Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient."

[December 03, 2002] Last Call Working Draft for W3C Speech Synthesis Markup Language (SSML). The W3C Voice Browser Working Group has released a Last Call Working Draft of the "Speech Synthesis Markup Language Version 1.0." This specification describes markup for generating synthetic speech via a speech synthesizer, and forms part of the proposals for the W3C Speech Interface Framework. "The Voice Browser Working Group has sought to develop standards to enable access to the Web using spoken interaction. The Speech Synthesis Markup Language Specification is part of this set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the SSML markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms. SSML is based upon the JSGF and/or JSML specifications, which are owned by Sun Microsystems, Inc.; a related initiative to establish a standard system for marking up text input is SABLE." An informative Appendix B provides the XML DTD for SSML; the normative Appendix C defines the SSML XML Schema.

On April 5, 2002 the W3C Voice Browser Working Group released an updated working draft for the Speech Synthesis Markup Language Specification. The document has been produced as part of the W3C Voice Browser Activity, which seeks to develop standards enabling access to the web using spoken interaction. The document "describes markup for generating synthetic speech via a speech synthesizer, and forms part of the proposals for the W3C Speech Interface Framework." The Speech Synthesis Markup Language Specification "is part of a set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms." This SSML document has been revised in minor ways to assist in the further development of the W3C Speech Recognition Grammar Format and the W3C VoiceXML 2.0 specification which "are related to the SSML specification, and in some areas depend on this specification."

[April 18, 2002] From the Voice Browser page: "W3C is working to expand access to the Web to allow people to interact via key pads, spoken commands, listening to prerecorded speech, synthetic speech and music. This will allow any telephone to be used to access appropriately designed Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient... [The WG] is defining a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications. VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input. The other specifications are being designed for use in a variety of contexts, and not just with VoiceXML. Further work is anticipated on enabling their use with other W3C markup languages such as XHTML, XForms and SMIL. This will be done in conjunction with other W3C working groups, including the proposed new Multimodal working group..."

[January 03, 2001] The W3C Voice Browser Working Group issued two 'last call' working draft documents for the W3C Speech Interface Framework. These specifications are part of the W3C Voice Browser Activity, in which W3C "is working to expand access to the Web to allow people to interact with Web sites via spoken commands, and listening to prerecorded speech, music and synthetic speech. This will allow any telephone to be used to access Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient." The review period for both WDs ends 31-January-2001. Review comments on the WDs may be sent to the publicly archived mailing list 'www-voice@w3.org'. The Speech Synthesis Markup Language Specification for the Speech Interface Framework describes markup for generating synthetic speech via a speech synthesizer. Reference: W3C Working Draft 3-January-2001, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International). "The Speech Synthesis Markup Language Specification is part of this set of new markup specifications [under development by the Voice Browser Working Group] for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms." This specification "is based upon the JSML specification, which is owned by Sun Microsystems, Inc."

[August 09, 2000] The W3C Voice Browser working group has released a working draft Speech Synthesis Markup Language Specification for the Speech Interface Framework. Reference: W3C Working Draft 08-August-2000, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International). The draft specification "describes markup for generating synthetic speech via a speech synthesiser, and forms part of the proposals for the W3C Speech Interface Framework. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group." Document abstract: "The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of specifications for voice browsers, and provides details of an XML markup language for controlling speech synthesisers. The draft document describes a XML markup language for generating synthetic speech via a speech synthesiser. Such synthesisers embody rich knowledge about how to render text, and the role of the markup language is to give authors a standard way to control aspects such as volume, pitch, rate and other properties. [...] This markup language is intended for use by systems that need to produce computer-generated speech output such as Voice Browsers, web browsers and accessible applications. The language provides a set of elements that are focussed on the specific challenges of automatically producing natural-sounding, understandable speech output." Section 5 of the document supplies the 'DTD for the Speech Synthesis Markup Language'. The design and standardization process for the specification "has followed from the Speech Synthesis Markup Requirements for Voice Markup Languages published December 23, 1999 by the W3C Voice Browser Working Group." 
The W3C Standard is known as the Speech Synthesis Markup Language Specification and is based upon the JSML specification, which is owned by Sun Microsystems, Inc., California, U.S.A. Comments on the specification may be sent to the public mailing list 'www-voice@w3.org', which is archived.

Principal References

Speech Synthesis Markup Language (SSML) Version 1.0. W3C Recommendation 7-September-2004.
SSML 1.0 Implementation Report
SSML 1.0 Recommendation Errata. Maintained by Dave Raggett.
W3C 'Voice Browser' Activity — Voice Enabling the Web!
W3C announcement 2004-09-08: "World Wide Web Consortium Issues SSML 1.0 as a W3C Recommendation. High-Quality Synthesized Speech Bolsters Speech Interface Framework." Available also in French and Japanese.
Testimonials for W3C's Speech Synthesis Markup Language (SSML) 1.0 Recommendation. Provided by EDS (Electronic Data Systems), Intel, IWA/HWG (International Webmasters Association / HTML Writers Guild), RNIB (Royal National Institute of the Blind), Loquendo, ScanSoft, Sun Microsystems, UIUC (University of Illinois at Urbana-Champaign), Vocalocity, and Voxpilot.
W3C Speech Interface Framework:
- VoiceXML 2.0
- VoiceXML 2.1
- Speech Recognition Grammar (SRGS)
- Call Control (CCXML)
- Semantic Interpretation
- Speech Synthesis (SSML)
- Dialog Markup ("V3")
- See also W3C Multimodal Interaction Activity
Related: Synchronized Multimedia Integration Language (SMIL 2.0)
Related: CSS Aural style sheets
W3C Technical Report Maturity Levels for Recommendation-Track Specifications
Mail Archives for W3C public list 'www-voice@w3.org': "This is a list for discussion of the use and design of voice applications on the Web, and more specifically, for feedback on the W3C VoiceXML specifications."
W3C Contact: Max Froumentin (Voice Browser Working Group)
Earlier W3C Speech Interface Framework news:
SSML Working Draft 5-April-2002:
- Speech Synthesis Markup Language Specification. W3C Working Draft 5-April-2002.
- DTD for the Speech Synthesis Markup Language
- W3C XML Schema for the Speech Synthesis Markup Language (see also the no-namespace core schema)

Articles, News, Papers

[August 27, 2002] "Update on SSML [Speech Synthesis Markup Language Specification]." By Daniel C. Burnett. In VoiceXML Review (July/August 2002). "The Speech Synthesis Markup Language (SSML), as its name implies, provides a standardized annotation for instructing speech synthesizers on how to convert written language input into spoken language output. This language has been under development within the Voice Browser Working Group (VBWG) of the World Wide Web Consortium (W3C) for a few years. This article provides a brief update on the status and future of SSML. For background on SSML and an introduction to its features... In April of 2002, the Voice Browser Working Group issued another Working Draft (not a Last Call this time) with some minor content changes. The group is now working towards publication of a new Last Call WD. The April 2002 draft has a fairly small number of changes from the January 2001 draft. It was released primarily to provide XML Schema support for use in VoiceXML and to bring the definition of valid SSML documents in line with that in the other Voice Browser Working Group specifications... The W3C has now moved from encouraging the use of XML Schema to the stronger position of explicitly discouraging the use of DTDs. While the creation of a schema when you already have a DTD is fairly straightforward, the fact that SSML is expected to be embedded in other markup languages (of which VoiceXML is the first example) brought additional requirements to the table: (1) the need to be able to incorporate SSML elements into the host language namespace, (2) the need to modify the SSML elements to add host language-specific attributes and functionality. In the SSML specification the DTD is now informational only, while the schema provides the normative definition of the syntax of the language...
Any changes for the next [future] draft are likely to fall into two categories: clarifications of ambiguous or confusing features and text, and the addition [of] features requested or encouraged by other groups in the W3C. Two portions of the specification that were vague in the last Working Draft are the use of the xml:lang attribute and the <say-as> element... The <metadata> element in VoiceXML and SRGS provides a mechanism for expressing information about the document. Both recommend the use of the Resource Description [Framework] (RDF) syntax and schema as the content format for this element; RDF 'provides a standard way for using XML to represent metadata in the form of statements about properties and relationships of items on the Web.' This element (with suggested content structure) is part of the W3C's Semantic Web Initiative, an attempt to develop standard ways of representing the meaning of XML-structured data on the World Wide Web. As such, it is likely that such a capability will be encouraged for SSML..."
"Speech Synthesis Markup Requirements for Voice Markup Languages." W3C Working Draft 23-December-1999. Edited by Andrew Hunt. Version URL: http://www.w3.org/TR/1999/WD-voice-tts-reqs-19991223. Previous version URL: http://www.w3.org/Voice/Group/1999/tts-reqs-19991118. Latest version URL: http://www.w3.org/TR/voice-tts-reqs
See also: Voice Extensible Markup Language (VoiceXML) Version 2.0
2001-01-03 Last Call Working Draft. Speech Synthesis Markup Language Specification for the Speech Interface Framework describes markup for generating synthetic speech via a speech synthesizer. Reference: W3C Working Draft 3-January-2001, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International).
DTD for the Speech Synthesis Markup Language 2001-01-03.
[August 09, 2000] Speech Synthesis Markup Language Specification for the Speech Interface Framework. Reference: W3C Working Draft 08-August-2000, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International).
DTD for the Speech Synthesis Markup Language [08-August-2000]

