The W3C Speech Synthesis specification "defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation."
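As a rough illustration of the controls listed above, the following sketch builds a small prompt with Python's standard xml.etree.ElementTree module. The element and attribute names (speak, voice, prosody, emphasis, phoneme, audio) are an assumption: they follow the later SSML 1.0 Recommendation rather than any one of the working drafts discussed here, which may differ in detail. The prompt text, the phonemic string, and the audio URL are hypothetical, and the namespace and language attributes a conforming document would carry are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_prompt() -> str:
    """Build a small SSML-style prompt and return it as an XML string."""
    # Root element; names assumed from SSML 1.0, not from the 2000-2002 drafts.
    speak = ET.Element("speak", {"version": "1.0"})

    # Voice characteristics: gender and age (a voice name could be used instead).
    voice = ET.SubElement(speak, "voice", {"gender": "female", "age": "30"})

    # Speed, volume and pitch are set together on one prosody element.
    prosody = ET.SubElement(
        voice, "prosody", {"rate": "slow", "volume": "loud", "pitch": "high"}
    )
    prosody.text = "Your flight departs at "

    # Emphasis on a single word.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = "noon"
    emphasis.tail = ". Please bring a "

    # Override the engine's default pronunciation (hypothetical IPA string).
    phoneme = ET.SubElement(prosody, "phoneme", {"alphabet": "ipa", "ph": "təmeɪtoʊ"})
    phoneme.text = "tomato"
    phoneme.tail = "."

    # Prerecorded audio (hypothetical URL) with fallback text.
    audio = ET.SubElement(speak, "audio", {"src": "http://example.com/chime.wav"})
    audio.text = "chime"

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt())
```

Generating the markup through an XML library rather than by string concatenation keeps the output well-formed, which matters because the prompt is handed to a synthesizer as an XML document.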
On April 5, 2002 the W3C Voice Browser Working Group released an updated working draft for the Speech Synthesis Markup Language Specification. The document has been produced as part of the W3C Voice Browser Activity, which seeks to develop standards enabling access to the web using spoken interaction. The document "describes markup for generating synthetic speech via a speech synthesizer, and forms part of the proposals for the W3C Speech Interface Framework." The Speech Synthesis Markup Language Specification "is part of a set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms." This SSML document has been revised in minor ways to assist in the further development of the W3C Speech Recognition Grammar Format and the W3C VoiceXML 2.0 specification which "are related to the SSML specification, and in some areas depend on this specification."
[April 18, 2002] From the Voice Browser page: "W3C is working to expand access to the Web to allow people to interact via key pads, spoken commands, listening to prerecorded speech, synthetic speech and music. This will allow any telephone to be used to access appropriately designed Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient... [The WG] is defining a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications. VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input. The other specifications are being designed for use in a variety of contexts, and not just with VoiceXML. Further work is anticipated on enabling their use with other W3C markup languages such as XHTML, XForms and SMIL. This will be done in conjunction with other W3C working groups, including the proposed new Multimodal working group..."
[January 03, 2001] The W3C Voice Browser Working Group issued two 'last call' working draft documents for the W3C Speech Interface Framework. These specifications are part of the W3C Voice Browser Activity, in which W3C "is working to expand access to the Web to allow people to interact with Web sites via spoken commands, and listening to prerecorded speech, music and synthetic speech. This will allow any telephone to be used to access Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient." The review period for both WDs ends 31-January-2001. Review comments on the WDs may be sent to the publicly archived mailing list 'www-voice@w3.org'. The Speech Synthesis Markup Language Specification for the Speech Interface Framework describes markup for generating synthetic speech via a speech synthesizer. Reference: W3C Working Draft 3-January-2001, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International). "The Speech Synthesis Markup Language Specification is part of this set of new markup specifications [under development by the Speech Interface Framework working group] for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms." This specification "is based upon the JSML specification, which is owned by Sun Microsystems, Inc."
[August 09, 2000] The W3C Voice Browser working group has released a working draft Speech Synthesis Markup Language Specification for the Speech Interface Framework. Reference: W3C Working Draft 08-August-2000, edited by Mark R. Walker (Intel) and Andrew Hunt (SpeechWorks International). The draft specification "describes markup for generating synthetic speech via a speech synthesiser, and forms part of the proposals for the W3C Speech Interface Framework. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group." Document abstract: "The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of specifications for voice browsers, and provides details of an XML markup language for controlling speech synthesisers. The draft document describes a XML markup language for generating synthetic speech via a speech synthesiser. Such synthesisers embody rich knowledge about how to render text, and the role of the markup language is to give authors a standard way to control aspects such as volume, pitch, rate and other properties. [...] This markup language is intended for use by systems that need to produce computer-generated speech output such as Voice Browsers, web browsers and accessible applications. The language provides a set of elements that are focussed on the specific challenges of automatically producing natural-sounding, understandable speech output." Section 5 of the document supplies the 'DTD for the Speech Synthesis Markup Language'. The design and standardization process for the specification "has followed from the Speech Synthesis Markup Requirements for Voice Markup Languages published December 23, 1999 by the W3C Voice Browser Working Group." 
The W3C standard is known as the Speech Synthesis Markup Language Specification and is based upon the JSML specification, which is owned by Sun Microsystems, Inc., California, U.S.A. Comments on the specification may be sent to the public mailing list, which is archived.
For related research and development, see (1) "Java Speech Markup Language (JSML/JSpeech)," (2) "SSML: A Speech Synthesis Markup Language," (3) "SABLE: A Standard for Text-to-Speech Synthesis Markup," and (4) "SpeechML."