The W3C Voice Browser Working Group has released an updated working draft for the Speech Synthesis Markup Language Specification. The document has been produced as part of the W3C Voice Browser Activity, which seeks to develop standards enabling access to the web using spoken interaction. The document "describes markup for generating synthetic speech via a speech synthesizer, and forms part of the proposals for the W3C Speech Interface Framework." The Speech Synthesis Markup Language Specification "is part of a set of new markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate and etc. across different synthesis-capable platforms." This SSML document has been revised in minor ways to assist in the further development of the W3C Speech Recognition Grammar Format and the W3C VoiceXML 2.0 specification which "are related to the SSML specification, and in some areas depend on this specification."
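As a rough illustration of what "control aspects of speech such as pronunciation, volume, pitch, rate" could look like in practice, the following Python sketch builds a small SSML fragment with the standard library. The element name `prosody`, its `rate`, `pitch`, and `volume` attributes, and the label values used here are assumptions to be checked against the working draft; the helper `build_prosody_ssml` is hypothetical and not part of any W3C tooling.

```python
import xml.etree.ElementTree as ET


def build_prosody_ssml(text: str, rate: str, pitch: str, volume: str) -> str:
    """Wrap text in a prosody element inside a speak root (hypothetical helper)."""
    # Root element of the synthesis document.
    speak = ET.Element("speak")
    # Assumed attribute names; a synthesizer would treat the values as hints.
    prosody = ET.SubElement(
        speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume}
    )
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


prosody_doc = build_prosody_ssml(
    "Your flight departs at noon.", rate="slow", pitch="high", volume="loud"
)
print(prosody_doc)
```

The point of the markup is that the same fragment can be handed to different synthesis platforms, each of which renders the requested rate, pitch, and volume as best it can.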
Bibliographic information: Speech Synthesis Markup Language Specification. W3C Working Draft 5-April-2002. Edited by Daniel C. Burnett (Nuance), Mark R. Walker (Intel), and Andrew Hunt (SpeechWorks International). Version URL: http://www.w3.org/TR/2002/WD-speech-synthesis-20020405/. Latest version URL: http://www.w3.org/TR/speech-synthesis. Previous version URL: http://www.w3.org/TR/2001/WD-speech-synthesis-20010103/.
Language-sensitive processing: "SSML provides a mechanism for precise control of the input and output languages via the use of xml:lang attribute. This facility provides:
- The ability to specify the input and output language overriding the SSML Processor default language
- The ability to produce multi-language output
- The ability to accept input in a language which is different from the language employed in the spoken output."
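The multi-language facility described above can be sketched in Python with the standard library. This is a minimal illustration, assuming a `speak` root element and `s` (sentence) children as element names; both should be checked against the working draft. The helper `build_multilingual_ssml` is hypothetical. Only the `xml:lang` attribute itself is taken from the quoted text.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is reserved by XML and bound to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_multilingual_ssml() -> str:
    """Build a two-sentence document that switches language mid-stream."""
    # xml:lang on the root overrides the processor's default language.
    speak = ET.Element("speak", {XML_LANG: "en-US"})
    first = ET.SubElement(speak, "s")
    first.text = "The French word for cat is"
    # A nested xml:lang changes the language for this sentence only.
    second = ET.SubElement(speak, "s", {XML_LANG: "fr"})
    second.text = "chat"
    return ET.tostring(speak, encoding="unicode")


ssml = build_multilingual_ssml()
print(ssml)
```

The first sentence inherits `en-US` from the root, while the second declares its own language, which is one way to obtain multi-language output from a single document.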
The normative XML DTD and XML Schema references for SSML are provided in Appendices B and C of the specification, as well as in separate files.
From the Voice Browser page: "W3C is working to expand access to the Web to allow people to interact via key pads, spoken commands, listening to prerecorded speech, synthetic speech and music. This will allow any telephone to be used to access appropriately designed Web-based services, and will be a boon to people with visual impairments or needing Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in the cases where the mouse and keyboard may be missing or inconvenient... [The WG] is defining a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications. VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input. The other specifications are being designed for use in a variety of contexts, and not just with VoiceXML. Further work is anticipated on enabling their use with other W3C markup languages such as XHTML, XForms and SMIL. This will be done in conjunction with other W3C working groups, including the proposed new Multimodal working group..."
Principal references:
- "Last Call Working Draft for W3C Speech Synthesis Markup Language (SSML)." [Update] News item 2002-12-03.
- Speech Synthesis Markup Language Specification. W3C Working Draft 5-April-2002.
- DTD for the Speech Synthesis Markup Language
- W3C XML Schema for the Speech Synthesis Markup Language (see also the no-namespace core schema)
- W3C Voice Browser Activity
- W3C Multimodal Interaction Activity
- Voice Extensible Markup Language (VoiceXML) Version 2.0
- Mailing list archive for 'www-voice'
- "W3C Speech Synthesis Markup Language Specification" - Main reference page.