The Speech Application Language Tags (SALT) 1.0 Specification has been released by the SALT Forum, a "group of companies with a shared goal of accelerating the use of speech technologies in multimodal and telephony systems. The Forum is committed to developing a royalty-free, platform-independent standard that will make possible multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). Version 1.0 of the SALT specification covers three broad areas of capabilities: speech output, speech input and call control. The specification's 'prompt' tag allows SALT-based applications to play audio and synthetic speech directly, while 'listen' and 'bind' tags provide speech recognition capabilities by collecting and processing spoken user input. In addition, the specification's call control object can be used to provide SALT-based applications with the ability to place, answer, transfer and disconnect calls, along with advanced capabilities such as conferencing. The SALT specification draws on emerging W3C standards such as Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS) and semantic interpretation for speech recognition to provide additional application control. Following previously announced plans, the SALT specification is being submitted to an established international standards body to provide the basis of an open, royalty-free standard for speech-enabling multimodal and telephony applications."
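To make the announcement's description of speech output concrete, the fragment below is a minimal sketch of a SALT <prompt> element that plays a recorded audio clip followed by synthesized speech and is started from ordinary page script. The element name and its Start() method come from the specification; the namespace URI, file name, and prompt text are illustrative, and the SSML <audio> element is used here on the assumption that prompt content follows the W3C SSML draft cited above. A SALT-capable browser or telephony platform is assumed.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <body onload="document.getElementById('greeting').Start()">
        <!-- Speech output: play a recorded clip, then synthesized speech -->
        <salt:prompt id="greeting">
          <audio src="welcome.wav" />
          Thank you for calling. How may I help you?
        </salt:prompt>
      </body>
    </html>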
Bibliographic information: Speech Application Language Tags (SALT) 1.0 Specification. Reference: SALT.1.0.doc. 15 July 2002. 112 pages. Copyright Cisco Systems Inc., Comverse Inc., Intel Corporation, Microsoft Corporation, Philips Electronics N.V., and SpeechWorks International.
From the Introduction:
Speech Application Language Tags (SALT) 1.0 is an extension of HTML and other markup languages (cHTML, XHTML, WML, etc.) which adds a speech and telephony interface to web applications and services, for both voice-only (e.g., telephone) and multimodal browsers. This section introduces SALT and outlines the typical application scenarios in which it will be used, the principles which underlie its design, and resources related to the specification... SALT is a small set of XML elements, with associated attributes and DOM object properties, events and methods, which may be used in conjunction with a source markup document to apply a speech interface to the source page. The SALT formalism and semantics are independent of the nature of the source document, so SALT can be used equally effectively within HTML and all its flavors, or with WML, or with any other SGML-derived markup. The main top-level elements of SALT are:
- <prompt> for speech synthesis configuration and prompt playing
- <listen> for speech recognizer configuration, recognition execution and post-processing, and recording
- <dtmf> for configuration and control of DTMF collection
- <smex> for general-purpose communication with platform components
The input elements <listen> and <dtmf> also contain grammars and binding controls: <grammar> for specifying input grammar resources, and <bind> for processing recognition results. The <listen> element also contains the facility to record audio input: <record> for recording audio input. <smex> also contains the binding mechanism <bind> to process messages. All four top-level elements contain the platform configuration element <param>.
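The following fragment is a minimal sketch of how several of these elements fit together in a source page: a <listen> configured with a <grammar> and a <bind> that copies the recognition result into a visual input field, with a <dtmf> element as an alternative input mode and a <param> passing a platform setting. Element names follow the list above; the namespace URI, grammar file names, the XPath into the result, and the param name and value are illustrative assumptions rather than values mandated by the specification.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <body>
        <!-- <prompt>: speech synthesis configuration and prompt playing -->
        <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>

        <!-- <listen>: recognizer configuration, grammar and result binding -->
        <salt:listen id="recoCity">
          <salt:grammar src="city.grxml" type="application/srgs+xml" />
          <salt:bind targetelement="txtCity" value="/result/city" />
          <salt:param name="recoServer">//example/recoserver</salt:param>
        </salt:listen>

        <!-- <dtmf>: keypad input bound to the same field -->
        <salt:dtmf id="dtmfCity">
          <salt:grammar src="citycodes.grxml" />
          <salt:bind targetelement="txtCity" />
        </salt:dtmf>

        <input type="text" id="txtCity" />
      </body>
    </html>

In this sketch <bind> is assumed to write into the named element's value attribute when no targetattribute is given, so a successful recognition fills the txtCity field just as typing would.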
There are several advantages to using SALT with a mature display language such as HTML. Most notably (1) the event and scripting models supported by visual browsers can be used by SALT applications to implement dialog flow and other forms of interaction processing without the need for extra markup, and (2) the addition of speech capabilities to the visual page provides a simple and intuitive means of creating multimodal applications. In this way, SALT is a lightweight specification which adds a powerful speech interface to web pages, while maintaining and leveraging all the advantages of the web application model.
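As a sketch of point (1), the page below uses the browser's ordinary event and scripting model, with no additional dialog markup, to sequence a simple interaction: a button click starts the prompt, the prompt's completion event starts recognition, and the bound result fills the same text field the user could have typed into, illustrating point (2) as well. The handler names, the oncomplete/onclick wiring, and the file names are illustrative; a SALT-capable browser is assumed.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <head>
        <script type="text/javascript">
          // Dialog flow is driven by the page's own event and scripting model.
          function askDestination() { askCity.Start(); }   // play the question
          function startListening() { recoCity.Start(); }  // then recognize
        </script>
      </head>
      <body>
        <salt:prompt id="askCity" oncomplete="startListening()">
          Say or type your destination city.
        </salt:prompt>
        <salt:listen id="recoCity">
          <salt:grammar src="city.grxml" />
          <salt:bind targetelement="txtCity" />
        </salt:listen>
        <!-- The same field is usable by keyboard or by voice -->
        <input type="text" id="txtCity" />
        <button onclick="askDestination()">Speak</button>
      </body>
    </html>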
From the website: "The Speech Application Language Tags (SALT) Forum is committed to developing a royalty-free, platform-independent standard that will make possible multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The new standard -- Speech Application Language Tags (SALT) -- will extend existing mark-up languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently..."
From the announcement:
Advances in several fundamental technologies are making possible mobile computing platforms of unprecedented power. SALT supplies a critical missing component, facilitating intuitive speech-based interfaces that anyone can master. The result is a wealth of new opportunities to serve millions of consumers with compelling content and applications. "Verizon Wireless is pleased to join the SALT Forum to make speech applications more accessible to wireless customers," said Jim Straight, Vice President for Wireless Data and Internet Services at Verizon Wireless. "We expect that the SALT 1.0 specification will help to accelerate deployment of multimodal applications and provide a more natural user interface for mobile devices."
The SALT specification defines a set of lightweight tags as extensions to commonly used Web-based programming languages, strengthened by incorporating existing standards from the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). This allows developers to add speech interfaces to Web content and applications using familiar tools and techniques. In multimodal applications, the tags can be added to support speech input and output either as standalone events or jointly with other interface options such as speaking while pointing to the screen with a stylus. In telephony applications, the tags provide a programming interface to manage the speech recognition and text-to-speech resources needed to conduct interactive dialogs with the caller through a speech-only interface. The SALT specification is designed to work equally well on traditional computers, handheld devices such as PDAs, home electronics such as video recorders, telematics devices such as in-car navigation systems, and communications devices such as mobile phones.
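A hedged sketch of the "speaking while pointing" case mentioned above: a stylus or mouse press on an image starts recognition, so the spoken command and the tap position arrive together and can be combined in script. Only the <listen>, <grammar>, and <bind> elements come from the specification; the event wiring and the names used here (onMapDown, mapcommands.grxml, the cmd field) are illustrative assumptions.

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <head>
        <script type="text/javascript">
          var tapX, tapY;
          // Tap-and-talk: pointing and speaking are used together.
          function onMapDown(e) {
            tapX = e.clientX;              // remember where the stylus landed
            tapY = e.clientY;
            whereReco.Start();             // listen while the user speaks
          }
          function onWhereReco() {
            // Combine the spoken command (bound below) with the tap position.
            alert(document.getElementById("cmd").value + " at " + tapX + "," + tapY);
          }
        </script>
      </head>
      <body>
        <img src="map.gif" onmousedown="onMapDown(event)" />
        <salt:listen id="whereReco" onreco="onWhereReco()">
          <salt:grammar src="mapcommands.grxml" />
          <salt:bind targetelement="cmd" />
        </salt:listen>
        <input type="hidden" id="cmd" />
      </body>
    </html>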
Principal references:
- Announcement 2002-07-15: "Salt Forum Publishes Speech Application Language Tags Specification Version 1.0. Industry Backing for SALT Grows With Completion of Specification, Availability of Products and Support of Top U.S. Wireless Service Provider."
- Speech Application Language Tags (SALT) 1.0 Specification. See also the canonical .DOC format and the ZIP archive.
- SALT Forum website
- Contact: SALT Forum
- "Speech Vendors Target Developers with Multimodal Tools." By Ephraim Schwartz. In InfoWorld (July 15, 2002).
- "Speech Application Language Tags (SALT)." - Resources from Microsoft.
- "Speech Application Language Tags (SALT)" - Main reference page.
- See also: "VoiceXML Forum" - Main reference page.