The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: March 16, 2004
Speech Application Language Tags (SALT)


The Speech Application Language Tags (SALT) standard is being designed to "extend existing markup languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently."

[August 14, 2002]   SALT Forum Contributes Speech Application Language Tags Specification to W3C.    An announcement from the SALT Forum describes the contribution of the Speech Application Language Tags specification to the World Wide Web Consortium (W3C). The SALT Forum has "asked the W3C Multimodal Interaction Working Group and Voice Browser Working Group to review the SALT specification as part of their development of standards for promoting multimodal interaction and voice-enabling the Web." The contribution is said to further the SALT Forum's goal of "establishing an open, royalty-free standard for speech-enabling multimodal and telephony applications." On July 15, 2002 the SALT Forum announced the public release of SALT Version 1.0. Version 1.0 of the SALT specification covers three broad areas of capabilities: speech output, speech input and call control. The specification's 'prompt' tag allows SALT-based applications to play audio and synthetic speech directly, while 'listen' and 'bind' tags provide speech recognition capabilities by collecting and processing spoken user input. In addition, the specification's call control object can be used to provide SALT-based applications with the ability to place, answer, transfer and disconnect calls, along with advanced capabilities such as conferencing. The SALT specification thus "defines a set of lightweight tags as extensions to commonly used Web-based markup languages. This allows developers to add speech interfaces to Web content and applications using familiar tools and techniques. The SALT specification is designed to work equally well on a wide variety of computing and communicating devices." [Full context]

[July 17, 2002]   SALT Forum Publishes Speech Application Language Tags (SALT) Version 1.0.    The Speech Application Language Tags (SALT) 1.0 Specification has been released by the SALT Forum, a "group of companies with a shared goal of accelerating the use of speech technologies in multimodal and telephony systems. The Forum is committed to developing a royalty-free, platform-independent standard that will make possible multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). Version 1.0 of the SALT specification covers three broad areas of capabilities: speech output, speech input and call control. The specification's 'prompt' tag allows SALT-based applications to play audio and synthetic speech directly, while 'listen' and 'bind' tags provide speech recognition capabilities by collecting and processing spoken user input. In addition, the specification's call control object can be used to provide SALT-based applications with the ability to place, answer, transfer and disconnect calls, along with advanced capabilities such as conferencing. The SALT specification draws on emerging W3C standards such as Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS) and semantic interpretation for speech recognition to provide additional application control. Following previously announced plans, the SALT specification is being submitted to an established international standards body to provide the basis of an open, royalty-free standard for speech-enabling multimodal and telephony applications." [Full context]

[February 20, 2002]   SALT Forum Publishes Draft Specification for Speech Application Language Tags.    The SALT Forum has published a draft specification for its "royalty-free, platform-independent standard that will make possible multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs)." SALT (Speech Application Language Tags) is defined by "a small set of XML elements, with associated attributes and DOM object properties, events and methods, which may be used in conjunction with a source markup document to apply a speech interface to the source page. The SALT formalism and semantics are independent of the nature of the source document, so SALT can be used equally effectively within HTML and all its flavours, or with WML, or with any other SGML-derived markup. SALT is an extension of HTML and other markup languages (cHTML, XHTML, WML) which adds a spoken dialog interface to web applications, for both voice only browsers (e.g., over the telephone) and multimodal browsers. For multimodal applications, SALT can be added to a visual page to support speech input and/or output. This is a way to speech-enable individual HTML controls for 'push-to-talk' form-filling scenarios, or to add more complex mixed initiative capabilities if necessary. For applications without a visual display, SALT manages the interactional flow of the dialog and the extent of user initiative by using the HTML eventing and scripting model. In this way, the full programmatic control of client-side (or server-side) code is available to application authors for the management of prompt playing and grammar activation." Appendix A of the version 0.9 draft specification supplies the SALT XML DTD. [Full context]

[October 24, 2001] The SALT Forum was announced on October 15, 2001. Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks founded the SALT Forum as a joint initiative for the development of 'Speech Application Language Tags' to be embedded in other markup languages. The group has announced its commitment "to develop a royalty-free, platform-independent standard that will make possible multimodal and telephony-enabled access to information, applications and Web services from PCs, telephones, tablet PCs and wireless personal digital assistants (PDAs). SALT is a lightweight set of XML elements that enhance existing markup languages with a speech interface. SALT will thus extend existing markup languages such as HTML, xHTML and XML. Multimodal access will enable users to interact with an application in a variety of ways: They will be able to input data using speech and/or a keyboard, keypad, mouse or stylus, and produce data as synthesized speech, audio, plain text, motion video and/or graphics. Each of these modes could be used independently or concurrently. Because SALT is independent of the underlying platform, developers will be able to add a speech interface to applications, making them accessible from telephones or other GUI-based devices. The forum founders expect to make the specification publicly available in the first quarter of 2002 and to submit it to a standards body by midyear [2002]."

SALT Forum founders include Cisco, Comverse, Intel, Philips, Speechworks, and Microsoft.

From the announcement 2001-10-15: "The SALT specification is designed to make both multimodal and telephony-enabled applications and services faster and easier to create, deploy and use, resulting in the following benefits:

  • End users will be able to use SALT-based applications with speech, text or graphical interfaces independently or together.
  • Developers will be able to seamlessly embed speech enhancements in existing HTML, xHTML and XML pages, using familiar languages, technologies and toolkits.
  • Businesses will be able to offer common Web-based applications across multiple presentation media, resulting in reduced complexity and costs. In addition, they will be able to use their existing Web investments and expertise, and eliminate the need to create discrete applications for each type of output.
  • Service providers will be able to deploy a broad range of applications using standards that enable the widest range of services, offering new business opportunities and revenue streams that better serve both consumers and business customers."

From the Specification Overview:

SALT is a lightweight set of XML elements that enhance existing markup languages with a speech interface. SALT can be used equally effectively with all the flavors of HTML, or with any other SGML-derived markup. Most importantly, SALT does not define a new programming model; it reuses the existing Web execution model so that the same application code can be shared across modalities. And since SALT does not alter the behavior of the markup languages with which it is used, SALT is future-proof: it can be used with any future XML standard.

SALT markup: The main elements of SALT are [1] <prompt> for configuring the speech synthesizer and playing out prompts; [2] <reco> for configuring the speech recognizer, executing recognition, and handling recognition events; [3] <grammar> for specifying input grammar resources; [4] <bind> for processing recognition results into the page. These elements are activated either declaratively or programmatically under script running on the client device. SALT also provides DTMF and call control services for telephony browsers running voice-only applications.
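
As a rough illustration of how these elements fit together, the following Python sketch embeds a small SALT-style fragment in a page and imitates what <bind> is described as doing: copying values from a recognition result into page fields. It is not taken from the specification. The namespace URI, the attribute names (targetelement, value), the grammar file name, the shape of the recognition result, and the apply_binds helper are all assumptions made for the example, and no speech engine is involved.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption, not the official SALT namespace).
SALT_NS = "urn:example:salt"

# Hypothetical page: two form fields plus the four elements named in the
# overview. The overview calls the recognition element <reco>; the 1.0
# announcement quoted earlier on this page names a 'listen' tag instead.
PAGE = """<page xmlns:salt="urn:example:salt">
  <input id="origin" value=""/>
  <input id="destination" value=""/>
  <salt:prompt id="askFlight">Where would you like to fly?</salt:prompt>
  <salt:reco id="recoFlight">
    <salt:grammar src="flights.grxml"/>
    <salt:bind targetelement="origin" value="./origin_city"/>
    <salt:bind targetelement="destination" value="./dest_city"/>
  </salt:reco>
</page>"""

# Made-up recognition result, standing in for what a recognizer might return.
RESULT = (
    "<result>"
    "<origin_city>San Francisco</origin_city>"
    "<dest_city>Boston</dest_city>"
    "</result>"
)


def apply_binds(page_xml: str, result_xml: str) -> dict:
    """Copy values selected from the result into the page's input fields."""
    page = ET.fromstring(page_xml)
    result = ET.fromstring(result_xml)

    # Index the visual form fields by their id attribute.
    fields = {el.get("id"): el for el in page.iter("input")}

    # For each bind, look up the selected result node and the target field.
    for bind in page.iter(f"{{{SALT_NS}}}bind"):
        node = result.find(bind.get("value"))
        target = fields.get(bind.get("targetelement"))
        if node is not None and target is not None:
            target.set("value", node.text or "")

    return {field_id: el.get("value") for field_id, el in fields.items()}


if __name__ == "__main__":
    print(apply_binds(PAGE, RESULT))
```

Run on the data above, apply_binds fills the origin and destination fields with the recognized city names. In a SALT-capable browser the equivalent step would be carried out by the browser itself rather than by application code like this.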

From the FAQ document:

Multimodal access will enable users to interact with an application in a variety of ways: input with speech, a keyboard, keypad, mouse and/or stylus; and output as synthesized speech, audio, plain text, motion video and/or graphics. Each of these modes could be used independently or concurrently. For example, a user might click on a flight info icon on a device and say "Show me the flights from San Francisco to Boston after 7 p.m. on Saturday" and have the browser display a Web page with the corresponding flights.

SALT will help address three major challenges:

  1. Input on wireless devices. Wireless devices are becoming pervasive but lack of a natural input mechanism hinders adoption as well as application development on these devices. We believe speech is a compelling solution to the input problem, and SALT will standardize how speech input and output will work for Web applications on those devices.
  2. Speech-enabled application development. Speech-enabled application development is still a difficult task, and not yet the domain of most application developers. By starting with HTML/XHTML's object and eventing methods and script, SALT will introduce speech to the Web developer and create a new class of tools to simplify authoring of some of these applications.
  3. Telephone users. There are 1.6 billion telephones in the world, but only a relatively small fraction of Web applications and services are reachable by 'phone. By enabling a tight integration between existing Web browser, server and network infrastructure and speech technology, SALT will allow many more Websites to be reachable through telephones.

Principal References

General: Articles, Papers, News

  • [March 16, 2004] "Microsoft's Entry to Stir Speech Recognition Market. Company Aims to Make Speech Recognition Technologies Mainstream with Speech Server 2004." By Joris Evers. In InfoWorld (March 16, 2004). "Microsoft Corp. is about to stir the speech recognition market with the launch of its Speech Server products next week. The vendor promises speech recognition for the masses, but analysts warn that speech-enabling applications is not easy. Microsoft Chairman and Chief Software Architect Bill Gates is scheduled to formally launch Speech Server 2004 Standard Edition and Enterprise Edition at the SpeechTEK conference in San Francisco next week. The launch marks the Redmond, Washington-based company's entry into the server-based speech recognition market where it will compete with vendors including Nuance Communications Inc., ScanSoft Inc. and IBM Corp. 'Our goal is to make speech recognition technologies mainstream,' said James Mastan, director of marketing for Microsoft's Speech Server group. Microsoft's way to do that is by making speech recognition available at lower cost and easier to deploy, manage, develop and maintain than competing products, he said. The pitch is simple. Developers can add speech capabilities to existing Web applications based on Microsoft's ASP application framework by adding code based on XML (Extensible Markup Language) and SALT (Speech Application Language Tags) technologies using Visual Studio .Net. Speech Server takes calls and communicates with the Web server through XML and SALT and makes applications offered online available through the phone, Mastan said. Speech Server runs on Windows Server 2003. The Enterprise Edition needs to run on a separate physical server while Standard Edition, designed for small and medium-sized installations, can be placed on the same hardware as the Web server. Microsoft will recommend configurations and resellers will offer fully configured systems..."

  • [July 12, 2003] "Microsoft Pitches Voice Specification. SALT Support Trumps Voice XML as Speech Server Sounds Return of Enterprise Voice." By Ephraim Schwartz. In InfoWorld (July 11, 2003). "Due for manufacturing release before mid-2004, the Microsoft Speech Server product will include a text-to-speech engine from SpeechWorks -- Microsoft's own speech-recognition engine -- and a telephony interface manager. The offering will also include middleware that is being designed in partnership with Santa Clara, Calif.-based Intel and Dallas-based Intervoice to connect the Microsoft product to an enterprise telephony infrastructure...The server's SALT (Speech Application Language Tags) voice browser sets Microsoft apart from the standards crowd. Rather than adhering to VXML (Voice XML) -- the current W3C standard for developing speech-based telephony applications -- Speech Server is compatible only with applications that use the specifications developed by the SALT Forum, of which Microsoft is a founding member... The SALT specification was originally targeted at the multimodal market for browsing the Web on handheld devices. The theory was that users required multiple ways to interface with smaller devices and that voice would be chief among them, but the market for multimodal handhelds has not materialized... Bill Meisel, a principal at TMA Associates, a leading speech technology research company based in Tarzana, Calif., said enterprise voice adoption will increase due to Microsoft's market influence. Yet, because Speech Server will compete directly with established VXML applications, Microsoft's actions will make speech technology adoption a more complex exercise for the enterprise, according to Meisel. Competing speech technology vendor IBM is a case in point. Big Blue supports VXML and the W3C standard, according to Gene Cox, director of mobile solutions at Armonk, N.Y.-based IBM. 
Cox said significant VXML applications already exist in the enterprise at companies such as AT&T, General Motors' OnStar division, and Sprint PCS. 'VXML conforms to all W3C royalty-free policies. But SALT is like Internet Explorer; it is free as long as you buy Windows,' Cox said. The debate over which technology to use will not be fought out at the customer level, said Forrester's Herrell, but rather by developers. Irvine, Calif.-based NewportWorks, an information service provider for the real estate industry, is one example of an IBM customer that will be hard to shift away from Voice XML. According to CEO Ken Stockman, the company could not exist without Voice XML. NewportWorks aggregates the data from the MLS (Multiple Listing Service), uses IBM's WebSphere Speech Server to convert the listings for voice access, and sells the service to real estate agencies... Stockman said the learning curve on VXML for developers was negligible. Microsoft, on the other hand, argues that Web developers don't want to learn a new language. Instead, they want SALT tag plug-ins for existing Web-based applications. According to Intervoice, the argument may be resolved through tools such as its Invision, which allows a developer to automatically generate VXML and to possibly generate SALT code in the future..."

  • [July 11, 2003]   Microsoft Enhances Support for Speech Application Language Tags (SALT).    Microsoft has announced several new lines of support for open-standards-based speech technology, including a Speech Server, updated Speech Application Software Development Kit (SASDK), Microsoft Speech Server Beta Program, Early Adopter Program, and specialized training courses. Based upon the Speech Application Language Tags (SALT) specification, the speech server supports unified telephony and multimodal applications. Its key components include Speech Engine Services (Speech Recognition Engine, Prompt Engine, Text-to-Speech Engine) and Telephony Application Services (SALT Interpreter, Media and Speech Manager, SALT Interpreter Controller). With these technology offerings, "customers can use speech to access information from standard telephones and cell phones as well as GUI-based devices like PDAs, Tablet PCs and smart phones. For connectivity into the enterprise telephony infrastructure and call-control functionality, Intel Corp. and Intervoice Inc. will provide a Telephony Interface Manager (TIM) that supports Microsoft Speech Server. The TIM will provide fast and easy integration of the speech server with the Intel NetStructure communications boards, enabling deployment of robust speech processing applications."

  • [July 11, 2003] "Microsoft Releases Speech Server Beta. Company Looks to Integrate Call Centers With Web." By Stephen Lawson. In InfoWorld (July 10, 2003). "Microsoft on Wednesday moved toward the integration of call centers and the Web with the release of the first public beta of its Microsoft Speech Server and a new beta version of its Speech Application Software Development Kit (SDK). The software platform is designed to host voice-based services similarly to the way Web servers host a company's Web site, as well as supporting 'multimodal' applications that take advantage of both voice and Web interfaces. It is based on SALT (Speech Application Language Tags), an extension of current scripting languages including HTML (Hypertext Markup Language) and XML. Companies that need call centers can cut costs by automating them on the server, said Xuedong Huang, general manager for Microsoft speech technologies. Among other things, the server can interpret callers' requests and provide recorded or synthesized responses. Developers also can integrate the voice-based services with Web-based applications that can continue to run on a Web server as they do now. For example, a caller could ask for a stock quote verbally and have it displayed on a handheld device, he said. The beta version of the server can deliver voice-only services to a wired phone and multimodal services to any device with a screen that uses either a wired or an IEEE 802.11 wireless LAN connection to the server. Other wireless technologies will be supported later, Huang said. The software includes a speech recognition engine for handling users' speech inputs and a prompt engine to bring up prerecorded prompts from a database to play for users. It also has a text-to-speech engine that can synthesize audible prompts from a text string when a prerecorded prompt is not available. In addition, it has a SALT Interpreter and other components to support services to callers... 
The SALT Forum has submitted SALT 1.0 as a specification to the World Wide Web Consortium (W3C). The group has more than 70 members, including founding members Microsoft, Cisco Systems, Intel, Philips Electronics, SpeechWorks International and Comverse, Huang said. SALT is a more lightweight extension of current markup languages than is Voice XML, a specification being used by many voice-based services developers today, according to Mark Plakias, an analyst at Zelos Group Inc., in San Francisco. As a result, it allows companies to draw upon a larger pool of developers than does Voice XML, which is more familiar to developers of traditional IVR (interactive voice response) systems, he said..."

  • [June 24, 2003] "SALT Forum Advances Mobile Content Delivery with Enhancements to Scalable Vector Graphics Specification. New Proposal Paves the Way for Speech Applications on Portable Devices." - "The SALT Forum, a group of companies with a shared goal of accelerating the use of speech technologies in multimodal and telephony systems, today announced that it has published a SALT profile for the World Wide Web Consortium's (W3C) Scalable Vector Graphics (SVG) markup language. The new SVG profile supplements the SALT 1.0 specification, which was contributed to the W3C by the SALT Forum and already included profiles for use with the XHTML and SMIL specifications. SVG is an XML-based language for describing advanced graphics that enables developers to deliver a visually rich user interaction experience in their Web applications. By adding SALT to SVG, developers can further enhance the user experience with interactive spoken interfaces coupled directly to the visual interface. The SVG specification has gained considerable support since its release by the W3C, capturing the attention of leading Web developers. Its ability to render high-quality graphics on displays of varying size and resolution, along with a lightweight design that reduces computational requirements, has made it particularly attractive to manufacturers of cell phones, PDAs and other portable devices. SVG with SALT provides the means to build sophisticated mobile applications for these devices with easy-to-use speech interfaces that are accessible without looking at or touching the equipment. SVG with SALT can be used to provide speech 'hot spots' within a graphic or provide spoken commands for scrolling and zooming the display. It can also be used to embed descriptive services for the visually impaired directly within a graphic, streamlining the workflow process. 
'SALT seamlessly adds new speech capabilities to the versatility of SVG that enhance its capacity to make Web applications more powerful and easier to develop,' said Antoine Quint, SVG consultant for Fuchsia Design and co-author of the SVG specification. 'Graphics and speech play complementary roles in making information readily accessible no matter what the situation may be.' The SALT specification was designed to add speech input, speech output and call control capabilities to practically any XML-based language. The SALT profile for SVG demonstrates the potential of this flexible and platform-independent approach for responding to technology advances that support improved content delivery. 'The SALT profile for SVG is an example of the exciting industry developments now underway to realize the full potential of SALT,' said Glen Shires, chairperson of the SALT Forum's Technical Working Group. 'The SALT Forum continues to take an active role in refining, enhancing and supporting the SALT 1.0 specification as an open industry initiative'..."

  • [May 16, 2003] "Adding SALT to HTML." By Simon Tang. From (May 14, 2003). ['Add speech to your applications with SALT, Speech Application Language Tags. The author introduces multimodal XML technology, specifically SALT; using Microsoft's .NET Speech SDK, developers should be able to add SALT elements to HTML web pages.'] "Wireless applications are limited by their small device screens and cumbersome input methods. Consequently, many users are frustrated in their attempts to use these devices. Speech can help overcome these problems, as it is the most natural way for humans to communicate. Speech technologies enable us to communicate with applications by using our voice... Both wireless and speech applications have their benefits but also their limitations. Multimodal technologies attempt to leverage their respective strengths while mitigating their weaknesses. Using multimodal technologies, users can interact with applications in a variety of ways. They can provide input through speech, keyboard, keypad, touch-screen or mouse and receive output in the form of audio, video, text, or graphics... The SALT forum is a group of vendors which is creating multimodal specifications. It was formed in 2001 by Cisco, Comverse, Intel, Microsoft, Philips and SpeechWorks. They created the first version of the Speech Application Language Tags (SALT) specification as a standard for developing multimodal applications. In July 2002, the SALT specification was contributed to the W3C's Multimodal Interaction Activity (MMI). W3C MMI has published a number of related drafts, which are available for public review. The main objective of SALT is to create a royalty-free, platform-independent standard for creating multimodal applications. A whitepaper published by SALT Forum further defines six design principles of SALT. 
(1) Clean integration of speech with web pages; (2) Separation of the speech interface from business logic and data; (3) Power and flexibility of programming model; (4) Reuse existing standards for grammar, speech output, and semantic results; (5) Support a range of devices; (6) Minimal cost of authoring across modes and devices. The first five principles above result in minimizing the cost of developing, deploying and executing SALT applications. A number of vendors, including HeyAnita, Intervoice, Microsoft, Philips, SandCherry and Kirusa, SpeechWorks, and VoiceWeb Solutions, have announced products, tools, and platforms that support SALT. There is also an open source project, OpenSALT, in the works to develop a SALT 1.0 compliant browser... Before diving into experimenting with HTML and SALT, we need to set up the appropriate development environment. I am going to use Microsoft's .NET Speech SDK 1.0..."

  • [December 31, 2002] "Introduction to SALT. Unleashing the Potential" By Hitesh Seth (ikigo, Inc). In XML Journal Volume 3, Issue 12 (December 2002). ['Speech Application Language Tags (SALT) is a set of XML-based tags that can be added to existing Web-based applications, enhancing the user interface through interactive speech recognition. In addition, SALT can be used to extend Web-based applications to the telephony world, thereby providing an opportunity to unleash the potential of a huge user community, users of normal touch-tone telephones. SALTforum, an organization founded by Microsoft, Cisco, SpeechWorks, Philips, Comverse, and Intel, has spearheaded development of the SALT specification, now in its 1.0 release.'] "... Multimodality means that we can utilize more than one mode of user interface with the application, something like our normal human communications with each other. For instance, consider an application that allows us to get driving directions - while it's typically easier to speak the start and destination addresses (or even better, shortcuts like 'my home,' 'my office,' 'my doctor's office,' based on my previously established profile), the turn-by-turn and overall directions are typically best viewed through a map and turn-by-turn directions as well, something similar to what we're used to seeing on MapQuest. In essence, a multimodal application, when executed on a desktop device, would be an application very similar to MapQuest but would allow the user to talk/listen to the system for parts of the application's input/output as well - for example, the starting and destination addresses... SALT has been built on the technology required to allow applications built using SALT to be deployed in a telephony and/or multimodal context... It's clear that SALT and VoiceXML have utilized the Web application delivery model as an open platform for delivering telephony applications. 
However, VoiceXML and SALT have different technical goals - whereas VoiceXML tends to focus on development of telephony-based applications (applications used through a phone), SALT focuses on adding speech interaction and telephony integration to existing Web-based applications and enable true multimodality. In this case, I would also like to highlight the development of another upcoming standardization initiative called X+V (which stands for XHTML+Voice), an effort to combine VoiceXML with XHTML to develop multimodal applications. Another difference between SALT and VoiceXML is the overall approach that has been utilized to develop applications. Whereas VoiceXML is pretty much declarative in nature, utilizing its extensive set of tags, SALT is very procedural and script oriented, having a very small set of core tags. Also, it's important to understand that SALT actually utilizes key components of the standardization effort carried at the W3C Voice Browser Activity, including the XML-based Grammar Specification and the XML-based Speech Synthesis Markup Language. Both these specifications have been used by the VoiceXML 2.0 specification as well..." [alt URL]

  • [October 31, 2002] "Microsoft .NET Speech SDK 1.0 Beta 2: New Features and Enhancements." Microsoft Corp. Product Fact Sheet. October 2002. 4 pages. "The Microsoft .NET Speech SDK is a set of ASP.NET controls, a Microsoft Internet Explorer Speech Add-in, sample applications and documentation that allows Web developers to create, debug and deploy speech-enabled ASP.NET applications. The tools are integrated seamlessly into Microsoft Visual Studio .NET, allowing developers to leverage the familiar development environment. Microsoft Corp.'s approach to speech-enabled Web applications is built around a newly emerging standard: Speech Application Language Tags (SALT). SALT has been submitted to the World Wide Web Consortium (W3C) for adoption as a standard for telephony and multimodal applications, which incorporate speech-enabled elements within a visual Web interface. The beta 2 release of the Microsoft .NET Speech SDK includes a complete tool set for creating and testing SALT-based voice-only telephony applications. It also supports the development of multimodal speech applications on clients such as desktop PCs or Tablet PCs using Internet Explorer browser software... The Microsoft .NET Speech SDK supports SALT 1.0. SALT extends HTML and other markup languages with tags and scriptable objects to perform voice output, spoken-language input, telephony management and messaging. With the new speech add-in for Internet Explorer, developers can type SALT code directly into the browser. [The SDK] includes a set of robust grammar libraries, written in the W3C-approved Speech Recognition Grammar Specification (SRGS) format, which help developers obtain abstract, complex concepts from the user. For example, gathering recognizable date or time input is quite complex. If the user says 'Thanksgiving' or 'Monday the 5th,' applications can use grammar libraries to make an intelligent determination of the exact date and year... 
In beta 1 of the SDK, Microsoft implemented its own grammar format. With this release, the Grammar Editor now opens and saves files only in the W3C-approved SRGS format. All elements, properties and validations have been updated for SRGS..." [source .DOC]

  • [October 30, 2002] "Microsoft Unveils .Net Speech Platform." By Ephraim Schwartz. In InfoWorld (October 30, 2002). "Microsoft unveiled Wednesday the first technical preview of its .Net Speech Platform and also announced availability of the second beta release of its .Net Speech Software Developer Kit (SDK) at the Speech TEK conference in New York. As unveiled, the speech platform contains the Microsoft speech recognition engine, the middleware to connect into a telephony system, the SALT (Speech Application Language Tags) interpreter, a SALT voice browser, and the SpeechWorks text-to-speech engine. The .Net Speech Platform will give developers and customers a foundation to design a single application that can run a speech-enabled application in a variety of venues, including telephony, desktops, and in a multimodal format on mobile devices. The platform is expected to enter beta testing next summer and ship by the end of 2003. The integration also allows an application that incorporates speech to run on any Web server running Microsoft ASP.Net (Active Server Pages), according to [Microsoft's] Mastan. The kit's toolset includes the grammar development tool, a prompt creation tool for creating and managing prompts, a debugging tool, the ASP.Net SALT Web controls, and the add-ins for Internet Explorer for multimodal clients. The SDK will also include a tool for creating and testing SALT-based telephony-only applications, and will use the W3C (World Wide Web Consortium) formats for speech grammar... Recent Microsoft partnerships with Intervoice, a leading IVR (Interactive Voice Response) designer, and Intel, a supplier of Dialogic speech boards for telephony, also indicate that Microsoft's speech technology is moving toward mainstream acceptance..." See: (1) the announcement: "Microsoft Releases New Features, Enhancements to .NET Speech SDK; Announces Technical Preview of .NET Speech Platform And Joint Development Program. 
Ongoing Product Development Efforts Ready Market for Mainstream Adoption Of SALT-Based Applications and Solutions."; (2) "Intel and Microsoft Collaborate on SALT"; (3) "SALT Forum Contributes Speech Application Language Tags Specification to W3C."

  • [October 15, 2002] "Intel and Microsoft Collaborate on Development of Speech Application Language Tags (SALT)-based Speech-Enabled Web Solutions. Initiative to Bring Microsoft and Intel Speech Software and Hardware Technologies to Enterprise and Channel Customers." - "Today at the Intel Communications Summit 2002, Intel Corp. and Microsoft Corp. announced that the two companies are developing enabling technologies and a reference design for the deployment of speech-enabled Web solutions in the enterprise based on the Speech Applications Language Tags (SALT) specification. The scope of the technical collaboration and reference design will include the Microsoft .NET Speech platform along with standards-based Intel communications building blocks which include Intel Architecture servers, Intel NetStructure communications boards and Intel telephony call management interface software... The use of Intel telephony building blocks and the Microsoft .NET Speech platform will help customers more quickly and easily speech-enable Web solutions, and will offer an open, flexible and extensible speech system to a broad customer base. The parties will develop and test their respective technology components and the reference design so that they can be used by developers to build speech-enabled Web applications for the enterprise market segment. Together, Microsoft and Intel will also engage in joint sales and marketing activities focused toward their respective enterprise and channel customers, distributors, technology partners, ISVs and the Web developer community... The SALT specification defines a set of lightweight tags as extensions to commonly used Web-based programming languages and is strengthened by incorporating existing standards from the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). Developed by the SALT Forum, the SALT 1.0 specification was submitted to the W3C under royalty-free terms in August. 
SALT is designed for both telephony and multimodal Web applications, and will enable access to Web-based information, applications and services from a wide range of devices including PCs, telephones, cell phones, wireless personal digital assistants (PDAs), Pocket PCs and Tablet PCs. Established in October 2001 by Cisco Systems Inc., Comverse Inc., Intel, Microsoft, Philips Speech Processing and SpeechWorks International Inc., the SALT Forum brings together a diverse group of more than 50 industry-leading companies with a common interest in developing and promoting speech technologies for multimodal and telephony applications..." See "Speech Application Language Tags (SALT)."

  • [October 15, 2002] "VoiceXML, CCXML, and SALT." By Ian Moraes. In XML Journal Volume 3, Issue 9 (September 2002), pages 30-34. "There's been an industry shift from using proprietary approaches for developing speech-enabled applications to using strategies and architectures based on industry standards. The latter offer developers of speech software a number of advantages, such as application portability and the ability to leverage existing Web infrastructure, promote speech vendor interoperability, increase developer productivity (knowledge of speech vendor's low-level API and resource management is not required), and easily accommodate, for example, multimodal applications. Multimodal applications can overcome some of the limitations of a single mode application (GUI or voice), thereby enhancing a user's experience by allowing the user to interact using multiple modes (speech, pen, keyboard, etc.) in a session, depending on the user's context. VoiceXML, Call Control eXtensible Markup Language (CCXML), and Speech Application Language Tags (SALT) are emerging XML specifications from standards bodies and industry consortia that are directed at supporting telephony and speech-enabled applications. The purpose of this article is to present an overview of VoiceXML, CCXML, and SALT and their architectural roles in developing telephony as well as speech-enabled and multimodal applications... Note that SALT and VoiceXML can be used to develop dialog-based speech applications, but the two specifications have significant differences in how they deliver speech interfaces. Whereas VoiceXML has a built-in control flow algorithm, SALT doesn't. Further, SALT defines a smaller set of elements compared to VoiceXML. While developing and maintaining speech applications in two languages may be feasible, it's preferable for the industry to work toward a single language for developing speech-enabled interfaces as well as multimodal applications. 
This short discussion provides a brief introduction to VoiceXML, CCXML, and SALT for supporting speech-enabled interactive applications, call control, and multimodal applications and their important role in developing flexible and extensible standards-compliant architectures. This presentation of their main capabilities and limitations should help you determine the types of applications for which they could be used. The various languages expose speech application technology to a broader range of developers and foster more rapid development because they allow for the creation of applications without the need for expertise in a specific speech/telephony platform or media server. The three XML specifications offer application developers document portability in the sense that a VoiceXML, CCXML, or SALT document can be run on a different platform as long as the platform supports a compliant browser. These XML specifications are posing an exciting challenge for developers to create useful, usable, and portable speech-enabled applications that leverage the ubiquitous Web infrastructure..." [alt URL]

  • [October 15, 2002] "Deploying SALT Telephony Call Control on an e-Business Site." By Glen Shires and Jim Trethewey (Intel). In Intel Developer Update Magazine (July 2002). 8 pages. "Speak to a Web page? Telephony call control for e-Commerce? The technology has arrived. Developers can now add these features to both new and existing e-applications. SALT (speech application language tags) lets users talk (replacing or supplementing the keyboard, mouse, or stylus) to access information online, order products, and so on. The telephony call-control features also let users make or receive calls through their computer or participate in phone conferences directly from a Web page in their browser. Call-control features are available in a JavaScript 'include' library file named 'saltcc.js'. 'saltcc.js' is royalty-free, modifiable source code available under license from Intel. To embed telephony features on your Web site, simply place this file on your Web server, and add the mark-up tags to your Web pages so that the user's PC can make the call.. This article describes only one of the many ways you can use SALT call-control capabilities to integrate innovative communications features within a Web site. A draft version of the SALT specification is available now, and the fully approved version of the specification should be available in early July 2002. Because SALT tools and software development kits are also already available, you can start experimenting now with both call control and speech I/O. By starting now, you can get a jump on business and technology plans that include the use of telephony features for online customers..." See also the article in HTML format, and the earlier article in the series: "Telephony Call Control Now Available for HTML, XHTML, and XML," by Thom Sawicki and Kamal Shah. [cache]

  • [October 09, 2002] "Progress in the VoiceXML Intellectual Property Licensing Debacle." By Jonathan Eisenzopf (The Ferrum Group). From October 2002. "In January of 2002 the World Wide Web Consortium released a rule that requires Web standards to be issued royalty free (RF). Some VoiceXML contributors hold intellectual property related to the VoiceXML standard. Some of those companies have already issued royalty free licenses, while others have agreed to reasonable and non-discriminatory (RAND) licensing terms... The fact that not all contributors have switched to a royalty free licensing model has been a thorn in the progress of the VoiceXML standard. I've voiced my concerns previously on this issue, specifically in 'SALT submission to W3C could impact the future of VoiceXML'... Recently, IBM and Nokia changed their licensing terms from RAND to RF. At the VoiceXML Planet Conference & Expo on September 27 [2002], Ray Ozborne, Vice President of the IBM Pervasive Computing Division assured the audience at the end of his keynote speech that IBM would be releasing all intellectual property that related to the VoiceXML and XHTML+Voice specifications royalty free and encouraged the other participants to do the same... If VoiceXML is going to survive as a Web standard, then all contributors must license their IP royalty free, otherwise, the large investment that's been made will go down the drain. My hope is that the voice browser group at the W3C will either resolve these licensing issues in the next six months or jettison VoiceXML and replace it with SALT. Either way, I believe that it would be prudent for voice gateway vendors to be working on a SALT browser so that customers have the option down the road..."

  • [October 07, 2002] "Microsoft and Intervoice Create Sales, Marketing and Technology Alliance Aimed at Making Speech Technology Mainstream Strategic Alliance to Support Technology Development and Promotion Of the SALT-Based Microsoft .NET Speech Platform, Bringing Open and Economical Web-Based Speech Solutions to Leading Call Center and Enterprise Customers." - "Microsoft Corp. and Intervoice Inc., the world leader in converged voice and data solutions, today announced a strategic alliance that will make open, standards-based speech solutions more accessible to Web developers and enterprise customers. Seeking to make speech technology mainstream, the two companies will engage in sales, marketing and technology development for the upcoming Microsoft .NET Speech platform, a Speech Application Language Tags (SALT)-based solution that will make it faster, easier and more economical for customers to develop and deploy large-scale enterprise telephony and multimodal speech-enabled applications, such as interactive voice response (IVR), customer relationship management (CRM) or enterprise resource planning (ERP) applications... The Microsoft .NET Speech platform is a SALT-based multimodal and telephony-enabled solution for developing Web-based speech applications that will span multiple clients such as PCs, telephones, wireless personal digital assistants (PDAs) and next-generation laptop computers such as the Tablet PC. The platform is a central component of the company's comprehensive SALT-based product offerings that will expand and facilitate speech-enabled Web applications for the broad market. Incorporating both telephony and speech recognition technologies, the Microsoft platform represents a superior infrastructure for building, deploying and operating speech-enabled Web applications and services. Among its many features, the Microsoft platform will offer world-class speech recognition functionality and support both telephony and multimodal devices. 
For telephony applications, the platform will contain SALT-based voice browser software that will support speech-only access through traditional telephones and cellular phones. To enable multimodal access and effectively provide both speech and visual input and output, a SALT-based speech interpreter may be added to clients such as GUI cellular phones, PDAs and smart clients such as the Tablet PC and Pocket PC. The platform also will offer third-party vendors the ability to add existing speech recognition and text-to-speech engines and help provide scalable, reliable and secure core speech processing services..." [alt URL]

  • [June 24, 2002] "VoiceXML and the Future of SALT. [Voice Data Convergence.]" By Jonathan Eisenzopf. In Business Communications Review (May 2002), pages 54-59. ['Jonathan Eisenzopf is a member of the Ferrum Group, LLC, which provides consulting and training services to companies that are in the process of evaluating, selecting or implementing VoiceXML speech solutions.'] "The past year has been eventful for VoiceXML, the standard that application developers and service providers have been promoting for the delivery of Web-based content over voice networks. Many recent developments have been positive, as continued improvements in speech-recognition technology make voice-based interfaces more and more appealing. Established vendors are now validating VoiceXML, adding it to their products and creating new products around the technology. For many enterprises, this means that the next time there's a system upgrade, VoiceXML may be an option. For example, InterVoice-Brite customers soon will be able to add VoiceXML access to their IVR platform, which would provide callers with access to Web applications and enterprise databases... The introduction of SALT as an alternative to VoiceXML for multi-modal applications will present alternatives for customers who are not focusing exclusively on the telephone interface. However, VoiceXML is likely to be the dominant standard for next-generation IVR systems, at least until Microsoft and the SALT Forum members begin to offer product visions and complete solution sets..."

  • [June 21, 2002] "Telephony Call Control Now Available for HTML, XHTML, and XML." By Thom Sawicki and Kamal Shah (Intel). In Intel Developer Update Magazine (June 2002). 5 pages. "New SALT spices up the Web -- literally. SALT (speech application language tags) lets developers add speech and telephony call-control features to existing and new Web-based applications. These exciting, new features are the next step in making the computing interface more human-friendly or natural. With SALT, users can use speech (instead of the keyboard and mouse or stylus) to access information online, make calls through their computer, order products, and so on. Speech makes using the Web more convenient and accessible through a wide-variety of devices including desktops, laptops, PDAs, and phones. SALT does not require developers to rewrite or rebuild an existing application in a new markup language. Instead, SALT is an extension of existing standards, offering developers a set of new, powerful HTML tags. These tags let developers seamlessly embed speech enhancements in existing HTML, XHTML, and XML pages..." [cache]

  • [May 08, 2002] "Microsoft Releases Tools That Will Enable Mainstream Developers To Add Speech to Web Applications. Microsoft .NET Speech SDK Version 1.0 Beta Is First Software Product Based on Speech Application Language Tags Specification." - Microsoft has announced "the beta release of the Microsoft .NET Speech Software Development Kit (SDK), the industry's first Web developer tool and the industry's first product deployment based on the Speech Application Language Tags (SALT) specification. The SDK seamlessly integrates into the Visual Studio .NET development environment and will make it faster and easier for Web developers to incorporate speech functionality into Web applications and create new business opportunities by leveraging their existing Web development knowledge and skills... The .NET Speech SDK is a set of SALT-based speech application development tools and speech controls that integrate with Visual Studio .NET. It is the first speech toolkit to integrate with a Web server programming environment, Microsoft ASP.NET. The SALT specification, currently under development by a diverse group of industry leaders including Microsoft, defines a lightweight set of extensions to familiar Web markup languages, in particular HTML and XHTML, that will enable multimodal and telephony access to information, applications and Web services from PCs, telephones, cellular phones, Tablet PCs and wireless personal digital assistants (PDAs). The specification is slated to be released to an international standards body by midyear 2002... The .NET Speech SDK, which will ship this week [2002-05-05/11], will allow developers to write combined speech and visual Web applications in a single code base that is easy to maintain and modify, and test those applications at their workstation. It includes tools for debugging and creating simple and robust grammars and prompts, as well as sample applications and tutorials. 
It also includes a set of SALT-based ASP.NET controls that will allow developers to add speech capabilities into their HTML and XHTML Web applications. The .NET Speech SDK will ship with speech extensions for Microsoft Internet Explorer browser software, effectively extending Internet Explorer to support both speech and visual input and output, as well as a desktop version of Microsoft's new speech recognition engine and a test-level version of the Microsoft Text-to-Speech (TTS) engine..."

  • [May 08, 2002] "Microsoft Tests Tools to Make The Web Talk." By Matt Berger and Ephraim Schwartz. In InfoWorld (May 07, 2002). "In another small step toward creating a voice-enabled Web, Microsoft on Tuesday released to developers a test version of its tools for building applications that can be controlled over the Internet using voice commands... the company made available the beta version of its .Net Speech SDK (Software Development Kit), Version 1.0. Used with Microsoft's Visual Studio .Net developer tools, the SDK is designed to add voice to the list of methods for inputting data, which includes the mouse, keyboard, and stylus. The tools are intended to 'help jumpstart the industry' for building speech-enabled Web applications, such as an airline Web site that allows users to make reservations by talking into a microphone on their computer, said James Mastan, group product manager for Microsoft's .Net speech technologies group. While the beta version will allow developers to design speech-enabled applications and Web services for desktop use, the components needed for speech developers to create telephony-based applications using SALT (Speech Applications Language Tags) is still missing a key component. SALT was created by an industry group known as the SALT Forum, whose founding members include Microsoft, Speechworks International, Cisco Systems, and Intel... However, the rival speech development language, VoiceXML, also has yet to publish its set of APIs to interface with telephony boards. Call control tags are just now being developed by the VXML sub-committee of the W3C (World Wide Web Consortium), according to Dennis King, director of architecture for Pervasive Computing at IBM. IBM is a founding member of the W3C... The set of SALT protocols have not yet been submitted to any standards body but will be this summer, according to Mastan. The SALT members have not yet voted on which standards body to submit the protocols to. 
The .Net Speech SDK can be used to retool an existing Web application developed with Microsoft's popular developer tools, a benefit that Mastan said would spur its use. Features of the SDK include workspaces for programming applications, as well as for creating the spoken questions and answers that a voice-enabled application would need to understand..."

  • [February 21, 2002] "Speech Technology For Applications Inches Forward." By Matt Berger. In InfoWorld (February 21, 2002). "An early version of an emerging technology that will allow users to control software applications using the human voice was released to developers... A group led in part by Microsoft and Speechworks International, known as the SALT Forum, short for Speech Application Language Tags, released the first public specification of its technology. When completed, the technology would allow developers to add speech 'tags' to Web applications written in XML (Extensible Markup Language) and HTML (Hypertext Markup Language), allowing those applications to be controlled through voice commands rather than a mouse or a keyboard. Other founding members of the SALT Forum include Cisco Systems, Comverse, Intel and Philips Electronics. Nearly 20 other companies have announced support for the effort, according to information at the group's Web site..."

  • [February 20, 2002] "SALT Forum Publishes Specs." By Dennis Callaghan. In eWEEK (February 20, 2002). "The Speech Application Language Tags (SALT) Forum released a working draft of its 0.9 specification for adding speech tags to other Web application development languages on Wednesday. Applications created by that combination would combine speech recognition with text and graphics. The draft, published by the founding members of the SALT Forum -- Cisco Systems Inc., Comverse Inc., Intel Corp., Microsoft Corp., Philips Electronics N.V. and Speechworks International -- lays out the XML elements that make up SALT, the typical application scenarios in which it will be used, the principles which underlie its design and resources related to the specification. Elements the draft focuses on include speech input, speech output, Dual Tone Multi-Frequency (DTMF -- generic form of Touch-Tone) input, platform messaging (using Simple Messaging Extension -- SMEX), telephony call control and logging. The SALT Forum is an alternative to a plan pushed by IBM, Motorola Inc. and Opera Software ASA to combine VoiceXML with XHTML to integrate speech and Web application development, and form so-called multi-modal applications. That group has already submitted specifications to the World Wide Web Consortium (W3C). The SALT Forum has indicated that it too will submit to an international standards body but hasn't said which one yet..."

  • [February 11, 2002] "SALT May Energize Speech-Technology Market." By Antone Gonsalves. In InformationWeek (February 11, 2002). "Standards emerging in speech technology could boost the market and make software development easier... Today, VoiceXML from the World Wide Web Consortium is the primary standard for building an interface to access an application by telephone. Called interactive voice response, or IVR, such applications make it possible to get stock quotes or check account balances without talking to a person. .. SALT [is] a speech technology specification under development by Cisco Systems, Comverse Technology, Intel, Microsoft, Royal Philips Electronics, and SpeechWorks Technologies. SALT, or speech application language tags, is scheduled for release this quarter, and is set to be turned over to an independent standards body by midyear. While VoiceXML has focused on IVR applications, SALT concentrates on multimodal communication, or the ability to ask for information over a cell phone, PDA, or other handheld device, and get a text response... Microsoft also is expected to energize the market when it begins rolling out SALT-based technology later this year. The company is expected to release a beta version of a software development kit in April, and a beta version of its new platform for deploying interactive, voice-enabled Web applications in the fourth quarter. Microsoft last week announced a partnership with SpeechWorks, which has licensed its technology to Microsoft. Strachman expects to see the first SALT applications by the first quarter of next year..."

  • [February 02, 2002] "Speech Vendors Shout for Standards." By Ephraim Schwartz. In InfoWorld (February 01, 2002). "The battle for speech technology standards is set to escalate next week when a collection of industry leaders submits to the World Wide Web Consortium (W3C) a proposed framework for delivering combined graphics and speech on handheld devices. The VoiceXML Forum, headed by IBM, Nuance, Oracle, and Lucent will announce a proposal for a multimodal technology standard at the Telephony Voice User Interface Conference, in Scottsdale, Arizona. Meanwhile, Microsoft will counter with its own news, using the same conference to announce the addition of another major speech vendor to its SALT (Speech Application Language Tags) Forum. The as yet unnamed vendor intends to rewrite its components to work with Microsoft's speech platform. The announcement will follow the addition of 18 new members to the SALT Forum, a proposed alternative to VXML's multimodal solution. New members of the SALT Forum include Compaq and Siemens Enterprise Networks. Founding members include Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks... Most mainstream speech developers are currently creating Voice XML speech applications built on Java and the J2EE (Java 2 Enterprise Edition) environment, and running on BEA, IBM, Oracle, and Sun application servers. This week General Magic and InterVoice-Brite announced a partnership to develop Interactive Voice Recognition (IVR) enterprise solutions for 'J2EE environments,' using General Magic's VXML technology. Until recently Microsoft offered only a simple set of SAPI (speech APIs). Now through acquisition and internal development it has its own powerful speech engine which it is giving away to developers royalty free, said Peter Mcgregor, an independent software vendor creating speech products. Microsoft redeveloped SAPI in Version 5.1 to run on its new speech engine, while simultaneously proposing SALT as an alternative to VXML. 
Wrapping it all up in a marketing context, Microsoft's Mastan called the company's collection of speech technologies a 'platform,' a term previously not used... The issue over which specification of SALT, not due to be released until sometime later this year, or VXML, whose Version 2 is now out for review, is better is an argument that can only be determined by developers. Each side claims the other's specifications are deficient... IBM's William S. 'Ozzie' Osborne, general manager of IBM Voice Systems in Somers, N.Y.: 'I hope that we get to one standard. Multiple standards fragment the market place and create a diversion. I would like to see us get to a standard that is industry wide and not proprietary. What we are proposing to the W3C, using VXML for speech and x-HTML for graphics in a single program, is cheaper and easier than SALT without having to have the industry redo everything they have done'..."

  • [February 02, 2002] "The SALT Forum Welcomes Additional Technology Leaders as Contributors. New Members Add Extensive Expertise in All Aspects of Multimodal and Telephony Application Development and Deployment." - "Today the SALT Forum, a group of companies with a shared goal of accelerating the use of speech in multimodal and telephony systems, announced the addition of 18 contributors that will assist in formulating the Speech Application Language Tags (SALT) specification. The announcement marks a milestone in the SALT Forum's effort to create an open, royalty-free, platform-independent markup language for speech-enabling applications. SALT will enable multimodal and telephony access to information, applications and Web services from PCs, Tablet PCs, wireless personal digital assistants (PDAs), telephones, and other mobile devices. The 18 new SALT Forum contributing members are Compaq Computer Corp., Edify Corp., Genesys Telecommunications Laboratories Inc. (a wholly owned subsidiary of Alcatel), Glenayre Electronics Inc., HeyAnita Inc., InterVoice-Brite Inc., Kirusa Inc., Korea Telecom, LOBBY7 Inc., Loquendo (a Telecom Italia company), NMS Communications, PipeBeach AB, Siemens Enterprise Networks LLC, Telera Inc., Tenovis GmbH & Co. KG, T-Systems (a division of Deutsche Telekom AG), VoiceGenie Technologies Inc., and Voxeo Corp. As recognized technology leaders, their active participation in specification development will ensure that SALT serves diverse industry needs, thus accelerating the investment in and adoption of applications that permit spoken input and output... In the months since the Forum was established, members have made significant progress toward formulating the SALT specification. SALT is based on a unique structure combining a core language specification with device-specific capability profiles, making it suitable for a wide spectrum of multimodal and telephony platforms throughout the world. 
To speed its adoption, SALT is harmonious with today's popular speech engines and Web development tools. Today the SALT Forum also is releasing a technical white paper that outlines its guiding design principles. The SALT Forum expects to make its markup language draft specification available for public comment in the first quarter of 2002, and to submit the specification to one or more international standards bodies by midyear [2002]. Over the coming months the SALT Forum will continue to broaden its reach through additional supporters. The forum seeks to develop a royalty-free standard that augments existing Web markup languages to provide spoken access to many forms of content through a wide variety of devices..."

  • [October 24, 2001] "Microsoft Sprinkles SALT on Developers." By Ephraim Schwartz. In InfoWorld October 23, 2001. "In a classic example of how the high-tech industry works, on the same day that Microsoft's chief architect Bill Gates announced support for one speech recognition technology known as SALT (Speech Application Language Tags), the W3C (World Wide Web Consortium) announced support for another, VXML. Microsoft this week at its Professional Developer's Conference (PDC) released what it called a '.NET Speech SDK technology preview,' a speech specification for its .NET initiative and for more powerful handhelds. The specifications for Web developers uses SALT to allow developers to create speech tags for HTML, xHTML, and XML markup languages. SALT will make it easier for Web developers to incorporate speech and it will be supported in Internet Explorer, Pocket IE, and Visual [Studio .Net], according to Kai-Fu Lee, vice president of the Natural Interactive Services Division for Microsoft. While Gates announced support for SALT during his keynote address at PDC and Microsoft released preliminary specs, the W3C announced formal acceptance of Voice XML as the standard for adding speech recognition to make Web-based applications accessible over the telephone network... SALT, which is also targeted at Web developers, is meant to create a voice-activated user interface as part of a larger multi-modal UI for handheld devices. On handhelds, voice is expected to be one of many ways to access information. Although VXML and SALT are targeted at two different platforms, a turf war appears inevitable and Microsoft is being accused of rubbing SALT into an industry already wounded by high expectations and poor follow-through... If SALT is only a small set of lightweight tags as its proponents claim, then it cannot be used for speech applications, nor does it have an inherent advantage over VXML for multi-modal devices, Herrel said. 
In Herrel's capacity as the speech technology analyst at Giga, she issued a statement advising developers incorporating speech into applications to use VXML. Speaking as a member of the SALT Forum, Glen Shires, director of media servers (telephony) at Intel, said he believes both languages have different strengths, VXML for telephony and SALT for multi-modal. However, when asked if developers would then need to learn two development environments to have a complete voice-enabled application, he said, 'It is possible to do everything in SALT.'... This opinion is backed by James Mastan, group product planner for Microsoft .NET Speech Technologies, who also said VXML was created for IVR-based services. He admitted it remains problematic whether VXML could be used for handheld devices: 'Technically it is extremely difficult to go from the voice area [VXML] and extend that to the multi-modal space. It is much easier to take existing HTML markup language and add a few small elements,' Mastan said. Nigel Beck, a member of the VXML Forum, said that the W3C is investigating creating multi-modal extensions for VXML. The initial strategy behind the VXML initiative rested on the simple fact that cell phone growth is increasing by an order of magnitude faster than any other segment of the wireless market. Thus, VXML's goal is to make that lucrative channel available for current Web services. What is unclear is if SALT proponents will eventually want to target the same lucrative market..."

  • Announcement 2001-10-15: "Cisco, Comverse, Intel, Microsoft, Philips and SpeechWorks Found Speech Application Language Tags Forum to Develop New Standard for Multimodal and Telephony-Enabled Applications and Services. Industry Leaders Join to Accelerate Widespread Adoption of Speech and Graphical Interaction With HTML, xHTML and XML-Based Applications and Web Services." [source from: Cisco, Speechworks, Microsoft]

  • [November 26, 2001] "SALT- New Standard for Speech Apps Business Services." By Tobias Ryberg (MobileStart). November 26, 2001. "Speech applications facilitate mobile Internet access considerably and extend the web's domains beyond computerized devices. For some time VoiceXML has been the only real alternative for developers of voice-driven online services. Now the SALT Forum, headed by Cisco, Intel, Microsoft, and Philips, introduces its own standard for enabling speech access to the web... SALT is a direct challenge to fairly well established VoiceXML, originally developed by AT&T, IBM, Lucent, and Motorola. The latter has been around for several years and gained recognition as an official standard by the Internet standardization body W3C. Promoters of SALT regard it a lightweight alternative to VoiceXML, implying it will be easier to integrate with existing web applications. In that case, it has a significant chance of winning the hearts and minds of web developers all around the world..."

Robin Cover, Editor