Cover Pages: Voice Extensible Markup Language (VoiceXML)



Last modified: June 24, 2005


Voice Extensible Markup Language (VoiceXML)

Overview: This reference document provides information about the Voice Extensible Markup Language (VoiceXML), developed by W3C's Voice Browser Working Group as part of the W3C "Voice Browser" Activity and promoted by the VoiceXML Forum. The 'VoiceXML Forum' was announced originally under the name 'VXML Forum'.

[March 16, 2004] VoiceXML 2.0 and Speech Recognition Grammar Published as W3C Recommendations. The World Wide Web Consortium has released the first two W3C Recommendations in its Speech Interface Framework. "Aimed at the world's estimated two billion fixed line and mobile phones, W3C's Speech Interface Framework will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services via key pads, spoken commands, listening to pre-recorded speech, synthetic speech and music." The Voice Extensible Markup Language (VoiceXML) Version 2.0 Recommendation defines VoiceXML, designed for "creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications." The second Recommendation, Speech Recognition Grammar Specification Version 1.0, is key to VoiceXML's support for speech recognition, and is used by developers to describe end users' responses to spoken prompts. It defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an Augmented BNF Form and an XML Form. The specification makes the two representations mappable to allow automatic transformations between the two forms.
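
To make the description above concrete, here is a minimal sketch of a VoiceXML 2.0 dialog with an inline grammar in the XML Form, alongside a roughly equivalent grammar in the Augmented BNF Form. It is an illustration only, not taken from the Recommendations: the form id, field name, prompt wording, and three-word vocabulary are invented, and the Python helper `grammar_words` is a hypothetical convenience that just lists the words the recognizer would listen for.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: ask one question, listen for one of three words,
# then read the recognized word back to the caller.
VXML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee, tea, or milk?</prompt>
      <grammar type="application/srgs+xml" version="1.0" root="drink" mode="voice">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>milk</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""

# The same (assumed) vocabulary written in the Augmented BNF Form.
# It is shown for comparison only and is not parsed by this sketch.
ABNF_GRAMMAR = """#ABNF 1.0;
language en-US;
mode voice;
root $drink;
public $drink = coffee | tea | milk;
"""


def grammar_words(vxml_text):
    """Return the text of every <item> in the document's inline grammars."""
    root = ET.fromstring(vxml_text.encode("utf-8"))
    return [item.text.strip() for item in root.iter("{%s}item" % VXML_NS)]


if __name__ == "__main__":
    print(grammar_words(VXML_DOC))
```

A real deployment would serve the VoiceXML document from a Web server to a voice browser; parsing it in Python here only checks that the markup is well formed and shows which words the grammar accepts.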

[February 07, 2001] "The VoiceXML Forum is an industry organization established to promote VoiceXML as the universal standard for speech-enabled Web applications. The Forum, which is composed of over 350 member companies (4 Sponsor Members, 29 Promoter Members, and 320 Supporter Members), supports the work of the VoiceXML community through its conformance testing, marketing, education, and outreach efforts. Bolstered by a membership that has more than tripled in the past year, in 2000 the Forum launched a Technical Council to support its Conformance and Education Committees, and also formed a Marketing Committee. The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO), which manages the day-to-day operations of the Forum."

On March 02, 1999, the formation of a new 'Voice eXtensible Markup Language Forum (VXML Forum)' was announced by AT&T, Lucent Technologies, and Motorola. The VXML Forum "aims to drive the market for voice- and phone-enabled Internet access by promoting a standard specification for VXML, a computer language used to create Web content and services that can be accessed by phone. AT&T, Lucent and Motorola will contribute their markup language technologies to the development of the open VXML specification. Seventeen other leading companies from the speech, Internet and communications markets have agreed to support the VXML Forum and play an active role in reviewing or contributing to the VXML specification. The initial specification will be available for public comment and contribution next month [April 1999], with the goal of submitting a final proposed specification for standardization to the World Wide Web Consortium (W3C) later this year. The initial VXML language specification will be based upon characteristics and functionality that includes Phone Markup Language or PML, an extension of the HTML language from AT&T, Lucent and Motorola's VoxML."

"The VXML Forum has four main objectives: (1) to develop an open VXML specification and then submit it for standardization; (2) to educate the industry about the need for a standard voice markup language; (3) to attract industry support and participation in the VXML Forum; (4) to promote industry-wide use of the resulting standard to create innovative content and service applications."

According to information provided on the VXML Forum's Web site, "VXML has its roots in a research project called PhoneWeb at AT&T Bell Laboratories. After the AT&T/Lucent split, both companies pursued development of independent versions of a phone markup language. Lucent's Bell Labs continued work on the project, now known as TelePortal. The recent research focus has been on service creation and natural language applications. AT&T Labs has built a mature phone markup language and platform that have been used to construct many different types of applications, ranging from call center-style services to consumer telephone services that use a visual Web site for customers to configure and administer their telephone features... As an XML-based definition with an HTML-like appearance, VXML will be easy to learn for experienced Web content programmers and amenable to easy processing by tools to support desktop development of VXML Web applications."

[May 18, 1999] VXML Forum - Update (from VoxML Developer Newsletter, Volume 1, Number 2): "Motorola's VXML representatives have been meeting regularly with the other forum partners to help develop the VXML standard. They are making progress towards standardization. IBM has become a forum contributor since our last newsletter, and is actively working with Motorola, AT&T, and Lucent to develop the new language."

Principal References

Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Recommendation 16-March-2004.
VoiceXML XML Schema Definition. See the specification's normative Appendix O for other schemas defined in the VoiceXML namespace.
Voice Extensible Markup Language (VoiceXML) 2.1. W3C Working Draft 23-March-2004. Edited by Matt Oshry, Tellme Networks (Editor-in-Chief); Paolo Baggia, Loquendo; Michael Bodell, Tellme Networks; David Burke, Voxpilot Ltd.; Daniel C. Burnett, Nuance Communications; Emily Candell, Comverse; Jim Ferrans, Motorola; Jeff Haynie, Vocalocity; Hakan Kilic, Scansoft; Jeff Kusnitz, IBM; Scott McGlashan, Hewlett-Packard; Rob Marchand, VoiceGenie; Michael Migdol, BeVocal, Inc.; Brad Porter, Tellme Networks; Ken Rehor, Vocalocity; Laura Ricotti, Loquendo. Version URL: http://www.w3.org/TR/2004/WD-voicexml21-20040323/. Latest version URL: http://www.w3.org/TR/voicexml21/.
VoiceXML Home Page
W3C Voice Browser Activity
Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Working Draft 24-April-2002.
VoiceXML Forum and W3C Collaboration. Memorandum of Understanding.
Interactive VoiceXML Tutorials
VoiceXML Review - e-zine
Presentations from the First Annual VoiceXML Forum Users Group Meeting

Articles, Papers, Reports, News

[June 15, 2005] "IBM And Speech Technology: An Interview With Bruce Morse." By Tracey Schelmetic. From TMCnet (June 15, 2005). "Where does IBM stand in the realm of speech technologies? CIS spoke with Bruce Morse, vice president of Contact Center Solutions for the IBM Software Group." [Morse:] "IBM's research organization has over 30 years' of experience in speech. It is highly skilled in voice user interface design, persona development and grammar, has more than 250 speech patents and over 100 researchers worldwide in speech labs, including China, Haifa, Tokyo, India and Almaden, working in more than 15 languages. Our work ranges from contact centers to mobile devices to automobiles. IBM is a leader in driving and incorporating speech standards such as VoiceXML, MRCP and W3C. We work with companies of all sizes. IBM was the first to deploy natural language understanding in an automated contact center. For two consecutive years, JD Power and Associates surveys rating customer satisfaction with in-car navigation systems found the top cars were from Honda and Acura, which use IBM's Embedded ViaVoice speech recognition technology. Our contact center customers have found our speech solutions improve call retention rates by six to 10 percent, cutting call times by 10 percent and decreasing costs by up to 90 percent compared to assisted services. IBM is also helping the large community of developers, ISVs and customers deploy and manage speech enablement. We have made significant contributions to the speech industry, through open standards work on VoiceXML, CCXML and MRCP, as well as to the Eclipse Foundation, including our recently announced contributions of VoiceXML and CCXML editors. In addition, we recently announced our contribution to the Apache Foundation of the Reusable Dialog Components (RDC) Framework...
There are two million to three million J2EE developers in the marketplace, and our tooling and open source strategy has been to enable this highly skilled group to expand its reach into speech enablement. By creating plug-ins to the Eclipse framework, we help developers leverage their existing skills in Web development to extend to speech. We are contributing to the speech industry's efforts in order to shorten development time and decrease complexity through our commitment to open standards such as VoiceXML, CCXML, MRCP, xHTML and X+V. In addition, we have donated approximately 20 VoiceXML Reusable Dialog Components (RDCs) to the open-source community through IBM's Alphaworks..."
[April 2005] "Opera 8 Ships One Million Browsers with X+V Multimodal Technology." By Igor Jablokov (VoiceXML Forum Director; IBM). From VoiceXML Review Volume 5, Issue 2 (March/April 2005). "Opera Software ASA recently announced that version 8.0 of its browser received over one million downloads within four days of release. The Norwegian software vendor has created a fast and standards compliant Web experience. While this news is certainly commendable for any product introduction, rivaling even Mozilla's Firefox, it is also a milestone for the multimodal and voice standards community. Opera has included a feature that could usher in an age of human-computer interaction predicted long ago by many a science fiction writer. The Windows version of this browser now has an option that enables voice interaction. This functionality is provided by the IBM Multimodal Runtime Environment, which connects the Opera Browser to IBM Embedded ViaVoice (the same technology currently shipping in certain auto navigation systems). Not only does this enable users to interact with the entire browser interface using their voices (e.g., users can say 'browser go home' or 'browser fullscreen'), but they can also execute applications written in the XHTML+Voice (X+V for short) markup language. The X+V language permits developers to write and deploy multimodal Web applications, which allow users to interact through sight, sound and speech. This language was co-authored by IBM, Motorola and Opera and is under consideration by the W3C standards body. While modern day VoiceXML applications require specialized skills, X+V applications are different in that they more closely resemble standard Web applications. This breaks the current speech development paradigm and can allow the large body of Web developers to simply add voice interaction to existing Web applications... 
In any environment where 'hands-free' is not just a buzzword but a necessity, such as in healthcare, warehousing or enterprise applications, the value of this system becomes obvious. Doctors can ask for patient status by name or get alerted to changes in medical conditions using the natural sounding voice output (CTTS) that is included with the browser. In warehouses, companies can increase worker productivity by having the system communicate new orders to employees and leaving their hands free to fulfill the order. Also consider insurance adjusters speaking into complex forms and recording accident information while focused on investigating a scene. Opera 8 for Windows offers users a gateway into the multimodal experience. IBM looks forward to developers' creativity in leveraging these standards-based technologies to augment existing Web applications for increased end user productivity..."
[March 23, 2004] Opera Multimodal Desktop Browser Supports XHTML+Voice (X+V) Specification. An announcement from Opera Software at the AVIOS SpeechTEK International Exposition and Educational Conference describes the upcoming release of a multimodal desktop browser based on the XHTML+Voice (X+V) specification. In 2001, IBM, Motorola, and Opera submitted the XHTML+Voice Profile 1.0 to W3C for standards work. The most recent version of XHTML+Voice Profile 1.2 is managed by the VoiceXML Forum, and "brings spoken interaction to standard web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific DOM events, thereby reusing the event model familiar to web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content." The Opera multimodal browser project builds upon an ongoing relationship between IBM and Opera: the new release incorporates IBM's Embedded ViaVoice speech technology. IBM's ViaVoice speech technology supports a variety of real-time operating systems (RTOS) and microprocessors, powering mobile devices such as smart phones, handheld personal digital assistants (PDAs), and automobile components. "By leveraging IBM's voice libraries in this version of Opera, users can navigate, request information, and even fill in Web forms using speech and other forms of input in the same interaction." 
The new platform allows users to "interact with the content on the Web in a more natural way, combining speech with other forms of input and output; developers can also start to build multimodal content using the open standards-based X+V markup language, which unifies the visual and voice Web by using development skills a large population of programmers already have today."
[January 27, 2004] "Telco Punts $2.5m on Interactive-Voice XML." By Julian Bajkowski. In ComputerWorld Australia (January 20, 2004). "AAPT [Australian telecommunication company, owned by New Zealand's largest telecommunications company, Telecom New Zealand] will invest more than $2.5 million on a new, retail-customer interactive voice project that ports directly back into its mainframe billing and transaction systems in an effort to reduce call centre and administration costs. Based on a VeCommerce natural speech recognition engine and SOAP/XML interface, the solution will allow customers a voice interface directly into the telco's billing system to perform transactions - rather than waiting for a call centre employee to do the same thing. While AAPT says that inbound customers will still be able to speak to staff, under the new system, the voice-driven, self-serve regime is clearly designed to eliminate both customer-service bottlenecks and cut the call centre staff costs that go with them. Analysts say such systems may offer competitive advantages because of operational cost reductions through shifting inbound call centre functions from a human base to robotic base. Gartner's vice president of research for enterprise networks, Geoff Johnson, confirmed that the uptake of voice-driven XML (or VXML) has been swift over the last 18 months, largely led by investments in VoIP infrastructure and voice engine and application improvements. 'It's the diplomacy and sophistication along with fluency and fault tolerance that is driving this. It's the personal productivity [to the customer] that makes this attractive,' Johnson said. Despite obvious pay offs, Johnson warned risks still existed in deploying voice-driven XML systems, noting some cultures (like Japan) simply do not tolerate non-human interfaces. Johnson said rollouts can come unstuck if enterprises attempted to port 'too many complex functions' to such systems..."
[September 19, 2003] SnowShore Networks Develops Royalty-Free Media Server Control Markup Language (MSCML). SnowShore Networks has officially announced royalty free licensing terms for implementing technology in the Media Server Control Markup Language (MSCML). MSCML is an XML-based protocol used in conjunction with the Session Initiation Protocol (SIP) to enable the delivery of advanced multimedia conferencing services over IP networks. The protocol was "submitted to the IETF as an Internet Draft in 2002 after a rigorous two year test and evaluation process. It is used to drive the delivery of IP enhanced conferencing to wireline, wireless and broadband networks worldwide. SnowShore also announced the successful deployment of MSCML in both trials and live network environments; it is currently being used by a number of application developers, media server manufacturers, equipment vendors and service providers, including Z-Tel, IBM, Broadsoft, Bay Packets, Commetrex, Leapstone and Ubiquity Software. Industry watchers and vendors alike view MSCML as the essential protocol for call control between the media server and application server in the IP services architecture. SnowShore communicated to the IETF that it is offering Royalty Free licenses of its intellectual property necessary for implementing the MSCML standard. This inclusive policy provides IP application developers, infrastructure vendors and service providers with the opportunity to bring to market new IP enhanced conferencing and innovative services within the universal framework of SIP and MSCML."
[July 23, 2003] "HP Acquires PipeBeach to Strengthen Leadership in Growing VoiceXML Interactive Voice Market. Standards-based Products from PipeBeach Bolster HP OpenCall Portfolio and Enhance HP's Ability to Deliver Speech-based Solutions." - "HP today announced the acquisition of PipeBeach AB, a Stockholm, Sweden-based provider of speech-based products and technology that enable the delivery of interactive voice solutions. HP plans to integrate PipeBeach's VoiceXML-based products into its OpenCall suite of enhanced telecommunications software. This acquisition significantly enhances HP's ability to help telecom service providers, network equipment providers and independent software vendors simplify the creation and deployment of VoiceXML-based applications. HP will be better able to provide these customers with the flexibility and agility they need to get to market faster, reduce costs and improve customer loyalty. PipeBeach's VoiceXML-based products and technology enable users to speak into their mobile phones and devices to obtain Web-based information such as news, stock prices and e-mail, as well as conduct transactions, such as online banking. The information and options are conveyed to the user through speech - instead of text or images. 'HP is committed to the interactive voice market, and we intend to help our customers grow by providing them with a full spectrum of development tools, products and solutions,' said Jean-Rene Bouvier, vice president and general manager, HP OpenCall Business Unit. 'When we combine the advanced PipeBeach technologies - and track record in VoiceXML -- with HP's OpenCall portfolio and unique combination of telecom and IT expertise, we can establish HP as a global provider of interactive voice solutions.' 
HP intends to build on its strong presence in the interactive voice market by leveraging the PipeBeach acquisition to accelerate the development of open standards for voice and multimodal technologies in three primary ways: (1) By helping to deliver a complete VoiceXML development and deployment environment that will enable service providers and mobile operators to accelerate rollout of new, revenue-generating voice-enabled services. (2) By accelerating the growth of voice portals, which can help reduce customer service costs. As users increasingly opt for voice browsing versus calling a live operator for basic information and assistance, the cost per call has dropped from $5 for human-assisted service to about 50 cents for voice-automated service. (3) By helping increase customer loyalty through improved service. Voice portals can reduce wait times significantly because they are equipped to handle unpredictable spikes in call volume and allow users to interact directly with Web-based information and systems... HP OpenCall speechWeb interacts between the telephone network and standard Web servers that host VoiceXML applications. It enables mobile users to interact with VoiceXML Web pages, using advanced speech recognition and text-to-speech or spoken prompts. Today, speechWeb supports approximately 40 languages..."
[March 25, 2003] "VXML and VoIP Boost Customer Satisfaction. Quickly ID Your Callers Via These Two Technologies." By Veronika Megler (Certified Consulting IT Architect, Emerging and Competitive Markets, IBM). From IBM developerWorks, Wireless. March 2003. ['In this article on how to improve your customers' experience with your automated telephone-response system, Veronika Megler demonstrates how to combine VXML and VoIP with the information inherent in a telephone call to identify both the caller and the number being called. Use this lesson to improve telephone system efficiencies and bring back customers.'] "As both a consumer and a technologist, I am continually annoyed by the poor usability of many automated telephone-response systems -- especially because I know how little effort it would take to improve them. In Talk to my VoIP, I described an application that uses a Voice-over IP (VoIP) connection to access VoiceXML- (VXML) fronted back-end applications. I also showed how these technologies can provide flexible access to application information and deliver better telephone-based assistance to the average service-center caller. By using what you already know about your caller the moment you answer the call, you can expand these concepts to take personalization and usability to the next level... by accessing the target number, you can provide different voices for different choices. You can use these basic principles and existing technology to build increasingly usable voice-driven applications..."
[January 29, 2003] W3C Advances VoiceXML Version 2.0 to Candidate Recommendation Status. The W3C's Voice Extensible Markup Language (VoiceXML) Version 2.0 specification has been released as a Candidate Recommendation, together with an explicit call for implementation. "VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications." Comments on the CR specification are invited through 10-April-2003, when the VoiceXML specification is expected to enter the Proposed Recommendation phase. [Note 2003-02-25: Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C Candidate Recommendation 20-February-2003. Updated version provides "a correction to the schemas to fix problems found with some schema tools..."]
[January 29, 2003] "Dispute Could Silence VoiceXML." By Paul Festa. In ZDNet News (January 29, 2003). "The Web's leading standards group called on developers to implement its nearly finished specification for bringing voice interaction to Web sites and applications. But the intellectual property claims of a handful of contributors, including Philips Electronics and Rutgers University, threaten to keep the specification tied up in negotiations, the standards body warned. The World Wide Web Consortium (W3C) on Tuesday issued VoiceXML 2.0 as a candidate recommendation, the penultimate stage in the consortium's approval process. The job of VoiceXML -- part of the W3C's Voice Browser Activity -- is to let people interact with Web content and applications using natural and synthetic speech, other kinds of prerecorded audio, and touch-tone keypads. In addition to adding speech as a mode of interaction for everyday Web surfing, the W3C has its eye on other applications. These include the use of speech for the visually impaired and for people accessing the Web while driving. The group called VoiceXML a central part of its work on voice-computer interaction. 'The VoiceXML language is the cornerstone of what we call the W3C speech interface framework--a collection of interrelated languages that are used to create speech applications,' said Jim Larson, co-chair of the W3C's voice browser working group and manager of advanced human I/O (input/output) at Intel. 'Using these types of applications, the computer can ask questions and the user can respond using words and phrases or by touching the buttons on their touch-tone phone'... Other W3C specifications control individual pieces of the voice-browsing puzzle. The Speech Synthesis Markup Language (SSML), for example, describes how the computer pronounces words, with attention to voice inflection, volume and speed. 
The Speech Recognition Grammar Specification (SRGS), establishes what a user must say in response to a computer prompt. And the Semantic Interpretation for Speech Recognition (Semantic Interpretation) strips down text and translates it to a form that the computer can understand..."
[December 17, 2002] "Standardizing VoiceXML Generation Tools." By David L. Thomson. In VoiceXML Review (December 2002). "An area where we have an opportunity to make VoiceXML easier to use and more portable is in development and runtime tools. VoiceXML provides two significant advantages in authoring speech-enabled applications, when compared to previous methods. It allows a developer to build speech services with less effort and it allows applications written for one speech platform to run on another speech platform. These advantages are diminished, however, if software tools used to create and support VoiceXML code are inadequate or incompatible. The VoiceXML Tools Committee, under the direction of the VoiceXML Forum, has been working on methods for improving the quality and uniformity of tools as described below. To define a process for improvement, we must first outline an architecture that illustrates how tools are connected. Companies currently building tools include application developers, speech server suppliers, speech engine vendors, speech hosting service bureaus, stand-alone tool developers, and customers... Development tools and runtime software on the VoiceXML page server must use the same meta language. Since the meta language is generally unique to a given tool vendor, runtime software on the VoiceXML page server will only work with development tools from the same vendor... the VoiceXML Tools Committee is studying ways to standardize the meta language. Vendors would then use the standard meta language to represent parameters of the call flow, even if vendor tools otherwise provide different features. Two proposals under consideration are: (1) the XForms standard under development by the W3C and (2) an XML-based standard where styles sheets convert between formats used by different vendors. This rather ambitious goal will, if successful, improve the interoperability of development and runtime tools and make applications portable across vendors... 
Tools for developing VoiceXML-based speech applications are a critical factor in making VoiceXML easy to use. While VoiceXML itself may be well-defined, industry software for generating VoiceXML code lacks uniformity. We have launched an effort to define two standards that will help VoiceXML systems interoperate across different vendors. The effort will define how applications are represented and how runtime data is transported and stored. We hope that this effort will foster the creation of better tools and make developing VoiceXML services faster and easier..."
[December 17, 2002] "Enhancing VoiceXML Application Performance By Caching." By Dave Burke. In VoiceXML Review (December 2002). "The VoiceXML architectural model specifies a partitioning of application hosting, and application rendering. Specifically, the application is served from a Web Server and is typically created dynamically within the framework of an Application Server or equivalent. The VoiceXML Interpreter renders the resultant VoiceXML document, transmitted across a network by HTTP, into a series of instructions interpreted by the Implementation Platform. Implied in this model is a geographical distribution of the application hosting environment and the VoiceXML platform and thus the incursion of network latencies. An application might make many subsequent requests for new VoiceXML documents during its lifetime and thus these latencies may have considerable adverse effects on performance. In this article we will discuss how caching can be used to enhance the performance of VoiceXML applications. Caching is a strategy for storing temporary 'objects' (e.g., VoiceXML resources) local to the VoiceXML Interpreter that can be employed by the application developer for optimising these latencies. In what follows we will use the phrase 'origin server' to denote the application hosting environment, and 'user agent' to refer to the VoiceXML Interpreter and Implementation Platform... HTTP caching provides a powerful mechanism for improving performance of applications. A performant VoiceXML application that yields customer satisfaction will promote customer retention and also save money on deployment costs. Caching is often poorly understood and under-utilised on the Internet, yet can be effectively harnessed by observing some simple practices as outlined in this article..."
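
As a rough illustration of the caching decision the article summarizes, the sketch below shows how a user agent might decide whether a locally cached VoiceXML resource can be reused instead of being fetched again from the origin server. It is a simplified reading of the `maxage` and `maxstale` fetch attributes defined in VoiceXML 2.0, not code from the article. The `CachedResource` class, its field names, and the `can_use_cached` function are hypothetical, and real HTTP freshness calculations involve more headers than the single lifetime value assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResource:
    """Hypothetical record for one cached object (document, audio, grammar)."""
    age: int                 # seconds since the copy was obtained from the origin server
    freshness_lifetime: int  # seconds the origin server declared the copy fresh


def can_use_cached(entry: Optional[CachedResource],
                   maxage: Optional[int] = None,
                   maxstale: Optional[int] = None) -> bool:
    """Return True if the cached copy may be used without a new fetch."""
    if entry is None:
        # Nothing in the cache: the user agent has to contact the origin server.
        return False

    expired = entry.age > entry.freshness_lifetime
    if expired:
        # An expired copy is usable only if the author tolerates staleness
        # and the copy is past its lifetime by no more than maxstale seconds.
        if maxstale is None:
            return False
        return entry.age - entry.freshness_lifetime <= maxstale

    # A fresh copy is usable unless the author asked for a younger one.
    if maxage is None:
        return True
    return entry.age <= maxage


if __name__ == "__main__":
    prompt_audio = CachedResource(age=30, freshness_lifetime=60)
    print(can_use_cached(prompt_audio))             # fresh, no author limits
    print(can_use_cached(prompt_audio, maxage=10))  # author wants a newer copy
```

Each cache hit under this policy avoids one network round trip between the user agent and the origin server, which is the latency the article is concerned with.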
[December 03, 2002] IBM WebSphere Voice Application Access Supports VoiceXML. IBM has announced the WebSphere Voice Application Access middleware product designed to simplify "building and managing voice portals and to more easily extend web-based portals to voice. Leveraging the scalability, personalization, and authentication features of IBM's WebSphere Portal, it enables mobile workers to more easily access information from multiple voice applications -- using a single telephone number. This new offering includes IBM's WebSphere Voice Server as well as ready-to-use email, personal information management (PIM) functions, and sample portlets. It also supports VoiceXML and Java -- including development tools based on Eclipse, the open-source, vendor-neutral platform for writing software -- and uses open-standard programming languages to create voice-enabled applications that will interoperate with a range of web servers and databases. Building on the VoiceXML standards allows IBM WebSphere Voice Application Access to work with third party browsers and their associated underlying speech recognition and text-to-speech technologies. As the VoiceXML 2.0 specification nears final approval, IBM WebSphere Voice Application Access will move quickly to support it."
[December 03, 2002] "IBM Advances Pervasive Computing Strategy With New Software. Voice Portal Technology and Tools Extend IBM Momentum With Device Manufacturers, Service Providers and Enterprises." - "Building on a continuing wave of pervasive computing customer deployments and industry alliances, IBM today announced new software products and tools that make it easier for developers to build and manage voice portals -- as well as extend enterprise applications, such as mobile databases, to new devices. Today's announcement underscores IBM's ongoing commitment to help customers extend computing to new devices using an infrastructure built on a foundation of open, integrated and scalable technologies. IBM has built momentum helping enterprises extend capabilities to their mobile workforce, assisting service providers find new ways to decrease costs and increase revenue streams, and enabling device manufacturers to provide intelligent access to the enterprise. 'Pervasive computing plays a significant role in the on-demand era,' said Rodney Adkins, General Manager, IBM Pervasive Computing Division. 'Over the past year, we've been aggressively laying the foundation that gives people the flexibility to access and interact with information when they want it, where they want it and how they want it. Today's announcement adds to what is quickly becoming an extensive portfolio of technology, hardware, software and services that span the pervasive computing ecosystem.' Adding to its portfolio, IBM unveiled the new WebSphere Voice Application Access product: middleware that simplifies building and managing voice portals and more easily extends web-based portals to voice. Leveraging the scalability, personalization and authentication features of IBM's WebSphere Portal, it enables mobile workers to more easily access information from multiple voice applications -- using a single telephone number. 
This new offering includes IBM's WebSphere Voice Server as well as ready-to-use email, personal information management (PIM) functions, and sample portlets. It also supports VoiceXML and Java -- including development tools based on Eclipse, the open-source, vendor-neutral platform for writing software -- and uses open-standard programming languages to create voice-enabled applications that will interoperate with a range of web servers and databases. In keeping with IBM's strategy to provide solutions across multiple platforms, IBM will be working to make WebSphere Voice Application Access interoperable with offerings from third party VoiceXML vendors, such as Nuance and Cisco. In addition, IBM is also working with independent solutions vendors including V-Enable, Voxsurf and Viecore to extend their current solutions..." For a beta version of the WebSphere Voice Application Access product, see alphaWorks Voice Portal.
[December 02, 2002] "IBM Adds New WebSphere Tool for Voice Apps." By Stacy Cowley. In InfoWorld (December 02, 2002). "IBM will release later this month WebSphere Voice Application Access (WVAA), a tool for developers seeking to voice-enable corporate applications for mobile access. WVAA supports VoiceXML (Voice Extensible Markup Language) and Java, and includes sample portlets and preconfigured functions to speed development time for customized voice portals. Developers can use the technology to build voice interfaces for retrieving information from corporate databases and systems, such as stock quotes or customer information. In conjunction with other IBM development tools, WVAA can enable information to be requested with one device, such as a mobile phone, but delivered to another, like a handheld computer. The technology is particularly well-suited to mobile workers, said Sunil Soares, director of product management for IBM's Pervasive Computing Division. IBM has been working with one real estate company using the technology to allow agents to call and retrieve listings using keywords, he said... Nuance Communications, which competes with IBM on voice software, will support WVAA, as will infrastructure provider Cisco Systems. Voice software companies V-Enable and Voxsurf will also support the technology, as will services firm Viecore..." See also in eWEEK: "IBM Bolts Voice Support Onto Existing Applications," by Carmen Nobel.
[October 15, 2002] "VoiceXML, CCXML, and SALT." By Ian Moraes. In XML Journal Volume 3, Issue 9 (September 2002), pages 30-34. "There's been an industry shift from using proprietary approaches for developing speech-enabled applications to using strategies and architectures based on industry standards. The latter offer developers of speech software a number of advantages, such as application portability and the ability to leverage existing Web infrastructure, promote speech vendor interoperability, increase developer productivity (knowledge of speech vendor's low-level API and resource management is not required), and easily accommodate, for example, multimodal applications. Multimodal applications can overcome some of the limitations of a single mode application (GUI or voice), thereby enhancing a user's experience by allowing the user to interact using multiple modes (speech, pen, keyboard, etc.) in a session, depending on the user's context. VoiceXML, Call Control eXtensible Markup Language (CCXML), and Speech Application Language Tags (SALT) are emerging XML specifications from standards bodies and industry consortia that are directed at supporting telephony and speech-enabled applications. The purpose of this article is to present an overview of VoiceXML, CCXML, and SALT and their architectural roles in developing telephony as well as speech-enabled and multimodal applications... Note that SALT and VoiceXML can be used to develop dialog-based speech applications, but the two specifications have significant differences in how they deliver speech interfaces. Whereas VoiceXML has a built-in control flow algorithm, SALT doesn't. Further, SALT defines a smaller set of elements compared to VoiceXML. While developing and maintaining speech applications in two languages may be feasible, it's preferable for the industry to work toward a single language for developing speech-enabled interfaces as well as multimodal applications. 
This short discussion provides a brief introduction to VoiceXML, CCXML, and SALT for supporting speech-enabled interactive applications, call control, and multimodal applications and their important role in developing flexible and extensible standards-compliant architectures. This presentation of their main capabilities and limitations should help you determine the types of applications for which they could be used. The various languages expose speech application technology to a broader range of developers and foster more rapid development because they allow for the creation of applications without the need for expertise in a specific speech/telephony platform or media server. The three XML specifications offer application developers document portability in the sense that a VoiceXML, CCXML, or SALT document can be run on a different platform as long as the platform supports a compliant browser. These XML specifications are posing an exciting challenge for developers to create useful, usable, and portable speech-enabled applications that leverage the ubiquitous Web infrastructure..." [alt URL]
[October 09, 2002] "Progress in the VoiceXML Intellectual Property Licensing Debacle." By Jonathan Eisenzopf (The Ferrum Group). From VoiceXMLPlanet.com. October 2002. "In January of 2002 the World Wide Web Consortium released a rule that requires Web standards to be issued royalty free (RF). Some VoiceXML contributors hold intellectual property related to the VoiceXML standard. Some of those companies have already issued royalty free licenses, while others have agreed to reasonable and non-discriminatory (RAND) licensing terms... The fact that not all contributors have switched to a royalty free licensing model has been a thorn in the progress of the VoiceXML standard. I've voiced my concerns previously on this issue, specifically in SALT submission to W3C could impact the future of VoiceXML... Recently, IBM and Nokia changed their licensing terms from RAND to RF. At the VoiceXML Planet Conference & Expo on September 27 [2002], Ray Ozborne, Vice President of the IBM Pervasive Computing Division assured the audience at the end of his keynote speech that IBM would be releasing all intellectual property that related to the VoiceXML and XHTML+Voice specifications royalty free and encouraged the other participants to do the same... If VoiceXML is going to survive as a Web standard, then all contributors must license their IP royalty free, otherwise, the large investment that's been made will go down the drain. My hope is that the voice browser group at the W3C will either resolve these licensing issues in the next six months or jettison VoiceXML and replace it with SALT. Either way, I believe that it would be prudent for voice gateway vendors to be working on a SALT browser so that customers have the option down the road..."
[October 07, 2002] "Qwest Communications Launches Service To Help Customers Design Voice-Enabled Customer Service Applications." - Qwest Communications International Inc. has "launched a new Web portal that provides business customers and systems integrators with the tools to develop customized interactive voice response (IVR) and speech recognition applications for their customer service functions. With these specially tailored applications, businesses and systems integrators can quickly and cost-effectively provide their customers with higher quality service via the telephone, e-mail or the Internet. The new portal -- called the Qwest Development Network -- is based on the voice extensible mark-up language (VXML), which is an open standard design code used to create voice-enabled Web applications. The development network provides tools for VXML application development, testing, online documentation and expert live technical support. Business customers and systems integrators can create an application from concept to prototype to trial to production at a reduced cost. Also, because the development network is based on an open standard concept, Qwest business customers and systems integrators can protect their development investment and easily migrate the application to Qwest Web Contact Center(sm) or other VXML platforms. Used by large and small companies alike, Qwest Web Contact Center is a Web-driven platform that integrates voice and data applications so businesses can provide world-class customer service over the phone or the Web. Qwest Web Contact Center enables businesses to implement a variety of functions including IVR, in-bound customer service, out-bound marketing, Web chats and other help desk applications. Qwest Web Contact Center supports both Speechworks and Nuance speech recognition technologies, and seamlessly integrates with Genesys and Cisco's ICM Computer Telephony Integration platforms for enhanced call management..."
[October 05, 2002] "Voice Biometrics and Application Security. Identification, Verification, and Classification." By Moshe Yudkowsky (WWW). In Dr Dobb's Journal [DDJ] (November 2002) #342, pages 16-22. Feature Article. ['Voice-based biometric security must support identification, verification, and classification. Moshe presents a verification system in which users' voice models are stored in a database on a VoiceXML server.'] "Voice biometrics are an excellent option for application security. Voice biometrics, which measure the user's voice, require only a microphone -- a robust piece of equipment as close as the nearest telephone. In this article, I prototype an application that uses a telephone call to verify identity using freely available voice biometric resources that have simple APIs. Furthermore, the prototype can be easily integrated with Internet-capable applications... Voice biometrics provide three different services: identification, verification, and classification. Speaker verification authenticates a claim of identity, similar to matching a person's face to the photo on their badge. Speaker identification selects the identity of a speaker out of a group of possible candidates, similar to finding a person's face in a group photograph. Speaker classification determines age, gender, and other characteristics. Here, I'll focus on speaker verification resources ('verifiers'). Older verifiers used simple voiceprints, which are essentially verbal passwords. During verification, the resource matches a user's current utterance against a stored voiceprint. Modern verifiers create a model of a user's voice and can match against any phrase the user utters. This is a terrific advantage. First, ordinary dialogue can be used for verification, so an explicit verification dialogue may be unnecessary. Second, applications can challenge users to speak random phrases, which make attacks with stolen speech extremely difficult... 
The prototype I present uses a telephony server to connect to the telephone network, a speech-technology server, and an application server to execute my code and control the other two servers... For the telephony server, speech-technology resource server, and application server, I use BeVocal's free developer hosting. BeVocal hosts VoiceXML-based applications. VoiceXML is an open specification from the W3C's 'voice browser' working group. XML-based VoiceXML lets you write scripts with dialogues that use spoken or DTMF input, and text-to-speech or prerecorded audio for output. My scripts reside on the Internet and are fetched by the VoiceXML server via HTTP. Since the VoiceXML specification does not define a voice biometrics API, I used BeVocal's extensions to VoiceXML. Another company that offers voice biometrics hosting is Voxeo; Voxeo uses a different API. Voxeo lets you send tokens through HTTP to initiate calls from the VoiceXML server to users, which is convenient for web-based applications -- not to mention more secure, as the application can easily restrict the calls to predefined telephone numbers. Both BeVocal and Voxeo offer free technical support. If your application is voice-only and over the phone, adding speaker verification is straightforward. But any Internet-capable application can add VoiceXML... Biometrics in general, and speech technologies in particular, are imperfect and have a unique capacity for abuse: Voices, faces, and other characteristics can be scanned without knowledge or consent. Still, knowing 'something you are' is a powerful security tool when coupled with 'something you have' and 'something you know'." Note: The W3C Voice Browser Working Group was recently [25 September 2002] rechartered as a royalty free group operating under W3C's [then] Current Patent Practice.
[July 31, 2002] "VoiceXML Making Web Heard In Call Centers." By Ann Bednarz and Phil Hochmuth. In Network World (July 29, 2002). "Aspect Communications this week will announce call center software that essentially will enable users to navigate Web content via voice commands. The Aspect news comes on the heels of Avaya's announcement last week of interactive voice response (IVR) software that will make data contained in corporate directories and databases available to callers via spoken commands. At the heart of both efforts is support for the latest release of VoiceXML (VXML), Version 2.0. An extension to the XML document formatting standard, VXML streamlines development of voice-driven applications for retrieving Web content. While using voice commands to retrieve information is a routine IVR task, emerging tools support more complex, speech-driven activities, such as filling out forms or retrieving product information, all in a standards compliant rather than proprietary environment. In Aspect's case, customers will be able to use the same databases, application servers and business rules to process voice self-service interactions as they do to process Web self-service transactions. The firm is building the voice-activated service features into its existing software suite, Aspect IP Contact Suite. Avaya is adding VXML capabilities to Version 9.0 of its Avaya IVR server. Previous versions offered speech-recognition features, but 9.0 is the first to embed VXML support. Adoption of standards such as VXML is just one contributor to an overall trend to increase the sophistication of IVR products, making them less dependent on menus that bury information several layers deep and better able to handle queries phrased in natural language, says Martin Prunty, president of consulting firm Contact Center Professionals..." See details in the announcements from Aspect Communications and Avaya.
[July 31, 2002] "Aspect Communications Announces First IP-based Self-Service with VXML. Aspect Continues to Prove Customer Service Benefits of Converged Network by Enabling New Types of Voice Self-Service Applications." - Aspect Communications Corporation, a "leading provider of business communications solutions that help companies improve customer satisfaction, reduce operating costs, gather market intelligence and increase revenue, today announced the first IP-based self-service offering with VoiceXML (VXML) capabilities. The combination of IP-based self-service with VXML changes how customers use voice self-service. While customers are accustomed to voice self-service for routine tasks like checking balances and paying bills, IP and VXML open a wide range of more complex activities like filling out forms, placing detailed orders or completing any transaction that could also be handled on the Web. Aspect's offering will provide customers tremendous flexibility in how they interact with businesses. Using IP-based self-service that supports VXML, enterprises can simplify their infrastructure and reduce costs by automating many more customer service tasks than are automated today. Aspect's solution has several unique features that will let companies quickly and affordably develop newer, more useful self-service applications for customers. It offers a single development environment for creating rules that handle self- and live-service via voice, the Web and e-mail. Aspect's software is completely standards-based and integrates with VXML 2.0, allowing customers to perform the same self-service functions over the phone that they perform on the Web. Using natural language, customers can request information, fill out forms and place orders, and the enterprise uses the same databases, application servers and business rules as it does for the Web to process the voice self-service interactions... 
The Aspect IP Contact Suite enables fully integrated multichannel communications, including traditional voice technologies (PSTN), VoIP, e-mail and Web collaboration. Voice traffic travels over the same IP network as data and other communications such as e-mail and Web-based communications, versus over a separate circuit-switched network. Aspect's solution merges all communication into a unified queue and delivers it to the integrated desktops of service representatives. One network for voice and data centralizes administration, and the browser-based desktop applications empower the representatives to respond to contacts via all channels -- voice, e-mail, Web chat, assisted browsing and more -- on a single desktop with an easy-to-use interface..."
[June 24, 2002] "VoiceXML and the Future of SALT. [Voice Data Convergence.]" By Jonathan Eisenzopf. In Business Communications Review (May 2002), pages 54-59. ['Jonathan Eisenzopf is a member of the Ferrum Group, LLC, which provides consulting and training services to companies that are in the process of evaluating, selecting or implementing VoiceXML speech solutions.'] "The past year has been eventful for VoiceXML, the standard that application developers and service providers have been promoting for the delivery of Web-based content over voice networks. Many recent developments have been positive, as continued improvements in speech-recognition technology make voice-based interfaces more and more appealing. Established vendors are now validating VoiceXML, adding it to their products and creating new products around the technology. For many enterprises, this means that the next time there's a system upgrade, VoiceXML may be an option. For example, InterVoice-Brite customers soon will be able to add VoiceXML access to their IVR platform, which would provide callers with access to Web applications and enterprise databases... The introduction of SALT as an alternative to VoiceXML for multi-modal applications will present alternatives for customers who are not focusing exclusively on the telephone interface. However, VoiceXML is likely to be the dominant standard for next-generation IVR systems, at least until Microsoft and the SALT Forum members begin to offer product visions and complete solution sets..."
[April 25, 2002] W3C Voice Browser Working Group Issues VoiceXML Last Call Working Draft. W3C has released a Last Call Working Draft for Voice Extensible Markup Language (VoiceXML) Version 2.0. Pending receipt of positive feedback on this draft, the W3C Voice Browser Working Group plans to submit the specification for approval as a W3C Candidate Recommendation; comments may be sent for consideration until May 24, 2002. VoiceXML "is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications. The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. Forms present information and gather input; menus offer choices of what to do next... The dialog constructs of form, menu and link, and the mechanism (Form Interpretation Algorithm) by which they are interpreted are then introduced in Section 2. User input using DTMF and speech grammars is covered in Section 3, while Section 4 covers system output using speech synthesis and recorded audio. Mechanisms for manipulating dialog control flow, including variables, events, and executable elements, are explained in Section 5. Environment features such as parameters and properties as well as resource handling are specified in Section 6. The appendices provide additional information including the VoiceXML Schema, a detailed specification of the Form Interpretation Algorithm and timing, audio file formats, and statements relating to conformance, internationalization, accessibility and privacy." [Full context]
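The document structure described in the draft (a top-level <vxml> container holding form and menu dialogs) can be illustrated with a small sketch. The Python below builds such a document with the standard library's xml.etree.ElementTree. The dialog ids, prompt wording, and field names are hypothetical and chosen only for illustration, and the built-in field types used here may not be supported on every platform; this is a minimal sketch, not a complete or validated VoiceXML application.

```python
import xml.etree.ElementTree as ET

# Namespace defined for VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_minimal_vxml() -> ET.Element:
    """Build a <vxml> container holding one menu and two forms.

    All ids and prompt texts are hypothetical examples.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})

    # A menu offers choices of what to do next.
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    ET.SubElement(menu, "prompt").text = "Say weather or news."
    for word in ("weather", "news"):
        choice = ET.SubElement(menu, "choice", {"next": "#" + word})
        choice.text = word

    # Forms present information and gather input through fields.
    weather = ET.SubElement(vxml, "form", {"id": "weather"})
    zip_field = ET.SubElement(weather, "field", {"name": "zip", "type": "digits"})
    ET.SubElement(zip_field, "prompt").text = "Please say your zip code."

    news = ET.SubElement(vxml, "form", {"id": "news"})
    ok_field = ET.SubElement(news, "field", {"name": "confirm", "type": "boolean"})
    ET.SubElement(ok_field, "prompt").text = "Do you want the headlines?"

    return vxml


def dialog_kinds(vxml: ET.Element) -> list:
    """Return (tag, id) pairs for the dialogs directly under <vxml>."""
    return [(child.tag, child.get("id")) for child in vxml
            if child.tag in ("form", "menu")]


doc = build_minimal_vxml()
print(ET.tostring(doc, encoding="unicode"))
print(dialog_kinds(doc))
```

The sketch only constructs and serializes the markup; interpreting it (prompting, recognition, and the Form Interpretation Algorithm) is the job of a VoiceXML browser.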
[April 05, 2002] "Is Speech Recognition Becoming Mainstream?" By Savitha Srinivasan and Eric Brown (IBM Almaden Research Center). In IEEE Computer Volume 35, Number 4 (April 2002), pages 38-41. IEEE Computer Society. ISSN: 0018-9162. This Guest Editors' Introduction provides an introduction to speech recognition in its two primary modes (using speech as spoken input, or as a data or knowledge source), an introduction to VoiceXML, and an overview of other articles in this IEEE Computer Special Issue on Speech Recognition. ['Combining the Web's connectivity, wireless technology, and handheld devices with grammar-based speech recognition in a VoiceXML infrastructure may finally bring speech recognition to mass-market prominence.'] "... At the simplest level, speech-driven programs are characterized by the words or phrases you can say to a given application and how that application interprets them. An application's active vocabulary -- what it listens for -- determines what it understands. A speech recognition engine is language-independent in that the data it recognizes can include several domains. A domain consists of a vocabulary set, pronunciation models, and word usage models associated with a specific speech application. It also has an acoustic component reflected in the voice models the speech engine uses during recognition. These voice models can be either unique per speaker or speaker-independent. The domain-specific resources, such as the vocabulary, can vary dynamically during a given recognition session. A dictation application can transcribe spoken input directly into the document's text content, a transaction application can facilitate a dialog leading to a transaction, and a multimedia indexing application can generate words as index terms. In terms of application development, speech engines typically offer a combination of programmable APIs and tools to create and define vocabularies and pronunciations for the words they contain. 
A dictation or multimedia indexing application may use a predefined large vocabulary of 100,000 words or so, while a transactional application may use a smaller, task-specific vocabulary of a few hundred words. Although adequate for some applications, smaller vocabularies pose usability limitations by requiring strict enumeration of the phrases the system can recognize at any given state in the application. To overcome this limitation, transactional applications define speech grammars for specific tasks. These grammars provide an extension of the single words or simple phrases a vocabulary supports. They form a structured collection of words and phrases bound together by rules that define the set of speech streams the speech engine can recognize at a given time. For example, developers can define a grammar that permits flexible ways of speaking a date, a dollar amount, or a number. Prompts that cue users on what they can say next are an important aspect of defining and using grammars. It turns out that speech grammars are a critical component of enabling the Voice Web... The Voice Web -- triggered by the connectivity that wireless technology and mobile devices offer -- may be the most significant speech application yet. Developers originally included speech recognition technology in the device, but now they house this technology on the server side. This trend could lead to the development of powerful mass-market speech recognition applications such as (1) voice portals that provide instant voice access to news, traffic, weather, stocks, and other personal information; and (2) corporate information to streamline business processes within the enterprise. With the advent of VoiceXML, the Voice Web has become the newest paradigm for using technology to reinvent e-commerce. VoiceXML lets users make transactions via the telephone using normal speech without any special equipment. 
Thus, combining the Web's connectivity, wireless technology, and handheld devices with effective grammar-based speech recognition in a VoiceXML infrastructure may finally lead to the elusive mass market that speech recognition developers have chased for decades."
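The idea of a speech grammar as "a structured collection of words and phrases bound together by rules" can be made concrete with a toy example. The Python sketch below defines a small rule set for spoken dollar amounts and checks whether a transcribed utterance is covered by it. The rule names and vocabulary are hypothetical, and the matcher is an illustration written for this page; it does not use SRGS syntax or the API of any actual speech engine.

```python
# Toy grammar: each rule maps to a list of alternatives, and each
# alternative is a sequence of rule names or literal words.
# Rule names and vocabulary are hypothetical.
GRAMMAR = {
    "amount": [["number", "dollars"], ["number", "dollar"]],
    "number": [["tens", "units"], ["tens"], ["units"], ["ten"]],
    "tens": [["twenty"], ["thirty"], ["forty"]],
    "units": [["one"], ["two"], ["three"], ["five"], ["nine"]],
}


def match(symbol, tokens, pos):
    """Return the set of positions reachable after matching symbol at pos."""
    if symbol not in GRAMMAR:
        # A literal word matches exactly one token.
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {end for p in positions
                         for end in match(part, tokens, p)}
        ends |= positions
    return ends


def accepts(utterance, root="amount"):
    """True if the whole utterance is covered by the root rule."""
    tokens = utterance.lower().split()
    return len(tokens) in match(root, tokens, 0)


print(accepts("twenty five dollars"))
print(accepts("dollars twenty"))
```

Because the rules combine, a handful of entries cover many phrasings ("five dollars", "thirty dollars", "forty two dollars") without enumerating each phrase, which is the advantage over a flat vocabulary that the article describes.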
[February 21, 2002] See: W3C Publishes Specification for Voice Browser Call Control (CCXML). The W3C Voice Browser Working Group has released a first public Working Draft specification for Voice Browser Call Control: CCXML Version 1.0. The CCXML specification, based upon CCXML 1.0 submitted in April 2001, "describes markup designed to provide telephony call control support for VoiceXML or other dialog systems. CCXML has been designed to complement and integrate with a VoiceXML system." The draft thus contains many references to VoiceXML's capabilities and limitations, together with details on how VoiceXML and CCXML can be integrated. However, the two languages are separate and are not required in an implementation of either language. For example CCXML could be integrated with a more traditional IVR system and VoiceXML could be integrated with some other call control system... Properly adding advanced telephony features to VoiceXML [through CCXML] entails adding not just a new telephone model, but new call management and event processing, as well... events from telephony networks or external networked entities are non-transactional in nature; they can occur at any time, regardless of the current state of VoiceXML interpretation. These events could demand immediate attention. We could either abandon VoiceXML's admirably simple single-threaded programming model, or delay event-servicing until the VoiceXML program explicitly asked to handle such events. Instead of making either of these bad choices, we instead move all call control functions out of VoiceXML into an accompanying CCXML program. VoiceXML can thus focus on being effective for voice dialogs, while CCXML tackles the very different problems..." [Full context]
[October 23, 2001] W3C Working Draft for Voice Extensible Markup Language (VoiceXML) Version 2.0. W3C has announced the first release of a public working draft for Voice Extensible Markup Language (VoiceXML) Version 2.0, along with a joint statement on collaborative effort between W3C and the VoiceXML Forum. The new draft is part of the W3C Voice Browser Activity and forms part of the proposals for the W3C Speech Interface Framework. The WD "specifies VoiceXML (Voice Extensible Markup Language) which is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications. VoiceXML is a markup language that: (1) Minimizes client/server interactions by specifying multiple interactions per document. (2) Shields application authors from low-level, and platform-specific details. (3) Separates user interaction code [in VoiceXML] from service logic [CGI scripts]. (4) Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers. (5) Is easy to use for simple interactions, and yet provides language features to support complex dialogs." According to a Memorandum of Understanding describing collaboration between the VoiceXML Forum and W3C, "VoiceXML Forum and the W3C have determined that it is in the best interests of the respective organizations and the public that they work together to further develop a dialog markup language... VoiceXML Forum will file an express abandonment of [certain relevant ] U.S. 
trademark applications, and [during the five-year period] the VoiceXML Forum agrees that the W3C will have sole control of the definition and evolution of the dialog markup language based on the VoiceXML 1.0 that is under development by the W3C Voice Browser Working Group." [Full context]
[February 14, 2002] "Updating Your System. Is VoiceXML Right for Your Customer Service Strategy?" [Critical Decisions]. By Jonathan Eisenzopf. In New Architect: Internet Strategies for Technology Leaders Volume 7, Issue 3 (March 2002), pages 20-21. ISSN: 1537-9000. "VoiceXML is based on technology that has been used in IVR systems for years and deployed in many Fortune 500 companies. VoiceXML is simply a thin veneer that abstracts the low-level APIs used to develop IVR applications. Voice dialogs are specified by static (or dynamic) XML documents that contain sets of recorded or synthesized prompts and speech recognition grammars. These XML documents are converted by a VoiceXML gateway into low-level commands that interact with the digital signal processors (DSP) and telephony boards in a VoiceXML gateway. It's unlikely that VoiceXML will bring the Web to the phone, however. Despite the hype, VoiceXML isn't well suited as a general-purpose interface for providing telephone access to the Web. Instead, the two areas where it can provide immediate and compelling benefits are customer service and order entry... The airline industry has used IVR systems to provide flight arrival and departure information for some time. This has dramatically reduced costs by eliminating the need for live operators and shortening the average length of each call. However, customers can be frustrated by touch tone IVR systems, and will often press zero in an attempt to reach a live representative. VoiceXML-based IVRs are a better alternative to such systems because they offer speech recognition and text-to-speech capabilities. For example, Amtrak's IVR application lets callers speak to the system, rather than navigating through multiple menus. Before updating its IVR system to use speech recognition, roughly 70 percent of customers using the system would exit to speak with an operator. 
After the speech recognition technology was added, Amtrak reports that the exit rate was reduced to 30 percent... Although VoiceXML hasn't been widely adopted yet, the fact that technology vendors are taking an interest in the standard is reassuring. With companies like Oracle, HP, Motorola, and IBM jumping on the VoiceXML bandwagon, it's likely that you'll have access to VoiceXML-capable tools the next time you upgrade your application servers and Web development software... Several companies are already working to improve VoiceXML systems to address these issues. As with most technologies, once VoiceXML's appeal broadens and the benefits of deploying IVR solutions as a complement to online e-business applications become more evident, the rate of adoption will increase. If you currently handle order entry and customer support with a combination of online and telephone support, now may be the time to consider VoiceXML as a way to reduce costs and realize greater return on your existing software investments..." Note: New Architect was formerly WebTechniques.
[February 02, 2002] "Speech Vendors Shout for Standards." By Ephraim Schwartz. In InfoWorld (February 01, 2002). "The battle for speech technology standards is set to escalate next week when a collection of industry leaders submits to the World Wide Web Consortium (W3C) a proposed framework for delivering combined graphics and speech on handheld devices. The VoiceXML Forum, headed by IBM, Nuance, Oracle, and Lucent will announce a proposal for a multimodal technology standard at the Telephony Voice User Interface Conference, in Scottsdale, Arizona. Meanwhile, Microsoft will counter with its own news, using the same conference to announce the addition of another major speech vendor to its SALT (Speech Application Language Tags) Forum. The as yet unnamed vendor intends to rewrite its components to work with Microsoft's speech platform. The announcement will follow the addition of 18 new members to the SALT Forum, a proposed alternative to VXML's multimodal solution. New members of the SALT Forum include Compaq and Siemens Enterprise Networks. Founding members include Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks... Most mainstream speech developers are currently creating Voice XML speech applications built on Java and the J2EE (Java 2 Enterprise Edition) environment, and running on BEA, IBM, Oracle, and Sun application servers. This week General Magic and InterVoice-Brite announced a partnership to develop Interactive Voice Recognition (IVR) enterprise solutions for 'J2EE environments,' using General Magic's VXML technology. Until recently Microsoft offered only a simple set of SAPI (speech APIs). Now through acquisition and internal development it has its own powerful speech engine which it is giving away to developers royalty free, said Peter Mcgregor, an independent software vendor creating speech products. Microsoft redeveloped SAPI in Version 5.1 to run on its new speech engine, while simultaneously proposing SALT as an alternative to VXML. 
Wrapping it all up in a marketing context, Microsoft's Mastan called the company's collection of speech technologies a 'platform,' a term previously not used... The issue over which specification of SALT, not due to be released until sometime later this year, or VXML, whose Version 2 is now out for review, is better is an argument that can only be determined by developers. Each side claims the other's specifications are deficient... IBM's William S. 'Ozzie' Osborne, general manager of IBM Voice Systems in Somers, N.Y.: 'I hope that we get to one standard. Multiple standards fragment the market place and create a diversion. I would like to see us get to a standard that is industry wide and not proprietary. What we are proposing to the W3C, using VXML for speech and x-HTML for graphics in a single program, is cheaper and easier than SALT without having to have the industry redo everything they have done'... Note the 2002-01-31 announcement: "The SALT Forum Welcomes Additional Technology Leaders as Contributors. New Members Add Extensive Expertise in All Aspects of Multimodal and Telephony Application Development and Deployment."
[March 08, 2002] "A SIP Interface to VoiceXML Dialog Servers." By Jonathan Rosenberg, Peter Mataga, and David Ladd (dynamicsoft). Internet Engineering Task Force, Internet Draft. Reference: draft-rosenberg-sip-vxml-00.txt. July 13, 2001; expires: February 2002. "VoiceXML is an XML based scripting language for describing voice dialogs. VoiceXML interpreters run within an interpreter context that, among other tasks, provides a call control interface for accessing the interpreter. It is very natural to provide a VoIP-based interpreter context that uses SIP and RTP to communicate with the outside world. In this document, we provide detailed specifications for a SIP/RTP based interpreter context... It is very natural to provide a VoiceXML interpreter context based purely on IP. Specifically, based on VoIP using SIP and RTP, along with HTTP for document access. An incoming VoIP call triggers the execution of the script, fetched from a server using HTTP. The incoming RTP stream for the call is passed to the interpreter for processing, and speech generated by the interpreter is sent over RTP to the called party. We call a pure IP-based VoiceXML system an "IP dialog server", or just "dialog server". Dialog servers are a key part of the application story for SIP-based networks, as described in the SIP application component architecture. That document describes SIP-based dialog servers, and provides a high level overview of how the SIP interface works. This document provides a stand-alone, self-contained, more thorough description of a SIP-based VoIP VoiceXML interpreter context..." [cache]
[January 02, 2002] "What's New in VoiceXML 2.0." By Jim A. Larson. In VoiceXML Review Volume 1, Issue 11 (December 2001). "So what's new with VoiceXML 2.0? Plenty. What was a single language, VoiceXML 1.0, has been extended into several related markup languages, each providing a useful facility for developing web-based speech applications. These facilities are organized into the W3C Speech Interface Framework... The VoiceXML 2.0 supports four I/O modes: speech recognition and DTMF as input with synthesized speech and prerecorded speech as output. VoiceXML 2.0 supports system-directed speech dialogs where the system prompts the user for responses, makes sense of the input, and determines what to do next. VoiceXML 2.0 also supports mixed initiative speech dialogs. In addition, VoiceXML 2.0 also supports task switching and the handling of events, such as recognition errors, incomplete information entered by the user, timeouts, barge-in, and developer-defined events. Barge-in allows users to speak while the browser is speaking. The VoiceXML 2.0 is modeled after VoiceXML 1.0 designed by the VoiceXML Forum, whose founding members are AT&T, IBM, Lucent, and Motorola. VoiceXML 2.0 contains clarifications and minor enhancements to VoiceXML 1.0. VoiceXML also contains a new <log> tag for use in debugging and application evaluation... The W3C Voice Browser Working Group has extended VoiceXML 1.0 to form VoiceXML 2.0 plus several new markup languages, including speech recognition grammar, semantic attachment, and speech synthesis. The speech recognition and speech synthesis markup languages were designed to be used in conjunction with VoiceXML 2.0, as well as with non-VoiceXML applications. The speech community is invited to review and comment on working drafts of these languages."
[January 02, 2002] "VoiceXML 2.0 from the Inside." By Dr. Scott McGlashan. In VoiceXML Review Volume 1, Issue 11 (December 2001). "With the publication in October 2001 of VoiceXML 2.0 as a W3C Working Draft, VoiceXML is finally on its way to become a W3C standard. VoiceXML 2.0 is based on VoiceXML 1.0, which was submitted to the W3C Voice Browser Working Group by the VoiceXML Forum in May 2000. In this article, we examine some of the key changes in the first public working draft of VoiceXML 2.0 as compared to the VoiceXML 1.0 specification... Since the founding of the Voice Browser Working Group in March 1999, the group had the mission of developing a suite of standards related to speech and dialog. These standards formed the W3C Speech Interface Framework and cover markup languages for speech synthesis, speech recognition, natural language and dialog, amongst others. Since the VoiceXML Forum had made clear its intention to develop VoiceXML 1.0 and submit it to the Voice Browser Working Group, the dialog team focused its efforts on specifying requirements for a W3C dialog markup language and providing detailed technical feedback to the Forum as VoiceXML 1.0 evolved. With the submission of VoiceXML 1.0, the dialog team began its work in earnest of developing VoiceXML into a dialog markup language for the Speech Interface Framework. A change request process was established in order to manage requests for changes in VoiceXML 2.0 from members of the Working Group and other interested parties; changes could include editorial, clarification, functional enhancements, all the way up to complete redesign of the language. Rather than try to incorporate every possible change into VoiceXML 2.0, we decided to limit the scope of changes..."
[January 02, 2002] "First Words: So What's New?" By Rob Marchand. In VoiceXML Review Volume 1, Issue 11 (December 2001). ['This month's column touches on some of the things that you can look for in VoiceXML 2.0, and how it impacts some of the VoiceXML tricks and tips he's introduced throughout the year.'] "The VoiceXML Forum founders (AT&T, Motorola, IBM, and Lucent) prepared the original VoiceXML 1.0 Specification. It was then passed over to the W3C Voice Browser Working Group to be evolved into VoiceXML 2.0. It was released as a public working draft on October 23rd of this year, with public comments being accepted until November 23rd. The process moving forward will include (possibly) additional working drafts, followed by a 'Last Call' working draft. Finally, a 'candidate recommendation' will be made available for final comment, followed by the formalization of VoiceXML 2.0 as a W3C Recommendation. There is still substantial work to go through in moving VoiceXML 2.0 through the W3C process, but the specification itself should now include most substantive changes and features that will be considered for the 2.0 recommendation. The current working draft of VoiceXML 2.0 improves on the VoiceXML 1.0 specification in a number of ways. If you're developing on any of the publicly available developer systems, you probably already have access to these features, or at least some of them..."
[November 01, 2001] "VoiceXML Developer Series: A Tour Through VoiceXML, Part V." By Jonathan Eisenzopf. From VoiceXMLPlanet. November 01, 2001. ['In the previous edition of the VoiceXML Developer, we created a full VoiceXML application using form fields, a subdialog, and internal grammars. In this edition, we will learn more about one of the most important, but rarely covered components of a VoiceXML application, grammars.'] "Now that we've built a few applications, it's time to talk about grammars. Grammars tell the speech recognition software the combinations of words and DTMF tones that it should be listening for. Grammars intentionally limit what the ASR engine will recognize. The method of recognizing speech without the burden of grammars is called "continuous speech recognition" or CSR. IBM's Via Voice is an example of a product that uses CSR technology to allow a user to dictate text to compose an email or dictate a document. While CSR technologies have improved, they're not accurate enough to use without the user training the system to recognize their voice. Also, the success rate of recognition in noisy environments, such as over a cell phone or in a crowded shopping mall, is reduced greatly. Pre-defining the scope of words and phrases that the ASR engine should be listening for can increase the recognition rate to well over 90%, even in noisy environments. The VoiceXML 1.0 standard uses grammars to recognize spoken and DTMF input. It doesn't, however, define the grammar format. This is changing however with the release of VoiceXML 2, which defines a standard XML-based and alternate BNF notation grammar format. Still, the fact that VoiceXML relies heavily on grammars means that we must create or reuse grammars each time we want to gather input from the user. In fact, the time required to create, maintain, and tune VoiceXML grammars will likely be several magnitudes greater than the time you will take to develop the VoiceXML interfaces. 
Not having high-quality and complete grammars means that the user will spend too much of their time repeating themselves. A system that cannot recognize input the first time, every time, will alienate users and cause them to abandon the system altogether. Therefore, we are going to spend a bit of time talking about grammars for VoiceXML 1.0 (and now VoiceXML 2) in the coming articles so that you will be armed with the knowledge you need to create successful VoiceXML applications. The first grammar format we are going to learn is GSL, which is used by the Nuance line of products... I want to reflect on some of the things that I've learned as I've been developing new VoiceXML applications over the past year as it relates to grammars. First, grammars can be difficult to develop and time consuming to tune. And things don't stop there. You will probably need to tune the dictionary that the system is using to include alternate word pronunciations as you begin to collect data on where the ASR application is failing. It's very important that the application will be able to recognize what the user is saying most of the time. Because DTMF input is almost 100% accurate, it should be preferred over speech for things like phone and credit card numbers. However, some voice interface designers recommend that you don't mix a touch-tone input with speech input. I'd say it's better than the alternative if you are having problems recognizing number sequences. Remember, speech recognition has gotten much better, but it still takes a great deal of work and care to reach the high 90s percentile success rates that vendors often mention. Thanks again for joining us for another edition of the VoiceXML Developer. In the next edition of the VoiceXML Developer, we will continue our exploration into grammars as part of our tour of the VoiceXML 1.0 specification..."
[September 10, 2001] "VoiceXML and the Voice/Web Environment. Visual Programming Tools for Telephone Application Development." By Lee Anne Phillips. In Dr Dobb's Journal [DDJ] (October 2001) #329, pages 91-96. Programmer's Toolchest. "While the Internet is making inroads into the public switched-telephone network, XML protocols such as VoiceXML are providing access to a set of tools that address the entire range of web applications..." The article provides an overview of GUI tools for creating VoiceXML applications, and reviews two: Visual Designer 2.0 from Voxeo, and Covigo Studio. [Covigo Studio "provides a visual programming environment that helps you to rapidly develop integrated mobile data and voice applications. Based on a user-centric process modeling approach, Studio separates user-interaction workflow from presentation design and data source integration. It allows you to build mobile applications from the ground-up or as extensions to existing applications, and to constantly optimize their applications to meet changing user, industry and business needs. The visual modeling approach provides multiple ways to integrate with existing enterprise applications at the presentation layer, business logic layer, or data layer levels. The product integrates with existing IT systems - including complex enterprise business processes encapsulated in systems used for customer relationship management (CRM), enterprise resource planning (ERP), and supply chain automation (SCM). This includes integrating with such technologies as HTML, JSPs, EJBs, JDBC, XML, and packaged application APIs..." The Visual Designer 2.0 from Voxeo is available at no cost. One can use the designer "to visually design phone applications and it will automatically generate the VoiceXML or CallXML markup for you. This allows a voice application developer to focus on important issues like usability and functionality, without having to worry about syntax. 
Voxeo Designer 2.0 is the first visual phone markup design tool to fully support round-trip development -- any CallXML or Voice XML application may be opened in the Designer tool, updated graphically (or by editing the XML directly) and re-deployed for use. Features include: Visual application design using flowcharts; Full round-trip, bi-directional development; Element/Attribute syntax validation; FTP and HTTP support for file read and write; Full CallXML Tag Support; Full VoiceXML 1.0 Tag support; 100% Pure-Java IDE, runs on any Java Virtual Machine ..."] Additional resources with Lee Anne's article include listings and source code.
[August 24, 2001] "Speech Technology Grows Up. Speech applications can save money and the technology is moving into advanced applications." By Kathleen Ohlson. In Network World Fusion (August 20, 2001). "... In the coming months, voice technology will only get better, observers say. Industry experts and vendors expect support for VoiceXML, a specification that would enable speech-based applications and online information to become phone and voice accessible, and the infusion of speech recognition in wireless devices, such as cell phones and PDAs, to flourish. Thrifty has deployed SpeechWorks' interactive speech recognition software to handle customer requests for car rental quotes. Customers who call Thrifty's reservation number are prompted to give information regarding dates, times, car size, city and airport, and then receive reservation information. When a customer wants to book a reservation, he is transferred to a sales agent. The agent receives the calls and information containing the customer's requests on his computer screen. The car agency has handled more than 200,000 calls so far through the system, and it plans to push over more by summer's end. Thrifty receives 4 million calls per year with 30% to 40% coming from customers checking rates and availability, according to DuPont, staff vice president of reservations... In addition to Thrifty, United Airlines and T. Rowe Price are two companies that have recently implemented interactive speech systems. Speech technology is also expected to penetrate in areas such as inventory tracking and salesforce automation, according to industry experts. For example, salespeople could prompt for information regarding their contacts and calendars through a phone... One of the main drivers of speech technology in the coming months will be the adoption of VoiceXML, which basically outlines a common way for speech applications to be programmed. 
With the adoption of VoiceXML, businesses would only need to build an application once and then could run it on multiple vendor platforms. VoiceXML is the brainchild of IBM, AT&T, Lucent and Motorola, and is currently supported by more than 500 companies, including Nokia, Sprint PCS, Nuance and SpeechWorks. SpeechWorks recently rolled out its VoiceXML-based speech recognition engine OpenSpeech Recognizer 1.0; Nuance, Lucent, IBM and others have implemented VoiceXML into their products..."
[August 24, 2001] "Voice XML Version 2 Stalled Over IP Issue." By Ephraim Schwartz. In InfoWorld August 24, 2001. "Version 2 of the Voice XML markup language is all but signed and sealed, but not quite delivered due to a snag in nailing down IP (intellectual property) rights. According to an industry analyst familiar with the issues discussed at the Voice XML Forum, all the specifications have been agreed upon, but there is a concern still that a future developer using VXML could be sued by a member of the Forum for infringement of IP rights... One solution may be that companies [currently 55 Forum members] might choose to provide license-free use or forego patent rights, Meisel added. All sources in the speech technology industry see VXML as a boon to the industry because it uses a standard language already familiar to Web developers. Version 2, expected to ship by the end of the year, is in its final development stages, according to the Forum chairman Bill Dykas... Up until now, developers creating speech applications used proprietary formats for writing speech grammars. A speech grammar is needed to map a wide range of responses into a narrower range, explained Dykas. For example, in a 'yes/no grammar' there may be a dozen ways for a caller to respond in the affirmative to a question including yeah, yes, okay, please, and alright which all can be mapped to Yes. Version 2 of VXML will define a common format so the program has to deal with only a single response. The second major addition to the standard -- the Voice XML Forum is working with the W3C standards body -- is the clarification of the call transfer tags... technology components, as for example in telephony: how to manipulate telephone voice mail and load balancing between mechanisms if a large number of calls come in simultaneously... Other areas include natural language understanding and multimodal interfaces for handhelds and cellular handsets. 
For example, in using a multimodal interface, a mobile worker may make a voice request to a database for customers that match a certain set of parameters, but the results will be displayed rather than spoken."
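The 'yes/no grammar' that Dykas describes above, where many spoken variants collapse to one canonical response, can be illustrated with a minimal Python sketch. This is not VoiceXML or any vendor's grammar format; the table name, the function name, and the negative variants are assumptions made for illustration, while the affirmative variants are the ones listed in the article.

```python
from typing import Optional

# Hypothetical synonym table. The "yes" variants come from the article;
# the "no" variants are assumed for the sake of the example.
YES_NO_GRAMMAR = {
    "yes": {"yes", "yeah", "okay", "please", "alright"},
    "no": {"no", "nope", "no thanks"},
}


def interpret(utterance: str) -> Optional[str]:
    """Map a recognized utterance to its canonical response, or None if out of grammar."""
    normalized = utterance.strip().lower()
    for canonical, variants in YES_NO_GRAMMAR.items():
        if normalized in variants:
            return canonical
    return None
```

With a table like this, the application logic only ever has to branch on "yes", "no", or an out-of-grammar result, which is the simplification the article attributes to a common grammar format.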
[August 2001] Early Adopter VoiceXML. By Eve Astrid Andersson, Stephen Breitenbach, Tyler Burd, Nirmal Chidambaram, Paul Houle, Daniel Newsome, Xiaofei Tang, and Xiaolan Zhu. Wrox Press. August, 2001. ISBN: 1861005628. The book covers: (1) An overview of the development and deployment environments available; (2) VoiceXML 1.0 syntax tutorial; (3) Grammar use, including JSGF and Nuance GSL syntax; (4) Use of VoiceXML with XSLT, ASP, JSP, and PHP; (5) Nuance Speechobjects; (6) The future of VoiceXML technologies, including VoiceXML 2.0. [...] "VoiceXML brings the power of Voice to the Web - the information we are used to accessing through the visual web interfaces of our PCs and mobile devices can now be accessed through speech alone. Building on the functionality already seen in IVR applications deployed by our banks and utility companies, the tag based syntax of VoiceXML will instantly be familiar to existing web developers, and applications can already be deployed using one of the many voice portals available. With the world's billion plus telephones, from antique black candlestick phones to the latest mobiles, there is a huge ready-made audience crying out for voice applications. The userbase encompasses those on the move who require easy access to information wherever they are, and those who haven't the money or inclination to access the Internet through a PC. The book aims to give the reader an in-depth analysis of the current state of VoiceXML technology. The information will help you develop voice-enabled applications now, and make sure you are ready for future advances of this quickly changing arena." See the online Table of contents.
[September 05, 2001] VoiceXML in RELAX NG. 2001-09-05 or later. From Kohsuke KAWAGUCHI. "I have translated the VoiceXML 1.0 DTD into RELAX NG syntax. I originally wrote it in my short-hand syntax and then used my tool to convert it to full RELAX NG syntax. I've never tested it, so it may well contain several translation errors. All the files are available in one zip file..." [cache]
[August 01, 2001] IBM alphaWorks Releases Voice Toolkit. The XML development team at IBM alphaWorks labs has released a beta version of a 'Voice Toolkit' to assist in the creation of voice applications "in less time, using a VoiceXML application development environment. The Voice Toolkit features grammar and VoiceXML editors so that application developers do not need to know the internals of voice technology. The Voice Toolkit Beta includes: (1) An integrated development environment (IDE) - runs on the desktop and enables the multi-step process of creating speech applications; (2) A VoiceXML editor - provides content assistance and integrated pronunciation development; (3) A Grammar editor - enables syntax-checking and integrated pronunciation development for generating JSGF grammars for VoiceXML applications. The grammar editor includes grammar creation for SRCL/BNF grammars and it provides conversion capability between SRCL/BNF and JSGF; (4) A pronunciation builder - generates a pronunciation from spelling; and it lets you manually create pronunciations; (5) A basic audio recorder - allows the creation of audio files from spoken text and the playing of previously-recorded audio files; (6) VoiceXML Reusable Dialog Components - pre-written VoiceXML code for use as building blocks for application functions." [Full context]
[May 23, 2000] "VoiceXML Forum Founders Submit VoiceXML 1.0 Specification to W3C. Submission Marks Milestone on the Path to Voice-Enabled Internet." - "The VoiceXML Forum today announced that the World Wide Web Consortium (W3C) has acknowledged the submission of Version 1.0 of the VoiceXML specification. At its May 10-12 meetings in Paris, the W3C's Voice Browser Working Group agreed to adopt VoiceXML 1.0 as the basis for the development of a W3C dialog markup language. The Forum's founding members, AT&T, IBM, Lucent Technologies, and Motorola made the W3C submission. Acknowledgement by the W3C will help to accelerate and expand the reach of the Internet through voice-enabled Web content and services. The VoiceXML Forum will host the next meeting of the W3C Voice Browser Working Group in September 2000. 'As the W3C Voice Browser Working Group begins to define the speech interface framework that extends the Web to voice-based devices, we will use VoiceXML as a model for our dialog markup language. The W3C speech interface framework will include integrated markup languages for dialog, grammar, speech synthesis, natural language semantics, and multimodal dialogs, as well as a standard list of reusable dialogs,' said Jim Larson of the Intel Architecture Labs, who is Co-chair of the W3C Voice Browser Working Group..."
VoiceXML specification DTD. Posted 18-July-2000. Includes corrections. [cache]
VoiceXML DTD - From http://www.w3.org/TR/2000/NOTE-voicexml-20000505.
[August 13, 2001] "Creating VoiceXML Applications With Perl." By Kip Hampton. From XML.com. August 08, 2001. ['Kip Hampton shows how Perl and VoiceXML can work together.'] "VoiceXML is an XML-based language used to create Web content and services that can be accessed over the phone. Not just those nifty WAP-enabled 'Web phones', mind you, but the plain old clunky home models that you might use to order a pizza or talk to your Aunt Mable. While HTML presumes a graphical user interface to access information, VoiceXML presumes an audio interface where speech and keypad tones take the place of the screen, keyboard, and mouse. This month we will look at a few samples that demonstrate how to create dynamic voice applications using VoiceXML, Perl, and CGI. A rigorous introduction to VoiceXML and how it works is beyond the scope of this tutorial. For more complete introductions to VoiceXML's moving parts see Didier Martin's 'Hello, Voice World' or the VoiceXML Forum's FAQ... VoiceXML is much more than an alternative interface to the Web. It allows developers to extend their existing applications in new and useful ways, and it offers many unique opportunities for new development. As you may have guessed, though, that power and flexibility come with a hefty price tag: VoiceXML gateways (the hardware and software that connect the Web to the phone system, translate text to speech, interpret the VoiceXML markup, etc.) are not cheap. The good news is that many prominent VoiceXML gateway providers offer free test and deployment environments to curious developers, so you can check out VoiceXML for yourself without breaking the bank."
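To give a feel for what "dynamic voice applications" means in the article above, here is a minimal sketch of server-side VoiceXML generation. It is written in Python rather than the Perl used in Hampton's article, and it is not taken from that article. The function names are hypothetical, and the `text/xml` content type is an assumption about what a gateway would accept; only the `vxml`, `form`, and `block` elements are VoiceXML 1.0 vocabulary.

```python
import xml.etree.ElementTree as ET


def build_greeting(message: str) -> str:
    """Return a minimal VoiceXML 1.0 document that speaks `message` to the caller."""
    vxml = ET.Element("vxml", {"version": "1.0"})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = message  # ElementTree escapes &, <, > when serializing
    return '<?xml version="1.0"?>\n' + ET.tostring(vxml, encoding="unicode")


def cgi_response(message: str) -> str:
    """Wrap the document in a CGI-style response (content type is an assumption)."""
    return "Content-Type: text/xml\r\n\r\n" + build_greeting(message)
```

A CGI script would print the result of `cgi_response(...)` to standard output; the message could come from a database query or any other server-side source, which is what makes the dialog dynamic.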
[July 30, 2001] "XML Gives Voice to New Speech Applications." By Steve Chambers. In Network World [Fusion] Volume 18, Number 31 (July 30, 2001), page 37. "Speech technology is evolving to the point where an exchange of information between a person and a computer is becoming more like a real conversation. Many factors are responsible for this, ranging from an exponential increase in computing power to a general advancement of basic speech technology and user interface design. Speech-based applications deployed to date have been based on code created by a few speech software vendors. VoiceXML will likely change this landscape by virtue of its promised vendor independence in creating speech applications. VoiceXML is the emerging standard for speech-enabled applications. It defines how a dialog is constructed and executed between a caller and a computer running speech recognition and/or text-to-speech software. VoiceXML incorporates the flexibility to create speech-enabled Web-based content or to build telephony-based speech recognition call center applications. . . Vocabularies and grammars are the key components that define the input to a speech-enabled page. The vocabulary consists of the words to be recognized by the speech recognition engine. For example, a vocabulary for a flight information system might consist of city names and travel-related words such as "leaving" and "fly." Grammars provide the structure to identify meaningful phrases. A vocabulary and grammar are combined within a speech-enabled application to define speech recognition within a reasonable range of efficiency for both the caller and the speech recognition processor. Designing a speech application includes presenting data for delivery over the phone, constructing a call flow and enabling prompts and grammars. VoiceXML provides a common set of rules as a flexible foundation, but it's up to the designer to create the appropriate flow and personality for a speech system..."
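The distinction drawn above between a vocabulary (the words to be recognized) and a grammar (the structure that makes phrases meaningful) can be sketched in a few lines of Python. This illustrates the idea only and is not a speech recognizer or a VoiceXML grammar; the city list, the phrase pattern, and the function name are all assumptions, loosely following the article's flight information example.

```python
import re
from typing import Dict, Optional

# Hypothetical vocabulary: the city names the application should listen for.
CITIES = ["boston", "chicago", "denver"]
_city = "|".join(CITIES)

# Hypothetical grammar: one phrase structure built from the vocabulary plus
# travel-related words such as "leaving" and "fly".
FLIGHT_PHRASE = re.compile(
    rf"^(?:leaving|fly(?:ing)?) from (?P<origin>{_city}) to (?P<destination>{_city})$"
)


def parse_flight_request(utterance: str) -> Optional[Dict[str, str]]:
    """Return the origin and destination if the utterance fits the grammar, else None."""
    match = FLIGHT_PHRASE.match(utterance.strip().lower())
    return match.groupdict() if match else None
```

An utterance that uses an out-of-vocabulary city, or in-vocabulary words in an order the pattern does not allow, yields `None`, which mirrors how a grammar narrows recognition to a manageable range.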
[July 23, 2001] "New technology gives Web a voice." By Wylie Wong. In CNET News.com July 19, 2001. "A budding standard, the brainchild of tech giants AT&T, IBM, Lucent Technologies and Motorola, is fueling new software that allows people to use voice commands via their phones -- either cell or land-based -- to browse the Web. Users of the technology can check e-mail, make reservations and perform other tasks simply by speaking commands. The technology, called VoiceXML, is now winding its way through the World Wide Web Consortium Internet standards body, which is reviewing the specification and could make it a formal standard by year's end. Proponents of VoiceXML say standardization is crucial for the market for Web voice access software and services to take off. The standard gives software and hardware makers, as well as service providers and other companies using the technology, a common way to build software to offer Web information and services over the phone. . . Even though the VoiceXML specification hasn't been finalized, tech companies and telecommunications service providers alike have flocked to support the technology and are already offering new software and services that tie the telephone to the Internet. The technology has gained the support of nearly 500 companies, including IBM, networking giant Cisco Systems, database software maker Oracle and stock brokerage firm Charles Schwab..."
[May 23, 2001] "The Power Of Voice." By Ana Orubeondo (Test Center Senior Analyst, Wireless and Mobile Technologies). In InfoWorld (May 18, 2001). ['VoiceXML should connect your existing Web infrastructure, the Internet, and the standard telephone by providing a standard language for building voice applications. E-business managers who plan voice portal strategies will need to decide whether to build the portals themselves or turn to a growing number of voice ASPs. Be careful when selecting rapidly evolving voice portal technologies. Key improvements such as grammar authoring in Version 2.0 should iron out some of the shortcomings VoiceXML exhibits.'] "VoiceXML is a standard language for building interfaces between voice-recognition software and Web content. Just as HTML defines the display and delivery of text and images on the Internet, VoiceXML translates any XML-tagged Web content into a format that speech-recognition software can deliver by phone. VoiceXML 1.0 is a specification of the VoiceXML Forum, an industry organization founded by AT&T, IBM, Lucent Technologies, and Motorola and consisting of more than 300 companies. With the backing and technology contributions of its four world-class founders and the support of leading Internet industry players, the VoiceXML Forum has made speech-enabled applications on the Internet a reality through its mission to develop and promote VoiceXML. With VoiceXML, users can create a new class of Web sites using audio interfaces, which are not really Web sites in the normal sense because they provide Internet access with a standard telephone. These applications make online information available to users who do not have access to a computer but do have access to a telephone. Voice applications are useful for highly mobile users who need hands- and eyes-free interaction with Web applications, possibly while driving or carrying luggage through a busy airport... 
Voice portals such as BeVocal, TellMe, and Shoptalk are already providing voice access to stock quotes, movie and restaurant listings, and daily news. The best-suited applications for VoiceXML are information retrieval, electronic commerce, personal services, and unified messaging. Several companies have already employed VoiceXML in information retrieval applications to great success. Hotels, car rental agencies, and airlines have implemented continuous voice access to allow customers to make or confirm reservations, buy tickets, find rates, get store hours and driving directions, and access loyalty programs. Voice automated services help reduce call-center costs and increase customer satisfaction... As the volume of information published using HTML grows and the range of Web services broadens, VoiceXML will become an increasingly attractive technology. VoiceXML increases the leverage under a company's Web investment by offering voice interpretation of HTML content." [altURL]
[March 09, 2001] "VoiceXML and the Voice-driven Internet." By David Houlding (The Technical Resource Connection). In Dr. Dobb's Journal Volume 26, Issue 4 (April 2001), pages 88-94. ['David Houlding examines the concept of voice portals, and shows how simple design patterns -- together with XML and XSL -- can be used to deliver Internet content to web browsers and wireless devices.'] "Wireless data services are growing at a phenomenal rate, driven to a large extent by the popularity of the Internet services they are delivering. These wireless-enabled Internet services are generally accessible not only by standard web browsers, but also by some mix of web phones, two-way pagers, and wireless organizers. The adoption of these modes of Internet access is being accelerated by the effects of mainstream Internet usage maturing from an initial novelty/hype phase into a ubiquitous set of services we use as common tools in everyday life. In this mode of use, how information is presented is less important than being able to get to the particular information you require easily, when and where you need it... Voice portals leverage both the most natural form of communication -- speech -- and the most pervasive and familiar communications network -- the global telephone network. This network is accessible by either standard wired or mobile cellphones users already have, together with service plans, so no additional cost needs to be incurred for users to access Internet services via voice portals. This eliminates the expense barriers that are currently limiting the penetration of wireless services into the marketplace. Phones also permit eyes- and hands-free operation, enabling Internet service usage via voice portals in situations where wireless devices will not suffice. In this article, I'll discuss the concept of voice portals and the associated architecture. 
I'll then show how simple design patterns -- together with XML and XSL -- can be used to deliver Internet content and services cost effectively not only to web browsers and various wireless devices, but also to any telephone via VoiceXML (for more information on the VoiceXML Standard, see http://www.voicexml.org/). I'll then present an implementation of this architecture that uses software that is freely available on the Internet. Finally, I'll examine key business and technical issues associated with voice-driven applications. VoiceXML is a new standard with significant industry backing. It promises to create a level playing field on which voice portals may compete for outsourcing the hosting of voice applications. This will drive down cost and improve quality of service for both application providers and their customers. From the application providers standpoint, creating voice applications using VoiceXML has the advantage that content is portable across different voice portals, delivering flexibility with respect to choosing voice portals to host voice applications. Voice portals driven by VoiceXML provide a powerful complementary new mode of access that empowers users with more options regarding when, where, and how they consume Internet services. Using speech as the most natural form of communication, the existing familiar global telephone network as the most pervasive communications network, and enabling eyes- and hands-free operation, this new mode of access promises to further accelerate the growth and maturity of Internet services into a ubiquitous set of tools we use every day." Additional resources include listings and source code.
[April 19, 2001] "Introduction to the W3C Grammar Format." By Andrew Hunt. In VoiceXML Review Volume 1, Issue 4 (April 2001). ['The W3C Voice Browser Working Group has released a draft specification for a standard grammar format that promises to enhance the interoperability of VoiceXML Browsers and drive portability of VoiceXML applications. This article summarizes the key features of the specification and the application of the specification to VoiceXML application development.'] "The W3C Speech Recognition Grammar Format specification embodies two equivalent languages. XML Form of the W3C Speech Recognition Grammar Format: Represents a grammar as an XML document with the logical structure of the grammar captured by XML elements. This format is ideal for computer-to-computer communication of grammars because widely available XML technology (parsers, XSLT, etc.) can be used to produce and accept the grammar format. Augmented BNF (ABNF) Form of the W3C Speech Recognition Grammar Format: The logical structure of the grammar is captured by a combination of traditional BNF (Backus-Naur Form) and a regular expression language. This format is familiar to many current speech application developers, is similar to the proprietary grammar formats of most current speech recognizers and is a more compact representation than XML. However, a special parser is required to accept this format. Grammars written in either format can be converted to the other format without loss of information (except formatting). The two formats co-exist because the Working Group found it important to support both computer-to-computer communication format and a more familiar human-readable format (but, as with all decisions reached by a committee, there is a spectrum of opinion on these matters)... The new W3C Speech Recognition Grammar Format is a powerful language for developing both simple grammars and natural language grammars for use in VoiceXML applications. 
The availability of a standard grammar format will increase the interoperability of VoiceXML applications by allowing each grammar to be authored once and reused across many VoiceXML browsers."
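To make the two equivalent forms concrete, here is a minimal Python sketch that emits the same one-rule grammar in the XML Form and in the ABNF Form. It is an illustration only: the function names `grammar_xml` and `grammar_abnf` and the rule name `city` are hypothetical, and the syntax follows the later Speech Recognition Grammar Specification 1.0 Recommendation, which may differ in detail from the 2001 draft the article describes.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SRGS 1.0 Recommendation (an assumption for the 2001 draft).
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def grammar_xml(rule_id, phrases, lang="en-US"):
    """Return the XML Form: one rule whose body is a list of alternatives."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def grammar_abnf(rule_id, phrases, lang="en-US"):
    """Return the ABNF Form of the same grammar as compact text."""
    header = f"#ABNF 1.0;\nlanguage {lang};\nroot ${rule_id};\n"
    return header + f"${rule_id} = " + " | ".join(phrases) + ";\n"


if __name__ == "__main__":
    cities = ["Boston", "Chicago", "Seattle"]
    print(grammar_xml("city", cities))
    print(grammar_abnf("city", cities))
```

The XML output can be processed with any XML parser, while the ABNF output is shorter and easier to read by hand, which is the trade-off the article attributes to the two forms.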
[April 19, 2001] "The Speech Synthesis Markup Language for the W3C VoiceXML Standard." By Mark R. Walker and Andrew Hunt. In VoiceXML Review Volume 1, Issue 4 (April 2001). ['Among the first in a series of the W3C's soon-to-be-released XML-based markup specifications is the speech synthesis text markup standard. This article summarizes the markup element design philosophy and includes descriptions of each of the speech synthesis markup elements.'] "A new set of XML-based markup standards developed for the purpose of enabling voice browsing of the Internet will begin emerging in 2001 from the Voice Browser Working Group, which was recently organized under the auspices of the W3C. Among the first in this series of soon-to-be-released specifications is the speech synthesis text markup standard. The Speech Synthesis Markup Language (SSML) Specification is largely based on the Java Speech Markup Language (JSML), but also incorporates elements and concepts from SABLE, a previously published text markup standard, and from VoiceXML, which itself is based on JSML and SABLE. SSML also includes new elements designed to optimize the capabilities of contemporary speech synthesis engines in the task of converting text into speech. This article summarizes the markup element design philosophy and includes descriptions of each of the speech synthesis markup elements. The Voice Browser Working Group has utilized the open processes of the W3C for the purpose of developing standards that enable access to the web using spoken interaction. The nearly completed SSML specification is part of a new set of markup specifications for voice browsers, and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. 
The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech output such as pronunciation, volume, pitch and rate across different synthesis-capable platforms. It is anticipated that SSML will enable a large number of new applications simply because XML documents would be able to simultaneously support viewable and audio output forms. Email messages would potentially contain SSML elements automatically inserted by synthesis-enabled, mail editing tools that render the messages into speech when no text display was present. Web sites designed for sight-impaired users would likely acquire a standard form, and would be accessible with a potentially larger variety of Internet access devices. Finally, SSML has been designed to integrate with the Voice Dialogue markup standard in the creation of text-based dialogue prompts. The greatest impact of SSML may be the way it spurs the development of new generations of synthesis-knowledgeable tools for assisting synthesis text authors. It is anticipated that authors of synthesizable documents will initially possess differing amounts of expertise. The effect of such differences may diminish as high-level tools for generating SSML content eventually appear. Some authors with little expertise may rely on choices made by the SSML processor at render time. Authors possessing higher levels of expertise will make considerable effort to mark as many details of the document to ensure consistent speech quality across platforms and to more precisely specify output qualities. Other document authors, those who demand the highest possible control over the rendered speech, may utilize synthesis-knowledgeable tools to produce 'low-level' synthesis markup sequences composed of phoneme, pitch and timing information for segments of documents or for entire documents..."
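As a rough illustration of the kind of control described above (volume, pitch, and rate), the following Python sketch wraps a sentence in SSML markup. The helper name `build_ssml` is hypothetical, and the element and attribute names are those of the later SSML 1.0 Recommendation, so they are an assumption with respect to the 2001 draft summarized in the article.

```python
import xml.etree.ElementTree as ET

# Namespace of the SSML 1.0 Recommendation (assumed; the 2001 draft may differ).
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, rate="medium", pitch="medium", volume="medium", lang="en-US"):
    """Wrap text in <speak><prosody> so a synthesizer can adjust its delivery."""
    speak = ET.Element("speak", {
        "xmlns": SSML_NS,
        "version": "1.0",
        "xml:lang": lang,
    })
    prosody = ET.SubElement(speak, "prosody", {
        "rate": rate,      # e.g. "slow", "medium", "fast"
        "pitch": pitch,    # e.g. "low", "medium", "high"
        "volume": volume,  # e.g. "soft", "medium", "loud"
    })
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your flight departs at nine a.m.", rate="slow", volume="loud"))
```

An author with little expertise could omit the prosody settings entirely and leave the choices to the SSML processor, as the article notes.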
[April 17, 2001] W3C Publishes Requirements for Call Control in the Voice Browser Framework. The W3C Voice Browser Working Group has released an initial working draft specification for "Call Control Requirements in a Voice Browser Framework." The document is presented as "a precursor to work on a specification." It "describes requirements for mechanisms that enable fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform. The scope of these language features is for controlling resources in a platform on the network edge, not for building network-based call processing applications in a telephone switching system, or for controlling an entire telecom network." This W3C activity "focuses on enabling extended call control functionality in a voice browser which supports telephony capabilities. The task is constrained to defining elements and capabilities which either provide augmented functionality to be used in combination with VoiceXML or enhance the existing functionality in VoiceXML. The activities of a Call Control Subgroup will be coordinated with the activities of the Dialog Subgroup, both of which are part of the W3C Voice Browser working group." The requirements specification for call control is set against the backdrop of published goals for richer telephony functionality in VoiceXML, [which is] "designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations." W3C work on Voice Browsers is being coordinated under the W3C User Interface Domain. [Full context]
[April 06, 2001] "WebSphere Studio Leverages XML to Empower Web Developers." By Amy Wu and Sharon Thompson. In XML Journal Volume 2, Issue 4 (April, 2001), pages 56-60. ['A good Web development tool should be easy to use, yet robust enough to create and edit static and dynamic pages, organize and publish files, and help the developer properly maintain the site. IBM's WebSphere Studio is a total project management workbench with several integrated tools that assist developers in all stages of Web development. This article introduces you to Studio's wizards, editors, and publishing functions and exposes some of Studio's weaknesses as well.'] "Studio may be used in conjunction with some of the more common version control software (VCS). But even without an integrated VCS, Studio allows multiple users to access a project and check files in and out. Throughout the development process, Studio assists with link management that maintains links even while users move the source files around within the project. Studio's various editors and wizards are particularly helpful for nonprogrammers. Wizards can enable even novice users to generate server-side logic and add powerful functions to Web sites. They also leverage XML technology to make it easy for users to create Java servlets or JavaServer Pages that access databases and implement JavaBeans. When the development process is complete, or when the team is ready, Studio's powerful publishing feature enables them to easily publish files to a local server for review or to a live production server. Studio assists developers from page creation through editing and finally to publishing. WebSphere Studio includes three powerful wizards: SQL, Database, and JavaBean. These help developers easily create dynamic content for Web sites through a simple step-by-step procedure. Studio's SQL wizard generates a SQL file that specifies the query and information needed for the Database wizard so it can produce pages that access the database. 
The JavaBean wizard creates Web pages that utilize any JavaBeans you may have in your project. The JavaBean and Database wizards have similar steps and options. Both enable the user to select the code-generation style, generate markup languages, create error pages and specify display fields in the input and output pages, as well as choose which JavaBean methods to invoke, and in which order..." Note: IBM WebSphere Studio "provides an easy-to-use tool set that helps reduce time and effort when creating, managing and debugging multiplatform Web applications. A tool for visual layout of dynamic Web pages, Studio supports JSPs, full HTML, JavaScript and DHTML, uses wizards (for generating database-driven pages), and updates and corrects links automatically when content changes. Has built-in support for the creation, management and deployment of Wireless Markup Language (WML), Voice Extensible Language (VXML) and Compact HTML for pervasive devices."
[February 07, 2001] "XML Meets 'IVR With An LCD'. Will WAP and VoiceXML partner up?" By Robert Richardson. In Computer Telephony Volume 9, Issue 2 (February 2001), pages 92-97. ['Feature article on two disjoint paradigms for phone-based m-commerce.'] "Wireless may be the future, but we can't help but notice that when it comes to convergence, wireless is mostly clueless... Converged wireless is coming, no doubt, whenever 3G (that is, fast-data-enabled) wireless comes. Or at least it'll look converged - it's still an open question whether digital voice will ever wind up in the same sorts of packets as data when both are beamed out to handsets. In any case, you'll be able to talk via your Bluetooth headset while scrolling through your daily appointments on the handset... The problem in a nutshell: If we have a device that allows more than one mode of interface, it stands to reason that we ought to be able to use the input modes interchangeably. Or, as a minimum, the underlying protocols enabling those interfaces ought to make it possible for developers to support multi-modal interfaces by explicitly hand-coding options for different kinds of input and output into the same application (see diagram). Right now, the two protocols figuring most visibly in the wireless arena -- WAP and VoiceXML -- don't provide hooks to each other. Still, that's likely to change, and it's a good thing, too, because multi-modal seems like a nearly no-brainer way to make mobile applications a lot more appealing to mobile consumers. In this article, we take a look both at what WAP and VoiceXML don't do right now, how they're likely to learn how to live happily with each other, and at how at least one savvy vendor is already working to deliver some pretty sexy near-convergence scenarios... Why is it that WAP can't handle phone calls better, given that its whole raison d'etre is to make a cell phone more usable? 
The most obvious answer is that it's a young standard, still in transition (an obvious fact that plenty of critics have been far too quick to overlook when dishing out anti-WAP broadsides). As things stand, WAP conveys its 'web pages' to mobile handsets using WML (wireless markup language)... the current rendition of WAP knows how to do with regard to voice calls is to initiate them. A user can make a menu selection from a WML page (or card, in WAP parlance) and a special, built-in telephony interface (on a WML card, access to this interface is simply via a URL that begins with 'wtai://' rather than 'html://') drops the current WAP phone call and dials up the new number... VoiceXML, too, shares some of the same 'early-days' shortcomings of WAP. In some respects, VoiceXML is slightly better prepared for a multi-modal world. For one thing, VoiceXML supports a tag that will initiate a call and provide rudimentary monitoring of call progress. Unlike WAP, this kind of call can be initiated either as a bridged or a blind transfer. If blind, it's no different than the WAP call to a voice number. If bridged, however, the new phone line is conferenced into the existing call. A bridge transfer assumes that the call will terminate within a preset time limit and that control will transfer back to the current VoiceXML page (and, in fact, voice options from that page are still in operation within the call). The VoiceXML server never hangs up, so the context of the call isn't lost. The fly in this ointment is that a bridged call, by virtue of the fact that the call that's already in progress is a voice call, can't handle data packets. So you won't be updating your WAP deck with a bridged call... the World Wide Web Consortium (W3C) has taken at least two steps that are sure to have an impact on future multi-modality. First, the group officially adopted XHTML Basic as a W3C recommendation. This puts the specification on track for IETF adoption and general use across the Internet. 
A key feature of XHTML Basic is its cross-device usability. It's designed to work on cell phones, PDAs, pagers, and WebTV, in addition to the traditional PC-with-a-VGA-screen. Second, the W3C's working group held a session in conjunction with the WAP Forum at a recent meeting in Hong Kong to discuss precisely the problem of making WAP and VoiceXML aware of each other. The upshot was a decision to form a multi-modal working group. Interested parties presenting at the Hong Kong workshop included Nuance, Philips, NTT DoCoMo, IBM, NEC, PipeBeach, and OpenWave. The WAP Forum, a technical consortium representing manufacturers and service providers for over 95% of the handsets in the global wireless market, is already taking steps toward interoperability with other XML-based protocols..."
[February 07, 2001] "What is VoiceXML?" By Kenneth Rehor. In VoiceXML Review Volume 1, Issue 1 (January 2001). ['If you are new to VoiceXML, this overview article will serve as an excellent starting point. For those of you who have already been authoring VoiceXML applications with one of the software developer kits, platforms, and/or on-line developer "web studios" available from various vendors, this article goes beyond the syntactical elements of the language and describes the typical reference architecture in which the VoiceXML interpreter resides.'] "VoiceXML is a language for creating voice-user interfaces, particularly for the telephone. It uses speech recognition and touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. It is based on the Worldwide Web Consortium's (W3C's) Extensible Markup Language (XML), and leverages the web paradigm for application development and deployment. By having a common language, application developers, platform vendors, and tool providers all can benefit from code portability and reuse. With VoiceXML, speech recognition application development is greatly simplified by using familiar web infrastructure, including tools and Web servers. Instead of using a PC with a Web browser, any telephone can access VoiceXML applications via a VoiceXML 'interpreter' (also known as a 'browser') running on a telephony server. Whereas HTML is commonly used for creating graphical Web applications, VoiceXML can be used for voice-enabled Web applications. There are two schools of thought regarding the use of VoiceXML: (1) As a way to voice-enable a Web site, or (2) As an open-architecture solution for building next-generation interactive voice response telephone services. One popular type of application is the voice portal, a telephone service where callers dial a phone number to retrieve information such as stock quotes, sports scores, and weather reports. 
Voice portals have received considerable attention lately, and demonstrate the power of speech recognition-based telephone services. These, however, are certainly not the only application for VoiceXML. Other application areas, including voice-enabled intranets and contact centers, notification services, and innovative telephony services, can all be built with VoiceXML. By separating application logic (running on a standard Web server) from the voice dialogs (running on a telephony server), VoiceXML and the voice-enabled Web allow for a new business model for telephony applications known as the Voice Service Provider. This permits developers to build phone services without having to buy or run equipment..."
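The separation of application logic from voice dialogs can be sketched as follows: ordinary server-side code (here Python) decides which services to offer and generates the VoiceXML menu that a telephony-side interpreter would then render. This is a minimal sketch, not taken from the article; the function name `build_portal_menu` and the target URLs are hypothetical, and the markup uses element names from the VoiceXML 2.0 Recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace of VoiceXML 2.0 (VoiceXML 1.0 documents did not use it).
VXML_NS = "http://www.w3.org/2001/vxml"


def build_portal_menu(choices):
    """Generate a voice-portal menu from (spoken label, target URL) pairs."""
    vxml = ET.Element("vxml", {"xmlns": VXML_NS, "version": "2.0"})
    # dtmf="true" asks the platform to assign touchtone keys to the choices too.
    menu = ET.SubElement(vxml, "menu", {"id": "main", "dtmf": "true"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Welcome to the portal. Say one of: "
    ET.SubElement(prompt, "enumerate")  # reads the choice labels aloud
    for label, url in choices:
        ET.SubElement(menu, "choice", {"next": url}).text = label
    noinput = ET.SubElement(menu, "noinput")
    noinput.text = "Sorry, I did not hear you. "
    ET.SubElement(noinput, "reprompt")
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service URLs, for illustration only.
    print(build_portal_menu([
        ("stock quotes", "/quotes.vxml"),
        ("sports scores", "/sports.vxml"),
        ("weather", "/weather.vxml"),
    ]))
```

Because the menu is just generated text, the same Web server that produces HTML pages can produce it, which is the deployment model the article calls the Voice Service Provider.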
[February 07, 2001] "Open Dialog: Activities of the VoiceXML Forum and W3C." By Gerald M. Karam. In VoiceXML Review Volume 1, Issue 1 (January 2001). ['Even if you're already involved in VoiceXML technology, perhaps you'd like to know a bit more about the origins of the language. This article provides insightful background on the VoiceXML Forum, the Forum's working relationship with W3C, and how to get involved in both arenas.'] "With the launch of the VoiceXML Forum in March of 1999, and the release of the VoiceXML 1.0 specification in March 2000, there has been a surge of activity in the speech and telephony industry around the VoiceXML concept, products and services. In conjunction with these events, the VoiceXML community has been progressing the language further and improving the business environment in which VoiceXML exists. Most notable are the efforts of the VoiceXML Forum and the World Wide Web Consortium (W3C) Voice Browser Working Group (VBWG). Figure 1 below provides a brief history lesson in how all the participants work together. [...] the VoiceXML Forum and W3C felt it would be mutually beneficial to have a working relationship with regard to VoiceXML activities. Consequently, in September 2000, the two organizations and their constituents began formal negotiations on a memorandum of understanding that would define the ways in which collaboration would take place. We're hoping to have this memorandum approved in January 2001. At this time, the language work takes place within the W3C VBWG, chaired by Jim Larson of Intel Corp., and in its various subgroups. The specific work developing what is expected to become VoiceXML 2.0 is taking place in the Dialog Language Sub-Working Group chaired by Scott McGlashan of PipeBeach. The development of other markup languages (e.g., for speech grammars and speech synthesis) is handled in other subgroups. 
The work takes place through email, teleconferences and in face-to-face meetings that occur every couple of months. The VoiceXML Forum has activities in marketing (chaired by Carl Clouse, Motorola), conformance (chaired by the author), and education (where the author is acting chair). Participation in these committees is limited to VoiceXML Promoter and Sponsor members. If you would like to get involved, please contact the VoiceXML Forum office (membership@voicexml.org). For the full range of VoiceXML Forum activities, please check out the Web site at http://www.voicexml.org. As you can see, VoiceXML is heating up, and with the wide range of industrial support behind the VoiceXML Forum and the W3C VBWG, the best intellectual and corporate resources are collaborating to make VoiceXML a driving force in the telephony and speech application world..."
[January 15, 2001] "VoiceXML Forum Launches First Issue of VoiceXML Review." - ['The VoiceXML Forum launches inaugural issue of VoiceXML Review, a Web-based e-zine devoted entirely to VoiceXML.'] "The VoiceXML Forum today launched the inaugural issue of its e-zine, VoiceXML Review. Developed by the Forum's Education Committee, the publication will serve as an educational vehicle for the VoiceXML community, and provide a mechanism for the Forum to promote its initiatives. The Forum anticipates the e-zine will foster growth and maturation in VoiceXML technology by educating the speech recognition community and attracting new participants. The e-zine can be accessed via the Web at http://www.voicexmlreview.org. 'For the first time, contributors and observers within the VoiceXML community will have a place to publish their findings and experiences, and learn from others on a regular basis,' said Jonathan Engelsma, Editor-in-Chief of VoiceXML Review. 'The interest shown so far in this periodical is exciting, and we believe it has great potential in further disseminating VoiceXML knowledge and expertise within and beyond the Forum's membership.' The articles in VoiceXML Review will be authored by experts in the field of VoiceXML, most of whom are members of the VoiceXML Forum. The periodical will be freely available via the Forum's Web site. An option will be available for readers to "register" themselves as subscribers, which will allow them to receive notification when new issues are posted. VoiceXML Review will also feature an interactive search page and an archive of past issues. Each issue of VoiceXML Review will feature a blend of full-length feature articles, monthly columns, and news from the VoiceXML community. The first issue of the e-zine will present the following articles: (1) Features: "What is VoiceXML?" by Kenneth G. Rehor; "Activities of the VoiceXML Forum and the W3C" by Gerald M. 
Karam; (2) Columns: "First Words: VoiceXML -- Where Speech Meets the Web" by Rob Marchand; "Speak & Listen: Find Answers to Your VoiceXML Questions" by Jeff Kunins. Contact: Jonathan Engelsma (Editor-in-Chief, VoiceXML Review). [cache]
[January 04, 2001] "VoxML: Get Your Database Talking." By Srdjan Vujosevic (VP of Engineering and Co-founder, WaveDev.com) and Robert Laberge (President, WaveDev.com). In WebTechniques Volume 6, Issue 2 (February 2001), pages 51-55. ['With the emergence of VoxML and its newest sibling, VoiceXML, web sites can now talk back to their users. Srdjan and Robert build an online coffee shop with VoxML, demonstrating how to create a new dialog with their customers.'] (and note:) "Internet application voice commands are in their infancy... Imagine dialing into a portal and asking, 'What's the current price for XYZ?' The portal would respond, 'XYZ at 11:45 a.m. is trading at $88, with a day high of $89 and a day low of $87.' Other possible applications include checking individual movie times and show listings, searching for specific items on your bank statement, and having your latest email read to you as you drive to work. Some of this technology has existed for a while now, but only in limited and proprietary form. VoiceXML and VoxML, which are derived from the XML specification for languages, allow any company to develop a voice-enabled application without starting from scratch. It's important not to confuse VoiceXML, or the related VoxML, with Voice-over-IP, which is a technology that lets people make 'phone calls' over the Internet. VoiceXML and VoxML are languages that must be interpreted by a voice browser. Just like Web browsers read and interpret HTML documents, voice browsers read and interpret VoiceXML documents. The voice browser also accepts voice commands and sends them to applications over the Internet. The browser then reads the output from the applications back to the end user... Motorola took a big step toward merging telephony and the Internet when the company announced VoxML in September 1998. VoxML is hardly the first attempt to teach the Internet to talk (and listen) to users. 
Unfortunately, many previous attempts were comparable to screen scraping, where an interpreter would read an existing Web page word-by-word. One problem with this approach is that most Web pages are designed with a focus on the visual experience -- an attribute that doesn't translate to voice very well. In the opposite direction, the user's voice would be recognized, interpreted, and transformed into HTML code. The voice-to-HTML conversion resulted in errors, and because HTML wasn't designed to represent voice data, its limitations restricted wide acceptance and development of this feature. After Motorola released the VoxML specification to help overcome HTML's limitations, the company was joined by AT&T, Lucent, and IBM to continue the progress by creating VoiceXML. VoiceXML is an advanced version of VoxML with more features and updates. On May 22, 2000, the W3C announced that it acknowledged the submission of VoiceXML version 1.0 specifications. However VoxML is more mature, and hence has more support at this time. There are some comparative technologies in this field, including SpeechML. Similarly, HP Labs has TalkML. These companies aren't the only ones that have tried to create their own Voice Markup Languages (VML). While firms can create their own proprietary technology, VML's standards ultimately lie with W3C. Their recommendations will shape future standards in the voice industry. We'll focus on VoxML technology, because it lets you create and test your own voice Internet applications right away. Also, VoxML is more mature, and is the foundation for the voice industry. More Motorola gateway sites use VoxML than use VoiceXML because the gateway technology is based on VoxML. VoxML documents reside on the Web server where the voice browser locates and accesses them. The voice browser resides on a VoxML commercial site, interacting with the user via a telephone network and with a Web server over the Internet on the other side. 
High-end speech recognition and speech synthesis engines form the essence of the browser. Motorola provides the Mobile Application Development Kit (ADK) for free on their Web site. The current release is version 2.0 (beta) and includes VoiceXML as well as continued support for VoxML. The previous fully released version, ADK 1.1, only supports VoxML... Unlike client side browsers for WML or HTML (Up.Browser and Nokia or IE and Netscape), VoxML and VoiceXML are server side browser technologies. To build a voice application, you'll need not only the downloadable ADK, but access to the Voice Browser -- an integral part of the VoxGateway infrastructure. Motorola offers a 30-day free trial and provides an access telephone number to its VoxGateway that points to the specific VoxML, VoiceXML, or WAP code residing on your Web server. This allows easy voice application testing. Several companies allow voice access to their sites with VoxML or VoiceXML gateways, while others offer proprietary Voice Browsers..." See also "VoxML Markup Language."
[May 06, 2000] The World Wide Web Consortium (W3C) has acknowledged receipt of a 'NOTE' from the VoiceXML Forum: Voice Extensible Markup Language (VoiceXML) Version 1.0. References: W3C Note 05-May-2000, 'http://www.w3.org/TR/2000/NOTE-voicexml-20000505'. This public version updates a 'Release Candidate' document which was released to Forum Supporters on March 02, 2000. Authored by the VoiceXML Forum technical working group: Linda Boyer (IBM), Peter Danielsen (Lucent Technologies), Jim Ferrans (Motorola), Gerald Karam (AT&T), David Ladd (Motorola), Bruce Lucas (IBM), and Kenneth Rehor (Lucent Technologies). The submitted document "specifies VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications." A note on 'Implementation Scope' indicated that the VoiceXML 1.0 specification was designed for speech-based telephony applications. Appendix B supplies the VoiceXML Document Type Definition. Description: The architectural model includes "a document server (e.g., a web server) [which] processes requests from a client application, the VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents in reply, which are processed by the VoiceXML Interpreter. The VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant, and another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics. 
The implementation platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call, while the VoiceXML interpreter conducts the dialog after answer. The implementation platform generates events in response to user actions (e.g., spoken or character input received, disconnect) and system events (e.g., timer expiration). Some of these events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document, while others are acted upon by the VoiceXML interpreter context..." "The VoiceXML Forum is an industry organization founded by AT&T, IBM, Lucent and Motorola. It was established to develop and promote the Voice eXtensible Markup Language (VoiceXML), a new computer language designed to make Internet content and information accessible via voice and phone. With the backing and technology contributions of its four world-class founders, and the support of leading Internet industry players, the VoiceXML Forum has made speech-enabled applications on the Internet a reality." The W3C staff comment on the document notes that "VoiceXML appears to be a good match to the dialog requirements identified by the W3C Voice Browser working group. All submitters have indicated that they may own patents or patent applications which apply to the VoiceXML submission, but have not provided any detailed information on specific patents, or the components of VoiceXML which are covered by patents." Public comments on the NOTE may be sent via email to submission@voicexml.org.
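The document-server side of the architectural model described in this entry can be sketched with Python's standard library: a Web server answers the interpreter context's request for the initial document with VoiceXML, and the interpreter then conducts the dialog. Everything specific here is an assumption made for illustration: the path `/welcome.vxml`, the helper names `render_document` and `serve`, the port, and the use of the `application/voicexml+xml` media type (registered well after the VoiceXML 1.0 Note; servers of that era commonly used a generic XML type).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Initial document the interpreter context would fetch when a call arrives.
# The markup uses VoiceXML 2.0 conventions; the next URL is hypothetical.
WELCOME = b"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block>
      <prompt>Welcome. Please hold while the main menu loads.</prompt>
      <goto next="/menu.vxml"/>
    </block>
  </form>
</vxml>
"""


def render_document(path):
    """Map a request path to (status, content type, body bytes)."""
    if path == "/welcome.vxml":
        return 200, "application/voicexml+xml", WELCOME
    return 404, "text/plain", b"no such dialog\n"


class VoiceXMLHandler(BaseHTTPRequestHandler):
    """Serves VoiceXML documents to a VoiceXML interpreter over HTTP."""

    def do_GET(self):
        status, content_type, body = render_document(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the document server until interrupted (not called on import)."""
    HTTPServer(("", port), VoiceXMLHandler).serve_forever()
```

Calling `serve()` starts the server; a VoiceXML platform would then be configured to fetch `/welcome.vxml` when it detects an incoming call. Events such as disconnects or timer expirations are handled on the platform side and do not appear in this server code.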
[October 05, 2000] "VoiceXML - Taking IVR to the Next Level. A bottom up look at XML and VoiceXML." By Chris 'Dr. CT' Bajorek (Co-founder, CT Labs). In Computer Telephony (October 2000). "... So why should you create IVR applications based on voiceXML when existing IVR products work just fine? One significant reason: VoiceXML provides an intrinsic ability to access information stored on or accessed through a corporate web server. Since IVR systems generically require access to one or more corporate databases, any such database connectivity already implemented via a company web server is directly usable in a voiceXML script. This saves development time and money and can greatly reduce maintenance costs. Another clear benefit is that existing web application development tools become mature tools for development of voiceXML-based IVR applications. Using such tools and development methodologies also frees up IVR application developers from low-level IVR platform or database access details. VoiceXML applications by their very nature have excellent portability across web server and IVR platforms that properly support the standard. This means you are free to change to other voiceXML-compliant IVR platform vendors without losing your development work... Here's a little about how voiceXML works under the covers: a document server (usually a web server) processes requests from a client application, the voiceXML interpreter, through the voiceXML interpreter context. The server replies with voiceXML documents that contain actual voiceXML commands that are processed by the voiceXML interpreter. The voiceXML interpreter context may monitor caller inputs in parallel with the voiceXML interpreter. For example, one voiceXML interpreter context may always listen for a special touch tone command that takes the caller to a special menu, while another may listen for a command that changes the playback volume during the call. 
The implementation platform contains the telephone hardware and related CT resources and is controlled by the voiceXML interpreter context and voiceXML interpreter. For instance, an IVR application may have the voiceXML interpreter context detecting an incoming call, reading the first voiceXML document, and answering the call while the voiceXML interpreter executes the first touch tone menu process after the call is answered. The implementation platform generates events in response to caller actions (e.g., touch tone or spoken commands) and system events (e.g., timers expiring). Some of these events are acted upon by the voiceXML interpreter itself, as specified by the voiceXML document, while others are acted upon by the voiceXML interpreter context. What does a voiceXML application look like? Here is a voiceXML page fragment generated by a web application that implements a classical dtmf-based IVR menu..."
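The page fragment from the article is not reproduced here. As a stand-in, the following is a minimal sketch of how a web application might generate a touch-tone menu page. It is not the article's example: the element and attribute names (menu, prompt, choice, dtmf, next) follow the VoiceXML specification as commonly documented, while the function name, prompt text, and URLs are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_dtmf_menu(prompt_text, choices):
    """Return a VoiceXML menu document as a string.

    choices maps a touch-tone digit to a (label, target URL) pair.
    """
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu", id="main")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    # One <choice> per digit, in digit order.
    for digit, (label, url) in sorted(choices.items()):
        choice = ET.SubElement(menu, "choice", dtmf=digit, next=url)
        choice.text = label
    return ET.tostring(vxml, encoding="unicode")


# Placeholder menu content and URLs.
page = build_dtmf_menu(
    "For sales, press 1. For support, press 2.",
    {
        "1": ("sales", "http://example.com/sales.vxml"),
        "2": ("support", "http://example.com/support.vxml"),
    },
)
print(page)
```

Because the page is built on the server like any other web response, the same database access code that feeds an HTML page can feed this one, which is the benefit the article stresses.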
[November 14, 2000] SpeechObjects Specification. The W3C has acknowledged a submission from Nuance Communications, Inc. for a SpeechObjects Specification Version 1.0. Reference: W3C Note 14-November-2000, edited by Daniel C. Burnett. Document abstract: "This document describes SpeechObjects, a core set of reusable dialog components that are callable through a dialog markup language such as VoiceXML, to perform specific dialog tasks, for example, get a date or a credit card number, etc. The major goal of SpeechObjects is to complement the capabilities of the dialog markup language and to leverage best practices and reusable component technology in the development of speech applications." Description: "SpeechObjects are reusable software components that encapsulate discrete pieces of conversational dialog. SpeechObjects are based on an open architecture that can be deployed on any of the major server and IVR (interactive voice response) platforms. This paper describes a specification based on Nuance's Java implementation of SpeechObjects. Simply stated, a SpeechObject is a reusable software component that implements a dialog flow and is packaged with the audio prompts and recognition grammars that support that dialog. An implementation of the foundation set of SpeechObjects, including source code, is freely available to the SpeechObjects developer community as part of Nuance's Open Voice Framework initiative." The specification from Nuance is set against the backdrop of work conducted in the W3C Voice Browser Working Group, which "has determined requirements for several specifications including one for a Reusable Dialog Component Requirements." According to the W3C staff comment: "W3C is working to expand access to the Web to allow people to interact with Web sites via spoken commands, and listening to prerecorded speech, music and synthetic speech. 
The W3C Voice Browser Activity has produced a set of requirements for interactive voice response applications and is now developing a set of specifications that meet these requirements... The W3C Voice Browser Working Group plans to develop specifications for its Speech Interface Framework using SpeechObjects as a model for work on reusable dialog components. This work is already underway, following the publication of a requirements draft for reusable dialog components. A specification meeting these requirements is under development, with the goal of being used together with W3C's dialog markup language. It is recommended that the Nuance Communications SpeechObjects submission is carefully examined in the context of this work."
[October 06, 2000] "Voice-activated Web access. A new flavor of XML redefines mobile communication." By Jeff Jurvis (Principal consultant, Rainier Technology, Inc.). From IBM developerWorks [Collaboration library], September 2000. ['There's more than one way to connect to the Internet on a mobile phone. WAP and WML are among the more common technologies used in North America. Now a new XML schema is providing another way for users to link up to the Web over their mobile phones.'] "People are talking about extending the Web to mobile phones, and most of the talk is about transplanting something like the traditional Web browser into the phone's smaller footprint. The Wireless Application Protocol (WAP) and its Wireless Markup Language (WML) are -- along with WML's antecedent: Handheld Device Markup Language (HDML) -- the most common technologies used to extend the Web to mobile phones in North America. WAP is nearly ubiquitous in Europe, and iMode (with Compact HTML) is huge in Japan. Each approach delivers text and limited graphics to a small phone display. But we tend to overlook another channel for providing Web content over the phone: the phone's microphone and speaker. Voice XML is a new XML schema intended to standardize the delivery of audio dialogs for voice access and interactive voice response linked to Web-based content and applications. IBM, Motorola, Lucent, and AT&T founded the Voice XML Forum in early 1999 to leverage existing speech technologies to make the Internet accessible by voice and phone. Not only do speech technologies open the Web to those unable to use visual browsers due to circumstance or physical limitations, they also make Web access more convenient for all users. New speech technologies can create dialog-driven applications such as speech recognition, speech synthesis, and recording and playback of digitized speech on PCs and servers for distribution to client devices. 
Voice XML provides a technology-neutral language that can be used to deliver speech applications. These applications separate the front-end presentation layer in Voice XML from the back-end services that handle speech and the mechanics of processing. For example, a well-designed Web site could easily support a voice-driven browser (such as one you might use on a mobile phone) in the same way the site would support another browser (such as a WAP browser or an HTML browser). When the initial request comes in from the browser, the server sniffs the browser type. If the browser is identified as a voice browser, the server will return the appropriate Voice XML pages..." Also in PDF format.
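The browser sniffing step mentioned in the article can be sketched as follows. This is only an illustration under stated assumptions: the function name and the User-Agent substrings are invented, and real voice or WAP browsers identify themselves in their own ways.

```python
def pick_markup(user_agent: str) -> str:
    """Pick a response format from the User-Agent header.

    The substrings checked here are hypothetical; a production server
    would consult a maintained list of known browser signatures.
    """
    ua = user_agent.lower()
    if "voicexml" in ua:
        return "voicexml"  # serve Voice XML pages to a voice browser
    if "wap" in ua:
        return "wml"       # serve WML decks to a WAP browser
    return "html"          # default to ordinary HTML


# Hypothetical User-Agent strings.
print(pick_markup("ExampleVoiceXML/1.0"))
print(pick_markup("ExampleWAP-Browser/2.0"))
print(pick_markup("Mozilla/4.0"))
```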
"Voice eXtensible Markup Language - VoiceXML." Version: 1.00, 07 March 2000. The specification itself "introduces VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications." [local archive copy]
A DTD for Voice Extensible Markup Language - Version 1.00
Announcement March 02, 1999. Also from Bell Labs.
VXML Forum Supporters
Press Releases
[August 25, 1999] VoiceXML Forum Version 0.9 Specification. [local archive copy]
[March 07, 2000] "VoiceXML Forum Issues Version 1.0 of New Mark Up Language for Voice Internet Access. Number of Supporters Grows to 79." - "The VoiceXML Forum today announced it has completed the VoiceXML 1.0 specification, which is expected to expand the reach of the Internet by providing voice access to content and services. The Forum membership, which now numbers 79 companies, is reviewing the specification before it is submitted to the appropriate body for formal standardization. Based on the World Wide Web Consortium's industry-standard eXtensible Markup Language (XML), Version 1.0 of the VoiceXML specification provides a high-level programming interface to speech and telephony resources for application developers, service providers and equipment manufacturers. Standardization of VoiceXML will: (1) simplify creation and delivery of Web-based, personalized interactive voice-response services; (2) enable phone and voice access to integrated call center databases, information and services on Web sites, and company intranets; and (3) help enable new voice-capable devices and appliances. On the basis of the 0.9 version of the specification released last year, many companies have already begun implementing VoiceXML in their products and services, and a market for third-party VoiceXML application development has begun to emerge. The 1.0 version of the specification, currently being reviewed by Forum members, is now available to the public on the Forum's Web site. The VoiceXML 1.0 specification is based on years of research and development at AT&T, IBM, Lucent Technologies and Motorola, as well as comments from Forum members."
[September 07, 2000] "VoiceXML for Web-Based Distributed Conversational Applications." By Bruce Lucas (IBM). In Communications of the ACM (CACM) Volume 43, Number 9 (September 2000), pages 53-57. "VoiceXML replaces the familiar HTML interpreter (Web browser) with a VoiceXML interpreter and the mouse and keyboard with the human voice. Until recently, the Web delivered information and services exclusively through visual interfaces on computers with displays, keyboards, and pointing devices. The Web revolution largely bypassed the huge market for information and services represented by the worldwide installed base of telephones, for which voice input and audio output provide the sole means of interaction. Development of speech services has been hindered by a lack of easy-to-use standard tools for managing the dialogue between user and service. Interactive voice-response systems are characterized by expensive, closed application-development environments. Lack of tools inhibits portability of applications and limits the availability of skilled application developers. Consequently, voice applications are costly to develop and deploy, so voice access is limited to only those services for which the business case is most compelling for voice access. Here, I offer an introduction to VoiceXML, an emerging standard XML-based markup language for distributed Web-based voice services, much as HTML is a language for distributed visual services. VoiceXML brings the power of Web development and content delivery to voice-response applications, freeing developers from low-level programming and resource management. It also enables integration of voice services with data services, using the familiar client/server paradigm and leveraging the skills of Web developers to speed application development for this new medium. . . VoiceXML supports simple 'directed' dialogues; the computer directs the conversation at each step by prompting the user for the next piece of information. 
Dialogues between humans don't operate on this simple model, of course. In a natural dialogue, each participant may take the initiative in leading the conversation. A computer-human dialogue modeled on this idea is referred to as a 'mixed-initiative' dialogue, because either the computer or the human may take the initiative. The field of spoken interfaces is not nearly as mature as the field of visual interfaces, so standardizing an approach to natural dialogue is more difficult than designing a standard language for describing visual interfaces like HTML. Nevertheless, VoiceXML takes some modest steps toward allowing applications to give users some degree of control over the conversation. In the forms described earlier, the user was asked to supply (by speaking) a value for each field of a form in sequence. The set of phrases the user could speak in response to each field prompt was specified by a separate grammar for each field. This approach allowed the user to supply one field value in sequence. Consider a form for airline travel reservations in which the user supplies a date, a city to fly from, and a city to fly to. . . VoiceXML enables such relatively natural dialogues by allowing input grammars to be specified at the form level, not just at the field level. A form-level grammar for these applications defines utterances that allow users to supply values for a number of fields in one utterance. For example, the utterance 'I'd like to fly from New York on February 29th' supplies values for both the 'from city' field and the 'date' field. VoiceXML specifies a form-interpretation algorithm that then causes the browser to prompt the user for the values (one by one) of missing pieces of information (in this example, the 'to city' field). VoiceXML's special ability to accept free-form utterances is only a first step toward natural dialogue. VoiceXML will continue to evolve, incorporating more advanced features in support of natural dialogue..."
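The mixed-initiative behavior Lucas describes (a form-level grammar fills several fields from one utterance, then the browser prompts for what is missing) can be sketched in a few lines. This is a deliberately simplified illustration, not the form-interpretation algorithm from the specification, which also covers guard conditions, event handling, and re-prompting. The function and field names are assumptions, and the grammar match is represented by a ready-made dictionary rather than real speech recognition.

```python
def fill_form(fields, utterance_slots, ask):
    """Fill a form in a mixed-initiative style.

    fields: ordered list of field names.
    utterance_slots: values that a form-level grammar extracted from the
        caller's opening utterance (may cover several fields at once).
    ask: callable that prompts for one field and returns its value.
    """
    form = {name: None for name in fields}
    # Step 1: a form-level match can fill several fields in one go.
    for name, value in utterance_slots.items():
        if name in form:
            form[name] = value
    # Step 2: prompt, one by one, for whatever is still missing.
    for name in fields:
        if form[name] is None:
            form[name] = ask(name)
    return form


prompted = []


def ask(name):
    # Stand-in for a spoken prompt; records which fields were asked for.
    prompted.append(name)
    return "Boston"


# "I'd like to fly from New York on February 29th" supplies two fields.
result = fill_form(
    ["date", "from_city", "to_city"],
    {"from_city": "New York", "date": "February 29th"},
    ask,
)
print(result)
print(prompted)
```

In the airline example, only the 'to city' field is prompted for, because the opening utterance already supplied the other two.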
[September 07, 2000] "The Promise of a Voice-Enabled Web. [Software Technologies.]" By Peter J. Danielsen. In IEEE Computer Volume 33, Number 8 (August 2000), pages 104-106. "In 1999, AT&T, IBM, Lucent Technologies, and Motorola formed the VoiceXML Forum to establish and promote the Voice eXtensible Markup Language (VoiceXML) as a standard for making Internet content available by voice and phone (see www.voicexml.org/). Each company had previously developed its own markup language, but customers were reluctant to invest in a proprietary technology that worked on only one vendor's platform. Released in March 2000, version 1.0 of the language specification is based on years of research and development at the founding companies and on comments received from among the more than 150 companies that belong to the Forum. In this column, I review the existing architectures for Web and phone services, describe how VoiceXML enables consolidation of service logic for Web and phone, and summarize the features of the VoiceXML 1.0 specification... The features of VoiceXML can be grouped into four broad areas: dialog, telephony, platform, and performance. (1) Dialog features: Each VoiceXML document consists of one or more dialogs. The dialog features cover the collection of input, generation of audio output, handling of asynchronous events, performance of client-side scripting, and continuation of the dialog. VoiceXML supports the following input forms: audio recording, automatic speech recognition, and touch-tone. Output may be prerecorded audio files, text-to-speech synthesis, or both. The language supports the generation and handling of asynchronous events: both 'built-in' events -- such as a time-out, an unrecognized input, or a request for help -- and user-defined events. Event handlers typically specify some new output to be provided to the caller and whether to continue the existing dialog or switch to another. 
(2) Telephony features: VoiceXML provides basic control of the telephony connection. It allows a document author to specify when to disconnect and when to transfer the call. Transfers follow one of two scenarios: a blind transfer that terminates the VoiceXML session as soon as the call is successfully transferred, and bridging, which suspends the VoiceXML session while the caller is connected to a third party - for example, a customer service representative. The session resumes once the conversation with the third party has completed. The system saves the outcome of the transfer, which may be submitted with other data in a subsequent URL request. (3) Platform features: While VoiceXML provides a standard way to describe dialogs, it also provides mechanisms to accommodate individual platform capabilities. This includes invoking platform-specific functionality and controlling platform-specific properties. One platform, for example, may have an advanced speaker-verification package, while another may have a custom credit-card dialog. Another may permit control of its proprietary speech-recognition parameters. (4) Performance features: Not only are VoiceXML documents Web-based, but so are the resources they use, with each resource's location specified by a URL. These resources include audio files, input grammars, scripts, and objects. The VoiceXML client must retrieve and install them prior to use. One challenge VoiceXML service providers will face involves minimizing the amount of 'dead air' the caller hears while the system fetches resources. A PC user with a visual browser sees a spinning icon when the system retrieves resources, but a caller in contact with a VoiceXML platform may not be aware that the service is Web-based, and thus likely considers silence a signal that something has gone awry. VoiceXML provides several facilities to either eliminate or hide the delays associated with retrieving Web resources." 
[Peter J. Danielsen is a distinguished member of the technical staff in the Software Production Research Department at Lucent Technologies' Bell Labs in Naperville, Illinois. One of the authors of the VoiceXML 1.0 specification, he has developed a variety of interactive voice-response services and service-creation environments for AT&T and Lucent Technologies.]
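The 'dead air' problem raised under performance features can be illustrated with a small resource cache. The excerpt does not spell out which facilities VoiceXML provides, so the caching and prefetching shown here are an assumed, generic approach rather than the specification's mechanism. The class name, method names, and resource names are hypothetical.

```python
class ResourceCache:
    """Keep fetched resources so a repeated request needs no network trip."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: URL -> resource content
        self._store = {}
        self.fetch_count = 0     # number of real fetches performed

    def prefetch(self, urls):
        # Fetch ahead of need, e.g. while a prompt is still playing.
        for url in urls:
            self.get(url)

    def get(self, url):
        # Only go to the network when the resource is not cached yet.
        if url not in self._store:
            self.fetch_count += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


calls = []


def fake_fetch(url):
    # Stand-in for an HTTP request; records each URL fetched.
    calls.append(url)
    return b"data:" + url.encode()


cache = ResourceCache(fake_fetch)
cache.prefetch(["welcome.wav", "cities.gram"])
audio = cache.get("welcome.wav")   # served from the cache, no second fetch
print(cache.fetch_count)
```

A production cache would also need an expiry policy so that updated audio files and grammars are eventually refetched.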
[March 17, 2000] "Mobile users' new voice. AT&T and Lucent tap VoiceXML spec for new services." By Grant Du Bois. In PC Week [Online] Volume 17, Number 11 (March 10, 2000), page 16. "An Internet standard for voice communications is beginning to spawn new services for mobile professionals. Forthcoming services from AT&T Corp. and Lucent Technologies Inc. are part of a trend to use VoiceXML (Extensible Markup Language) to provide standardized speech platforms, applications, services and development tools to increase access to Internet content and services by voice. Version 1.0 of the VoiceXML specification, which the VoiceXML Forum released last week at the Computer Telephony Expo in Los Angeles, simplifies the creation of Web-based interactive voice response services; enables voice access by phone to Web sites, company intranets and call center databases; and provides a platform for new devices and appliances. AT&T, of Basking Ridge, N.J., is using VoiceXML in a service it is testing for its WorldNet Internet service. Called Unified Alerting, the service uses VoiceXML to act as a bridge between the public phone network and the Internet..."
[March 10, 2000] "VoiceXML Specification to Lead Announcements at CT Expo." By Cathleen Moore. In InfoWorld Volume 22, Issue 10 (March 06, 2000), page 26. "Vendors will use next week's Computer Telephony (CT) Expo in Los Angeles to showcase new applications and capabilities for voice technologies and call centers. Highlighting the continuing evolution of the telephony market will be a new standard to ease the development of voice response-based interactions on the Web. Advances in communications servers and speech-telephony software will also be demonstrated. At CT Expo, the VoiceXML Forum, which was founded in March of last year by AT&T, IBM, Lucent Technologies, and Motorola, is expected to announce that it has completed Version 1.0 of the VoiceXML specification. The VoiceXML specification was developed in an effort to make XML-based content more accessible via voice commands and phone interfaces. It is also intended to drive new voice-capable devices, appliances, and services. Several companies have already begun developing products based on the preliminary VoiceXML specification. According to one analyst, the standard may herald a radical change in the way automated telephone services are used. '[VoiceXML] can support call center-style applications in a way [that allows] people to treat automated telephone services that use speech recognition in a much more flexible way, much like they treat their Web pages,' said William Meisel, president of TMA Associates, in Tarzana, Calif."
[March 10, 2000] "Standard completed for voice-activated Web browsing." By David Rohde. In InfoWorld (March 09, 2000). "The effort to provide voice-activated equivalents to Web hyperlinks this week took a step forward, as the VoiceXML Forum announced it has completed Version 1.0 of its specification. Last year Lucent, IBM, Motorola, and AT&T created the VoiceXML Forum. The forum is tasked with creating a high-level programming interface to speech and telephony resources for application developers, service providers, and equipment manufacturers. At the Computer Telephony Expo in Los Angeles, the forum members said the group had finished the VoiceXML specification and will now contribute it to the World Wide Web Consortium. IBM officials at the show demonstrated the use of VoiceXML with its ViaVoice speech-recognition technology, which enables users to speak equivalents of hyperlinks. They said properly trained end-users would find this faster than wading through several layers of IVR (interactive voice response) prompts on a telephone keypad, and noted that the technology could enable hands-free operation in environments such as cars with cellular phones. Call centers that are responsible for answering inquiries from Web sites might find this capability especially useful because it would cut down on the amount of time end-users chew up on IVR systems requesting information, said Anne-Marie Derouault, director of strategy and alliances for IBM's Voice Systems unit. Integration with back-office systems, integration with traditional Web-access methods, and getting end-users accustomed to speech-activating Web sessions are among the challenges the vendors will likely face before such technology is widely employed..."
[August 22, 2000] "Write Once, Publish Everywhere." By Didier Martin. From XML.com. (August 16, 2000). ['In the first of an ongoing series of "Didier's Labs", we embark on the process of building a multi-device XML portal. The end result will be an XML server that can be accessed via desktop web browsers, mobile phone WAP browsers, and by voice using VoiceXML. In this installment, Didier outlines the project and models the login process for the server.'] "Our task is to create interactive applications using XML technologies. These applications should be accessible on three different devices: the telephone (VoiceXML), mobile phone mini browsers (WML), and finally, PC browsers (HTML). To realize this vision, we will create abstract models and encode them in XML. Then, we must recognize the device and transform these models into the appropriate rendering format. The rules of the game are to use, as much as possible, XML technologies which are freely available, and also to restrict our scope to XML technologies documented by a public specification."
[August 28, 2000] "Adapting Content for VoiceXML." By Didier Martin. From XML.com (August 23, 2000). ['In the second part of his "Write Once, Publish Everywhere" project, Didier Martin takes us through creating content for voice browsers. He deals with the architecture necessary to interface with voice browsers, and includes a simple introduction to VoiceXML.'] "Not all devices are able to accept and transform an XML document into something we can perceive with our senses. Most of the time, the target device supports only a rendering interpreter. Therefore we need to transform the abstract model -- as held in our server-side XML document, into a format the rendering interpreter understands. Browsers with more resources at their disposal will probably in the near future perform the transformation on the client side. But devices with less available resources require that any transformation from an XML model to a rendering format occurs on the server side. To transform the <xdoc> element and its content, we'll use XSLT (Extensible Stylesheet Language Transformations). Dependent on the device profile, we perform one of three different transformations..."
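The idea of one abstract model rendered differently per device profile can be sketched as below. The article performs this step with XSLT stylesheets; plain Python functions stand in for them here because the Python standard library has no XSLT processor. The profile names, function names, and the shape of the abstract model are assumptions made for illustration, and the generated markup is heavily simplified.

```python
from xml.sax.saxutils import escape, quoteattr


def to_html(title, fields):
    # PC browser rendering: a heading and a form with one input per field.
    inputs = "".join(f"<input name={quoteattr(f)}/>" for f in fields)
    return f"<html><body><h1>{escape(title)}</h1><form>{inputs}</form></body></html>"


def to_wml(title, fields):
    # WAP rendering: a single card holding the inputs.
    inputs = "".join(f"<input name={quoteattr(f)}/>" for f in fields)
    return f"<wml><card title={quoteattr(title)}><p>{inputs}</p></card></wml>"


def to_vxml(title, fields):
    # Voice rendering: one spoken prompt per field (grammars omitted).
    flds = "".join(
        f"<field name={quoteattr(f)}><prompt>Please say your {escape(f)}.</prompt></field>"
        for f in fields
    )
    return f'<vxml version="1.0"><form><block>{escape(title)}</block>{flds}</form></vxml>'


# Hypothetical device profiles mapped to their transformation.
RENDERERS = {"pc": to_html, "wap": to_wml, "voice": to_vxml}


def render(profile, title, fields):
    """Apply the transformation that matches the detected device profile."""
    return RENDERERS[profile](title, fields)


print(render("voice", "Login", ["user", "password"]))
```

Keeping the mapping in one table means a fourth device type only needs a new renderer and a new entry, which mirrors the article's approach of adding a stylesheet per device.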
[September 07, 2000] "Hello, Voice World." By Didier Martin. From XML.com (September 06, 2000). "In our last trip to Didier's Lab, we encountered the aural world of XML made possible by the VoiceXML language. This week I'll explain more about VoiceXML and create the classic 'Hello World' application. But this time instead of seeing the result, you'll listen to it. People intrigued by the last article asked me if and how VoiceXML documents are used to build voice applications. Answering this question presents an opportunity to highlight VoiceXML's features, and the way its basic concepts make it very different from HTML or XHTML. A VoiceXML application is a collection of dialogs. A dialog is the basic interaction unit between the VoiceXML interpreter and an interlocutor. A dialog unit can either be a form or a menu. A form consists of a collection of fields which are filled by the interlocutor. A menu is a choice made by an interlocutor. The figure below shows an example VoiceXML application with the links between the various dialogs shown... The IBM VoiceXML interpreter is freely available from the IBM alphaWorks site."
[October 27, 1999] "New Motorola, Unisys Agreement Expands Access to Internet. Integrated Offerings from Industry Leaders To Make Development of Enhanced Voice Applications Easy." - "Motorola and Unisys today announced an agreement expected to significantly expand access to the Internet. This agreement proposes to allow consumers to find information and conduct transactions on the Internet using any telephone. According to the agreement, Motorola will license its VoxGateway software to Unisys and Unisys will license its award-winning Natural Language (NL) Speech Assistant toolkit and runtime environment to Motorola. Both plan to distribute VoxGateway software as part of their products. The two companies plan to enhance the capabilities of Motorola's VoxGateway software and integrate it with the NL toolkit. These plans include developing an advanced NL Speech Assistant Code Builder, which is expected to automatically generate program code in VoxML programming language, a precursor to the VoiceXML standard. By reducing the need for manual coding by application developers, this will create an easy-to-use software development toolkit for third parties, which is expected to mean more programs will be written that use these technologies."
[October 05, 1999] IBM's alphaWorks labs has released an early-access implementation of VoiceXML, an emerging standard for building distributed Internet-based voice applications. "VoiceXML [formerly: 'VXML'] is an XML-based markup language for distributed voice applications, much as HTML is a language for distributed visual applications. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. The goal is to provide voice access and interactive voice response (e.g. by telephone, PDA, or desktop) to web-based content and applications. VoiceXML is being defined by an industry forum, VoiceXML Forum founded by AT&T, IBM, Lucent and Motorola, established to promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML brings the power of web development and content delivery to voice response applications, and frees the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm, and it gives users the power to seamlessly transition between applications. The dialogs are provided by document servers, which may be external to the browser implementation platform." The IBM tool is available for Windows 95 and Windows 98 (with 32MB of memory) or Windows NT (with 48MB), and a full-duplex sound card. It also requires JDK/JRE 1.1.x or JDK/JRE 1.2.x. The package "contains an early-access implementation of a subset of the VoiceXML 0.9 language specification, together with some VoiceXML samples. We are providing this early-access partial implementation in order to facilitate feedback on the language specification. Users should be aware that the 0.9 specification is preliminary and that significant and incompatible changes may be made in the language." See also the VoiceXML Forum Web site.
[August 25, 1999] "VoiceXML Forum Releases Preliminary Specification, More Than Triples Membership. IBM Joins AT&T, Lucent, Motorola as Key Member And Contributes Its Speech Markup Language." - "The Voice eXtensible Markup Language (VoiceXML) Forum has released its preliminary specification for VoiceXML, the new markup language that is expected to expand the reach of the Internet through voice access to Web content and services. Since its launch in March 1999 as the VXML Forum, the group has more than tripled in size by adding 44 leading technology industry players to its membership roster. The VoiceXML specification introduces a markup language for voice applications based on eXtensible Markup Language (XML), an emerging technology that is expected to revolutionize the Internet industry. The VoiceXML Forum aims to drive the market for voice-enabled Internet services through the creation of a common specification based on existing Internet standards. The VoiceXML specification is expected to simplify creation and delivery of Web-based, personalized interactive voice-response services and enable phone and voice access to integrated call center databases, information on Web sites, and company intranets. The VoiceXML specification also will help enable new voice-capable devices and appliances." [local archive copy]
[October 21, 1999] "The World Wide Telephone." By Suzanne Hildreth. In WebServer Online Magazine (October 1999). "Considering adding telephone access to your Web site or making some of your Web content speech-enabled? You might want to hold off a few more months. That's because the VoiceXML Forum, a group formed last March to work on developing a common standard for integrating voice, telephony and Web content, is getting set to send its first version of the VoiceXML specification to the World Wide Web Consortium (W3C) for review. Founded by AT&T Corp., IBM Corp., Lucent Technologies and Motorola Inc. -- and joined by 57 other companies -- the VoiceXML Forum hopes to establish a standard markup language for speech-enabling Internet content and for integrating traditional telephony with the Web. In August, the group released Version 0.9 of the specification for public comment. Version 1.0 of the VoiceXML specification is due to be submitted to the W3C by the end of the year, with products based on it appearing as early as mid-2000, says Gerald Karam, division manager for AT&T, New York, NY."
Lucent Technologies press release (March 02, 1999) - "AT&T, Lucent Technologies, and Motorola create VXML Forum. Companies seek open standard to promote voice access to web services."
Technical Backgrounder, [local archive copy]
VXML Forum: Comments by Industry Analysts
W3C "Voice Browser" Activity and Briefing Package
Compare: Motorola's VoxML language
Compare: Java Speech Markup Language (JSML)
Compare: SpeechML
Compare: SABLE: A Standard for Text-to-Speech Synthesis Markup
[March 02, 1999] "Major Players Team Up on Voice Markup Language." By Ted Smalley Bowen. In InfoWorld [Electric] (March 02, 1999). "After starting down separate paths, AT&T, Lucent Technologies, and Motorola have joined forces to back a standard markup language for writing voice-activated and telephony applications for accessing Web content. The vendors Tuesday announced the formation of the Voice Extensible Markup Language (VXML) Forum, which has as its primary stated goal opening Web resources to phone access. By generating and backing a single version of VXML and turning it over to the World Wide Web Consortium (W3C) for ratification and licensing later this year, forum members hope to boost the markets for Web-based content and services accessible by phone and for the related tools, according to David Unger, manager of product strategy and development for AT&T's consumer markets division, in Basking Ridge, N.J. The timing of the handoff to the W3C depends on the volume of input from the public comment phase, according to Unger, who said the lead forum members will likely have between 20 and 100 people working on the effort. So far, the W3C's Voice Browser group has been tracking the VXML work."
[March 08, 1999] "New Markup Language Ties Telephone To Web." By Kathleen Cholewka. In Inter@ctive Week Volume 6, Number 10 (March 08, 1999), page 14. "Major software developers are writing a new language that will enable businesses and end users to access information on the Web via their regular telephone service. The Voice eXtensible Markup Language (VXML) is being pushed by AT&T, Lucent Technologies and Motorola, which have created an industry forum to work toward standardizing the technology. Equipment and application developers may see the VXML standard as soon as year's end."

Document URI: http://xml.coverpages.org/vxml.html
Robin Cover, Editor: robin@oasis-open.org