[This local archive copy is from the official and canonical URL, http://www.vxmlforum.org/tech_bkgrnd.html; please refer to the canonical source document if possible.]

Voice eXtensible Markup Language (VXML) - Technical Background

The VXML Approach

Most people are familiar with automated telephone services. These services allow users to retrieve information such as bank balances, flight schedules, and movie show times from any telephone. The explosive growth of the Internet and World Wide Web technologies has shifted the landscape for providers of traditional phone services, who now face a new set of customers accessing information and services through the Web. While in most cases customers still access automated services through the phone, providers are finding it easier to build new services that exploit the power of Web technology.

VXML provides the best of both worlds. Providers, by expressing automated voice services using a markup language like VXML, can open up their new Web services to customers using voice interfaces, such as the telephone. Developers can build automated voice services using exactly the same technology they use to create visual Web sites, significantly reducing the cost of construction and delivery of new capabilities for the traditional phone customer.


VXML has its roots in a research project called PhoneWeb at AT&T Bell Laboratories. After the AT&T/Lucent split, both companies pursued development of independent versions of a phone markup language.

Lucent's Bell Labs continued work on the project, now known as TelePortal. The recent research focus has been on service creation and natural language applications.

AT&T Labs has built a mature phone markup language and platform that have been used to construct many different types of applications, ranging from call center-style services to consumer telephone services that use a visual Web site for customers to configure and administer their telephone features. AT&T's intent has been twofold. First, it wanted to forge a new way for its business clients to construct call center applications with AT&T-provided network call handling. Second, AT&T wanted a new way to build and quickly deploy advanced consumer telephone services, and in particular define new ways in which third parties could participate in the creation of new consumer services.

Motorola embraced the markup approach as a way to provide mobile users with up-to-the-minute information and interactions. Given the corporate focus on mobile productivity, Motorola's efforts focused on hands-free access. This led to an emphasis on speech recognition rather than touch-tones as an input mechanism. Also, by starting later, Motorola was able to base its language on the recently developed XML framework. These efforts led to the October 1998 announcement of the VoxML™ technology. Since the announcement, thousands of developers have downloaded the VoxML language specification and software development kit.

There has been growing interest in this general concept of using a markup language to define voice access to Web-based applications. For several years Netphonic has had a product known as Web-on-Call that used an extended HTML and software server to provide telephone access to Web services; in 1998, General Magic acquired Netphonic to support Web access for phone customers. In October 1998, the World Wide Web Consortium (W3C) sponsored a workshop on Voice Browsers. A number of leading companies, including AT&T, IBM, Lucent, Microsoft, Motorola, and Sun, participated.

Some systems, such as Vocalis' SpeecHTML, use a subset of HTML, together with a fixed set of interaction policies, to provide interactive voice services.

Most recently, IBM has announced SpeechML, which provides a markup language for speech interfaces to Web pages; the current version provides a speech interface for desktop PC browsers.

The VXML Forum will explore public domain ideas from existing work in the voice browser arena, and where appropriate include these in its final proposal. As the standardization process for voice browsers develops, the VXML Forum will work with others to find common ground and the right solution for business needs.

Advantages of Voice Markup Content

Content providers today are faced with a dilemma: should they provide services only on the World Wide Web, or should they also provide telephone access to their applications? Today, there are significant hardware and integration costs involved in deploying a telephone service. Typically, a provider would need to purchase telephony hardware, develop an application in a proprietary application programming interface, and integrate the application with existing databases. With voice markup, a content provider could develop a voice application using many of the same Web-development tools its programmers already are familiar with, publish the application on an existing Web server, and arrange with a service provider to handle VXML interpretation. Because established Web technologies are used, the integration with back-end databases can be shared with the HTML application. Because the development of the application is separated from its deployment, the content provider will have much more flexibility in deploying the application. For example, one option would be to contract with a service provider until the voice application had proven its worth, and later purchase a VXML platform to own and operate.
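The point about sharing back-end integration between the HTML application and the voice application can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory ACCOUNTS table stands in for an existing database, and the element names dialog and prompt are hypothetical, not taken from any VXML specification.

```python
from xml.etree import ElementTree as ET

# Hypothetical stand-in for an existing back-end database.
ACCOUNTS = {"1234": 250.75}


def lookup_balance(account_id: str) -> float:
    """Single back-end lookup shared by both front ends."""
    return ACCOUNTS[account_id]


def render_html(account_id: str) -> str:
    """Visual Web page built from the shared lookup."""
    page = ET.Element("html")
    body = ET.SubElement(page, "body")
    ET.SubElement(body, "p").text = f"Balance: ${lookup_balance(account_id):.2f}"
    return ET.tostring(page, encoding="unicode")


def render_voice(account_id: str) -> str:
    """Voice page built from the same lookup.

    The element names here are illustrative assumptions only.
    """
    doc = ET.Element("dialog")
    prompt = ET.SubElement(doc, "prompt")
    prompt.text = f"Your balance is {lookup_balance(account_id):.2f} dollars."
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(render_html("1234"))
    print(render_voice("1234"))
```

Both renderers call the same lookup function, so only the presentation layer differs between the Web site and the telephone service.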

Consumers also can realize many advantages from a standardized voice markup language. First, the ease of deploying new voice applications with the markup approach promises to expand the range of applications accessible from the telephone. Furthermore, once a large number of services are available via the Internet, it becomes possible to interact with several unrelated services during a single phone call. In essence, individual consumers could have "Voice ISP" service in addition to, or included in, their traditional data ISP services.

Combining for the Future

Leveraging the best aspects of AT&T and Lucent phone markup languages and Motorola's VoxML technology, together with the VXML Forum's large collection of supporters and contributors, is expected to yield an open, broadly applicable voice markup language standard for all to use.

The end result will have the telephony features needed to build sophisticated interactive voice services for business applications, such as call centers, as well as all of the functions needed to provide speech-driven interfaces to all manner of end users. VXML will help deliver voice services to a wide range of callers, from the high-mobility worker on a cellular phone calling the company intranet for information on a sales prospect, to mom calling for a weather report before sending the kids out for the day.

VXML will include conventional telephony input, output and call control features, including: touch-tone input, automatic speech recognition support, audio recording (e.g., for voice mail), the ability to play recordings (such as WAV files), speech synthesis from plain or annotated text, call transfer, conferencing, and other advanced call management features. As an XML-based definition with an HTML-like appearance, VXML will be easy to learn for experienced Web content programmers and amenable to easy processing by tools to support desktop development of VXML Web applications.
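As a rough illustration of why an XML-based definition is easy for tools to process, the following Python sketch parses a small voice dialog with a standard XML parser and reports which of the features listed above it uses. The markup is hypothetical: the element and attribute names (dialog, play, say, input, record, transfer) are assumptions invented for this example and do not come from a published VXML specification.

```python
from xml.etree import ElementTree as ET

# Hypothetical voice markup; element and attribute names are
# illustrative assumptions, not part of any specification.
DOCUMENT = """
<dialog>
  <play src="welcome.wav"/>
  <say>Say or press one for weather, or two to leave a message.</say>
  <input name="choice" touchtone="true" speech="true"/>
  <record name="message" maxseconds="30"/>
  <transfer dest="operator"/>
</dialog>
"""

# Maps each hypothetical element to the feature it stands for.
FEATURES = {
    "play": "recorded audio playback",
    "say": "speech synthesis from text",
    "input": "touch-tone or speech input",
    "record": "audio recording",
    "transfer": "call transfer",
}


def features_used(markup: str) -> list[str]:
    """Return the features used by the dialog, in document order."""
    root = ET.fromstring(markup.strip())
    return [FEATURES[el.tag] for el in root if el.tag in FEATURES]


if __name__ == "__main__":
    for feature in features_used(DOCUMENT):
        print(feature)
```

Because the dialog is ordinary XML, a desktop development tool could perform this kind of inspection, or validation and editing, with an off-the-shelf parser rather than a custom one.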


AT&T, Lucent, and Motorola have built applications illustrating the strengths of the VXML approach, based on their previous work.

AT&T and its business customers have built several examples of typical automated business applications: customer surveys, telephone e-commerce services, product promotion, recipe browsing and delivery, and frequently asked question services. AT&T has also built a full consumer telephone service based on its work, which included contributions from business partners for weather, news, and stock market data. In addition, AT&T has constructed many other prototype consumer services, such as prepaid calling card and universal messaging.

Lucent has demonstrated the use of the markup language approach to create banking and other e-commerce services, a variety of information retrieval services, and interactive communications services.

Motorola demonstrated a collection of mobile-productivity applications at its VoxML announcement from three early adopters of the technology: BizTravel.com, CBS Marketwatch, and The Weather Channel. Other active areas of application development include e-commerce, consumer self-service, local events information, and corporate intranet information access.