The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: March 23, 2004.
News: Cover Stories

Opera Multimodal Desktop Browser Supports XHTML+Voice (X+V) Specification.

An announcement from Opera Software at the AVIOS SpeechTEK International Exposition and Educational Conference describes the upcoming release of a multimodal desktop browser based on the XHTML+Voice (X+V) specification.

In 2001, IBM, Motorola, and Opera submitted the XHTML+Voice Profile 1.0 to W3C for standards work. The most recent version, XHTML+Voice Profile 1.2, is maintained by the VoiceXML Forum and "brings spoken interaction to standard web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific DOM events, thereby reusing the event model familiar to web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content."
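The voice-handler pattern described above can be sketched in X+V markup. The fragment below is illustrative rather than taken from the specification: the grammar URI, field names, and `id` values are invented, though the namespace URIs and the `ev:event`/`ev:handler` attachment style follow the X+V 1.2 convention.

```xml
<!-- Illustrative X+V sketch: a VoiceXML form declared in the head
     serves as the voice handler for a DOM event on an XHTML element. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>City lookup</title>
    <!-- The speech dialog: prompt for a city and listen against a grammar
         (grammar URI is a placeholder) -->
    <vxml:form id="say_city">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
        <vxml:grammar src="city.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form>
      <!-- Attach the voice handler: when this input receives focus,
           the VoiceXML form above runs as the event handler -->
      <input type="text" name="city"
             ev:event="focus" ev:handler="#say_city"/>
    </form>
  </body>
</html>
```

When the user focuses the field, the browser invokes the VoiceXML dialog, and the recognized result can then be pushed into the visual field; X+V 1.2 defines additional attribute extensions for exactly this kind of synchronization between voice and visual input.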

The Opera multimodal browser project builds upon an ongoing relationship between IBM and Opera: the new release incorporates IBM's Embedded ViaVoice speech technology. IBM's ViaVoice speech technology supports a variety of real-time operating systems (RTOS) and microprocessors, powering mobile devices such as smart phones, handheld personal digital assistants (PDAs), and automobile components. "By leveraging IBM's voice libraries in this version of Opera, users can navigate, request information, and even fill in Web forms using speech and other forms of input in the same interaction."

The new platform allows users to "interact with the content on the Web in a more natural way, combining speech with other forms of input and output; developers can also start to build multimodal content using the open standards-based X+V markup language, which unifies the visual and voice Web by using development skills a large population of programmers already have today."

Opera Multimodal Browser Overview

"The multimodal browser being developed by IBM and Opera is based on the XHTML+Voice (X+V) specification. This project builds upon IBM's and Opera's ongoing relationship. In 2001, IBM, Motorola and Opera submitted the multimodal standard X+V to the standards body W3C. This mark-up language leverages existing standards, already familiar to voice and Web developers, so they can use their skills and resources to extend current applications instead of building new ones from the ground up.

Multimodal technology allows the interchangeable use of multiple forms of input and output, such as voice commands, keypads, or stylus — in the same interaction.

As computing moves away from keyboard-reliant PCs into devices such as handheld computers and cellular phones, multimodal technology becomes increasingly important. This technology will allow end users to interact with technology in ways that are most suitable to the situation. The Multimodal Browser allows viewing of and interacting with multimodal applications that have been built using X+V..." [from the product description]

From the Opera Software Announcement 2004-03-23

"Voice is the most natural and effective way we communicate. In the years to come it will greatly facilitate how we interact with technology," says Christen Krogh, VP Engineering, Opera Software ASA. "By making this technology available today for the wider Web audience, the serious work of voice-enabling the Web can commence."

While traditional HTML pages continue to be the foundation of the Web, advances in the function, speed, and size of both computers and mobile devices, along with today's diversity of users, have increased the demand for more flexible user interfaces. By building on this standardized foundation using XHTML+Voice (X+V), developers can add voice input and output to traditional, graphically based Web pages and achieve natural voice functionality. For example, Opera's presentation tool, Opera Show, can empower users to replace Microsoft PowerPoint, creating lightweight, Internet standards-based presentations that also make post-publishing a breeze. By combining Opera Show with voice, users may in the future be able to give presentations and tell Opera via voice commands to turn to the next slide, without approaching the computer and pressing the 'Page Down' key.

"This new offering can allow us to interact with the content on the Web in a more natural way, combining speech with other forms of input and output — first on PCs, and in the near future, devices such as cellphones and PDAs," said Igor Jablokov, Director, Embedded Speech, IBM, and Chairman of the VoiceXML Forum. "Developers can also start to build multimodal content using the open standards-based X+V markup language, which unifies the visual and voice Web by using development skills a large population of programmers already have today."

Opera will make the IBM integrated voice browser available in English for Windows with initial targets being enterprise customers and developers.

Opera Software ASA is an industry leader in the development of Web browser technology, targeting the desktop, smartphone, PDA, iTV and vertical markets. Partners include companies such as IBM, Nokia, Sony, Motorola, Macromedia, Adobe, Symbian, Canal+ Technologies, Sony Ericsson, Kyocera, Sharp, Motorola Metroworks, MontaVista Software, BenQ, Sendo and AMD. The Opera browser has received international recognition from users, industry experts and media for being faster, smaller and more standards-compliant than other browsers.

Opera's browser technology is cross-platform and modular, and currently available on the following operating systems: Windows, Linux, Mac OS, Symbian OS, QNX, TRON, FreeBSD, Solaris and Mediahighway.

Opera Software ASA is headquartered in Oslo, Norway, with development centers in Linköping and Gothenburg, Sweden, and a sales representative in Austin, TX. The company is listed on the Oslo Børs under the ticker symbol OPERA..."

Bibliographic Information: XHTML+Voice Profile (Versions 1.0, 1.1, 1.2)

  • XHTML+Voice Profile 1.2. 16-March-2004. Edited by Jonny Axelsson (Opera Software), Chris Cross (IBM), Jim Ferrans (Motorola), Gerald McCobb (IBM), T. V. Raman (IBM), and Les Wilson (IBM). VoiceXML Forum. Version URL: http://www.voicexml.org/specs/multimodal/x+v/12/spec.html. Previous version URL: http://www.ibm.com/software/pervasive/multimodal/x+v/11/spec.htm.

    "The XHTML+Voice profile brings spoken interaction to standard web content by integrating the mature XHTML and XML-Events technologies with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific DOM events, thereby reusing the event model familiar to web developers. Voice interaction features are integrated with XHTML and CSS and can consequently be used directly within XHTML content."

  • XHTML+Voice Profile 1.1. 28-January-2003. From the IBM web site. The XHTML+Voice Profile 1.0 specification contributed to W3C in November/December 2001 (see following reference) was submitted on RAND terms, as clarified by the W3C Staff Comment: "The IPR declarations provided with the submission reveal that both IBM and Motorola may own patents or patent applications that apply to the XHTML+Voice submission. Both companies state that they are prepared to offer a non-exclusive license under such patents on reasonable and non-discriminatory (RAND) terms." The Staff Comment includes an update as follows: "An updated version of XHTML+Voice (v 1.1) was contributed to the Voice Browser and Multimodal Interaction Working Groups on 11th March 2003. Both Motorola and IBM have revised their IPR disclosures, agreeing to provide a nonexclusive royalty-free licence for any related patent claims they may have." [cache]

  • XHTML+Voice Profile 1.0. W3C Note. 21-December-2001. Edited by Jonny Axelsson (Opera Software), Chris Cross (IBM), Håkon W. Lie (Opera Software), Gerald McCobb (IBM), T. V. Raman (IBM), and Les Wilson (IBM). Version URL: http://www.w3.org/TR/2001/NOTE-xhtml+voice-20011221. Latest Version URL: http://www.w3.org/TR/xhtml+voice.

    The XHTML+Voice profile brings spoken interaction to standard WWW content by integrating a set of mature WWW technologies such as XHTML and XML Events with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, speech grammars, and the ability to attach voice handlers for responding to specific DOM events, thereby re-using the event model familiar to web developers. Voice interaction features are integrated directly with XHTML and CSS, and can consequently be used directly within XHTML content.

    The XHTML+Voice profile is designed for Web clients that support visual and spoken interaction. To this end, this document first re-formulates VoiceXML 2.0 as a collection of modules. These modules, along with Speech Synthesis Markup Language and Speech Recognition Grammar Format are then integrated with XHTML using XHTML modularization to create the XHTML+Voice profile. Finally, we integrate the result with module XML-Events so that voice handlers can be invoked through a standard DOM2 EventListener interface.

About the XHTML+Voice Profile Version 1.2

The XHTML+Voice Profile 1.2 document defines version 1.2 of the XHTML+Voice profile. XHTML+Voice 1.2 is a member of the XHTML family of document types, as specified by XHTML Modularization. XHTML is extended with a modularized subset of VoiceXML 2.0, the XML Events module, and a module containing a small number of attribute extensions to both XHTML and VoiceXML. The latter module facilitates the sharing of multimodal input data between the VoiceXML dialog and XHTML input and text elements.

The XML Events module provides XML host languages the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 event interfaces. The result is an event syntax for XHTML-based languages that enables an interoperable way of associating behaviors with document-level markup.
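As a concrete illustration of the XML Events syntax (a minimal sketch; the `id` values are invented), a host language can declare a listener either as a standalone `listener` element or as attributes on the observed element:

```xml
<!-- Element form: route "click" events observed on the element with
     id="helpButton" to the handler with id="helpDialog" -->
<listener xmlns="http://www.w3.org/2001/xml-events"
          event="click" observer="helpButton" handler="#helpDialog"/>

<!-- Attribute form: the same binding expressed directly on the
     observed element itself -->
<input type="button" id="helpButton"
       xmlns:ev="http://www.w3.org/2001/xml-events"
       ev:event="click" ev:handler="#helpDialog"/>
```

Either form resolves to the same DOM Level 2 `EventListener` registration, which is what lets X+V reuse this machinery to invoke VoiceXML dialogs as handlers.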

VoiceXML has been designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. In this document, VoiceXML 2.0 is modularized to prepare it for integration into the XHTML family of languages using the XHTML modularization framework. The modules that combine to support speech dialogs for updating XHTML forms and form elements are selected to be added to XHTML. The modules are described as well as the integration issues. The modularization of VoiceXML 2.0 also specifies DOM event types specific to voice interaction for use with the XML Events module. Speech dialogs authored in VoiceXML 2.0 can then be treated as event handlers to add voice-interaction specific behaviors to XHTML documents. The language integration supports all of the modules defined in XHTML Modularization, and adds speech interaction functionality to XHTML elements to enable multimodal applications. The document type defined by the XHTML+Voice profile is XHTML Host language document type conformant.

Two mature technologies, XHTML 1.1 and VoiceXML 2.0 are integrated using XHTML Modularization to bring spoken interaction to the visual web. The design leverages open industry APIs like the W3C DOM to create interoperable web content that can be deployed across a variety of end-user devices. Multiple modes of interaction are synchronized and integrated using the DOM 2 Events model and exposed to the content author via XML Events.

Today, web applications are authored in XHTML with user interaction created via XHTML form elements. The W3C is presently working on XForms, the next generation of web forms that bring the power of XML to web application development. The combination of XHTML and Voice described in this document can leverage the semantic richness of web applications created using XForms, while providing a smooth transition for today's developers wishing to deploy multimodal applications by adding spoken interaction to present-day web content. Integrating the work of the W3C voice browser working group into mainstream XHTML content has the advantage of ensuring that future enhancements to the voice browser component such as natural language understanding will be incorporated. Thus, a smooth transition path for web developers wishing to deliver increasingly smart user interaction for their web applications is provided. Building on XHTML Basic and XHTML modularization, content developers will be able to deploy their content to a wide variety of end-user clients ranging from mobile phones and small PDAs to desktop browsers..." [from the version 1.2 specification, 16-March-2004]

Related Background Articles

  • "X+V is a Markup Language, Not a Roman Math Expression. Introducing XHTML + Voice: IBM's Proposal to the W3C on Developing Multimodal UIs." By Les Wilson (Senior Technical Staff Member, IBM Corp., Pervasive Computing Division). From IBM developerWorks. 19-August-2003. "X+V (XHTML plus Voice) is a Web markup language for developing multimodal applications. Like VoiceXML, X+V meets the increasing user demand for voice-based interaction in small and mobile devices. Unlike VoiceXML, X+V uses both voice and visual elements, bringing a world of new potential to the field of wireless user interface development. In this article, IBM Multimodal Architect Les Wilson provides a complete introduction to X+V, including a conceptual overview of multimodal interface development, an architectural view of the three components that comprise X+V's core functionality, and a code example that demonstrates the utility of this promising new markup language." See the sidebar, SALT versus X+V.

  • "Versatile Multimodal Solutions. The Anatomy of User Interaction." By T.V. Raman, Gerald McCobb, and Rafah A. Hosn. In XML Journal Volume 4 Issue 4 (April 2003). "This article describes X+V 1.1, an update to X+V that integrates the results of more than two years of experience gained by implementing multimodal solutions using this framework. It summarizes the additions to X+V and illustrates their use in creating multimodal interaction that leverages mixed-initiative VoiceXML dialogs. Formal descriptions of these additions can be found in the X+V 1.1 specification; here we'll focus on motivating these additions and explaining their use."

  • "IBM Delivers Free Speech Tools. WebSphere Everyplace Uses No SALT to Put Linux Where Your Mouth Is." By Edward J. Correia. In Software Development Times (August 15, 2003). "In the shadow of The SCO Group's copyright infringement allegations, IBM Corp. continues to develop and market solutions for Linux-based systems. IBM has released WebSphere Everyplace Multimodal Environment for Embedix, a version of its Eclipse-based IDE that developers can use to target Sharp's Linux-based Zaurus 5600 handheld computer with applications that can be driven visually, by voice or by a combination of the two. The tools include a code editor for modifying existing Web applications with XHTML and VoiceXML (X+V) tagging protocols, predeveloped X+V sample code and a voice-application simulator based on the Opera 7 voice-enabled browser from Opera Software AS. The environment works with IBM's Multimodal Toolkit for WebSphere Studio, also available now. Both tools are free..."

  • "Motorola Licenses Opera Browser for Phones. Opera Likely to be Incorporated into Models Based on the Linux and Symbian OSes." By Gillian Law. In InfoWorld (February 11, 2004). "Motorola Inc.'s Personal Communications Sector (PCS) division has signed a licensing agreement with Opera Software ASA to use the Oslo company's browser on its phones. The Opera browser's code is small enough to fit on a mobile phone, [Lars] Boilesen said. 'So you can go to any site on the Internet and browse it. Small-screen rendering technology reformats the content on the fly so that it fits. There's no horizontal scrolling, just up and down.' The licensing agreement also allows Motorola to offer the Opera Platform to telecommunications carriers. The platform is designed to integrate online content with a device's own applications, allowing an operator to update the content on the screen of a user's handset. The companies announced last week that they are working together to combine the WAP (Wireless Application Protocol) software stack from a browser developed by Motorola's Global Software Group with Opera's HTML browser software; that will allow phones to access the many WAP-based sites and services that have already been developed..."

Related News: Voice Extensible Markup Language (VoiceXML) 2.1

Following on the W3C announcement for VoiceXML 2.0 and Speech Recognition Grammar as W3C Recommendations, W3C has announced the release of a first public working draft for Voice Extensible Markup Language (VoiceXML) 2.1. VoiceXML 2.1 "specifies a set of eight (8) features commonly implemented by Voice Extensible Markup Language platforms. The specification is designed to be fully backwards-compatible with VoiceXML 2.0. The popularity of VoiceXML 2.0 spurred the development of numerous voice browser implementations early in the specification process. VoiceXML 2.0 has been phenomenally successful in enabling the rapid deployment of voice applications that handle millions of phone calls every day. This success has led to the development of additional, innovative features that help developers build even more powerful voice-activated services. While it was too late to incorporate these additional features into VXML2, the purpose of VoiceXML 2.1 is to formally specify the most common features to ensure their portability between platforms and at the same time maintain complete backwards-compatibility with VXML2..."

According to Dave Raggett, "The new features include using computed expressions for referencing grammars and scripts, the ability to detect where barge-in occurred within a prompt, greater convenience in prompting for dynamic lists of values, to be able to download data without having to move to the next page, to record the user's speech during recognition for later analysis, to pass data with a disconnect, and enhanced control over transfer..."
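Two of the features Raggett lists can be sketched in VoiceXML 2.1 markup. This is an illustrative fragment (the grammar path, variable names, and option values are invented): `srcexpr` computes a grammar URI at runtime, and `<foreach>` prompts over a dynamic ECMAScript array.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="pick">
    <!-- Dynamic state available to the expressions below -->
    <var name="lang" expr="'en-US'"/>
    <var name="options" expr="[{name: 'coffee'}, {name: 'tea'}]"/>
    <field name="choice">
      <!-- VoiceXML 2.1: grammar reference computed at runtime -->
      <grammar srcexpr="'grammars/' + lang + '.grxml'"
               type="application/srgs+xml"/>
      <prompt>
        Please say one of:
        <!-- VoiceXML 2.1: iterate over a dynamic list of values -->
        <foreach item="opt" array="options">
          <value expr="opt.name"/>
        </foreach>
      </prompt>
    </field>
  </form>
</vxml>
```

Under VoiceXML 2.0, both cases required either server-side page generation or a fixed grammar URI; VoiceXML 2.1 standardizes what many platforms had already implemented as extensions.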

Bibliographic information: Voice Extensible Markup Language (VoiceXML) 2.1. W3C Working Draft 23-March-2004. Edited by Matt Oshry, Tellme Networks (Editor-in-Chief); Paolo Baggia, Loquendo; Michael Bodell, Tellme Networks; David Burke, Voxpilot Ltd.; Daniel C. Burnett, Nuance Communications; Emily Candell, Comverse; Jim Ferrans, Motorola; Jeff Haynie, Vocalocity; Hakan Kilic, Scansoft; Jeff Kusnitz, IBM; Scott McGlashan, Hewlett-Packard; Rob Marchand, VoiceGenie; Michael Migdol, BeVocal, Inc.; Brad Porter, Tellme Networks; Ken Rehor, Vocalocity; Laura Ricotti, Loquendo. Version URL: http://www.w3.org/TR/2004/WD-voicexml21-20040323/. Latest version URL: http://www.w3.org/TR/voicexml21/.

Document URI: http://xml.coverpages.org/ni2004-03-23-b.html
Robin Cover, Editor: robin@oasis-open.org