The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: September 28, 2004.

W3C Publishes InkML and EMMA Working Drafts for the Multimodal Interaction Framework.

The W3C Multimodal Interaction Working Group has published revised Working Drafts for EMMA: Extensible MultiModal Annotation Markup Language and Ink Markup Language as part of the W3C Multimodal Interaction Activity.

The W3C Multimodal Interaction Activity involves technical work to extend the Web user interface "to allow multiple modes of interaction (aural, visual and tactile), offering users the means to provide input using their voice or their hands via a key pad, keyboard, mouse, or stylus. For output, users will be able to listen to spoken prompts and audio, and to view information on graphical displays. The Multimodal Interaction Working Group is producing specifications intended to be implementable on a royalty-free basis."

Applications implementing the W3C Multimodal specifications "are of particular interest for mobile devices: speech offers a welcome means to interact with smaller devices, allowing one-handed and hands-free operation. The Working Group has also worked on integration of composite multimodal input; dynamic adaptation to device configurations, user preferences and environmental conditions; modality component interfaces; and a study of current approaches to interaction management."

The Ink Markup Language "serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in Web-based (and non Web-based) applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules."

The third public version of the Ink Markup Language describes the syntax and semantics for the markup language to represent data produced by pen interfaces. It "provides a simple and platform-neutral data format to promote the interchange of digital ink between software applications."

The updated EMMA: Extensible MultiModal Annotation Markup Language is also part of the W3C's set of specifications for multi-modal systems designed to enable access to the Web using multi-modal interaction. EMMA "provides details of an XML markup language for describing the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers."

Specifications from the W3C Multimodal Interaction Working Group are intended to be used in a variety of settings using "GUIs, speech, vision, pen, gestures, and haptic interfaces." According to W3C's vision for multimodal access, the different modalities "may be supported on a single device or on separate devices working in tandem, for example, you could be talking into your cellphone and seeing the results on a PDA. Voice may also be offered as an adjunct to browsers with high resolution graphical displays, providing an accessible alternative to using the keyboard or screen. This can be especially important in automobiles or other situations where hands and eyes free operation is essential. Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller. It is much easier to say a few words than it is to thumb them in on a keypad where multiple key presses may be needed for each character."

Overview of InkML

"As more electronic devices with pen interfaces have and continue to become available for entering and manipulating information, applications need to be more effective at leveraging this method of input. Handwriting is an input modality that is very familiar for most users since everyone learns to write in school. Hence, users will tend to use this as a mode of input and control when available.

A pen-based interface is enabled by a transducer device and a pen that allow movements of the pen to be captured as digital ink. Digital ink can be passed on to recognition software that will convert the pen input into appropriate computer actions. Alternatively, the handwritten input can be organized into ink documents, notes or messages that can be stored for later retrieval or exchanged through telecommunications means. Such ink documents are appealing because they capture information as the user composed it, including text in any mix of languages and drawings such as equations and graphs.

Hardware and software vendors have typically stored and represented digital ink using proprietary or restrictive formats. The lack of a public and comprehensive digital ink format has severely limited the capture, transmission, processing, and presentation of digital ink across heterogeneous devices developed by multiple vendors. In response to this need, the Ink Markup Language (InkML) provides a simple and platform-neutral data format to promote the interchange of digital ink between software applications.

InkML supports a complete and accurate representation of digital ink. For instance, in addition to the pen position over time, InkML allows recording of information about transducer device characteristics and detailed dynamic behavior to support applications such as handwriting recognition and authentication. For example, there is support for recording additional channels such as pen tilt, or pen tip force — often referred to as pressure in manufacturers' documentation.

InkML provides means for extension. By virtue of being an XML-based language, users may easily add application-specific information to ink files to suit the needs of the application at hand.

Uses of InkML: With the establishment of a non-proprietary ink standard, a number of applications, old and new, are expanded where the pen can be used as a very convenient and natural form of input. Here are a few examples.

  • Ink Messaging: Two-way transmission of digital ink, possibly wireless, offers mobile-device users a compelling new way to communicate. Users can draw or write with a pen on the device's screen to compose a note in their own handwriting. Such an ink note can then be addressed and delivered to other mobile users, desktop users, or fax machines. The recipient views the message as the sender composed it, including text in any mix of languages and drawings.

  • Ink and SMIL: A photo taken with a digital camera can be annotated with a pen; the digital ink can be coordinated with a spoken commentary. The ink annotation could be used for indexing the photo — for example, one could assign different handwritten glyphs to different categories of pictures.

  • Ink Archiving and Retrieval: A software application may allow users to archive handwritten notes and retrieve them using either the time of creation of the handwritten notes or the tags associated with keywords. The tags are typically text strings created using a handwriting recognition system.

  • Electronic Form-Filling: In support of natural and robust data entry for electronic forms on a wide spectrum of keyboardless devices, a handwriting recognition engine developer may define an API that takes InkML as input.

  • Pen Input and Multimodal Systems: Robust and flexible user interfaces can be created that integrate the pen with other input modalities such as speech. Higher robustness is achievable because cross-modal redundancy can be used to compensate for imperfect recognition on each individual mode. Higher flexibility is possible because users can choose the most appropriate from among various modes for achieving a task or issuing commands. This choice might be based on user preferences, suitability for the task, or external conditions. For instance, when noise in the environment or privacy is a concern, the pen modality is preferred over voice..." [from the 28-September-2004 Working Draft]
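The idea of recording several channels per pen sample (position plus, for example, tip force) can be made concrete with a small Python sketch. It is not taken from the Working Draft: it assumes a simplified trace encoding in which samples are separated by commas and channel values by whitespace, and the channel names X, Y and F, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Trace:
    """One pen stroke: ordered samples, each mapping a channel name to a value."""
    samples: List[Dict[str, float]]


def parse_trace(text: str, channels: Sequence[str] = ("X", "Y", "F")) -> Trace:
    """Parse a simplified trace string such as "10 0 0.2, 9 14 0.5".

    Assumption (not from the draft): every sample carries exactly one
    explicit numeric value per declared channel.
    """
    samples = []
    for chunk in text.split(","):
        values = chunk.split()
        if not values:
            continue  # tolerate a trailing comma or an empty chunk
        if len(values) != len(channels):
            raise ValueError(
                f"expected {len(channels)} values, got {len(values)}: {chunk!r}"
            )
        samples.append({name: float(v) for name, v in zip(channels, values)})
    return Trace(samples)


def mean_channel(trace: Trace, name: str) -> float:
    """Average of one channel over a stroke, e.g. the mean tip force."""
    return sum(s[name] for s in trace.samples) / len(trace.samples)


# Hypothetical stroke with three samples of (X, Y, F).
stroke = parse_trace("10 0 0.2, 9 14 0.5, 8 28 0.8")
print(len(stroke.samples), mean_channel(stroke, "F"))
```

A recognizer or signature verifier could consume such per-channel samples without knowing which device produced them, which is the interchange role the draft describes.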

Introduction to EMMA

The W3C Working Draft document presents an XML specification for EMMA, an Extensible MultiModal Annotation markup language, responding to the requirements documented in W3C Requirements for EMMA. This markup language is intended for use by systems that provide semantic interpretations for a variety of inputs, including but not necessarily limited to, speech, natural language text, GUI and ink input.

"It is expected that this markup will be used primarily as a standard data interchange format between the components of a multimodal system; in particular, it will normally be automatically generated by interpretation components to represent the semantics of users' inputs, not directly authored by developers.

The language is focused on annotating the interpretation information of single and composed inputs, as opposed to (possibly identical) information that might have been collected over the course of a dialog.

The language provides a set of elements and attributes that are focused on accurately representing annotations on the input interpretations.

An EMMA document can be considered to hold three types of data:

  • Instance Data: Application-specific markup corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.

  • Data Model: Constraints on structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.

  • Metadata: Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime.

Uses of EMMA: The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component, where input is to be taken in the general sense of a meaningful user input in any modality supported by the platform...

Components that generate EMMA markup:

  • Speech recognizers
  • Handwriting recognizers
  • Natural language understanding engines
  • Other input media interpreters (e.g., DTMF, pointing, keyboard)
  • Multimodal integration component

Components that use EMMA include:

  • Interaction manager
  • Multimodal integration component

Although not a primary goal of EMMA, a platform may also choose to use this general format as the basis of a general semantic result that is carried along and filled out during each stage of processing. In addition, future systems may also potentially make use of this markup to convey abstract semantic content to be rendered into natural language by a natural language generation component..." [from the Working Draft 2004-09-01]
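The three kinds of data described above (instance data, data model, metadata) can be illustrated with a minimal Python sketch. This is not EMMA syntax and is not drawn from the Working Draft: the class and function names, the use of a set of required keys as a stand-in data model, and the "confidence" annotation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Set


@dataclass
class Interpretation:
    # Instance data: application-specific content built by an input processor.
    instance: Dict[str, Any]
    # Metadata: annotations attached at runtime (hypothetical keys).
    metadata: Dict[str, Any] = field(default_factory=dict)


def conforms(instance: Dict[str, Any], required_keys: Set[str]) -> bool:
    """Stand-in data model: the instance must contain every required key."""
    return required_keys.issubset(instance)


def best_interpretation(
    candidates: Iterable[Interpretation], required_keys: Set[str]
) -> Optional[Interpretation]:
    """Return the highest-confidence candidate that satisfies the data model."""
    valid = [c for c in candidates if conforms(c.instance, required_keys)]
    return max(valid, key=lambda c: c.metadata.get("confidence", 0.0), default=None)


# An ambiguous utterance yields more than one instance.
candidates = [
    Interpretation({"destination": "Boston"}, {"confidence": 0.6, "mode": "speech"}),
    Interpretation({"destination": "Austin"}, {"confidence": 0.3, "mode": "speech"}),
]
chosen = best_interpretation(candidates, {"destination"})
print(chosen.instance if chosen else None)
```

In this sketch the selection step plays the part of a consumer such as an interaction manager, which reads the annotations to decide how to act on ambiguous input.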

Multimodal Applications in Daily Life

The W3C Multimodal Interaction Working Group should be of interest to a range of organizations in different industry sectors:

  • Mobile: Multimodal applications are of particular interest for mobile devices. Speech offers a welcome means to interact with smaller devices, allowing one-handed and hands-free operation. Users benefit from being able to choose which modalities they find convenient in any situation. The Working Group should be of interest to companies developing smart phones and personal digital assistants or who are interested in providing tools and technology to support the delivery of multimodal services to such devices.

  • Automotive and Telematics: With the emergence of dashboard integrated high resolution color displays for navigation, communication and entertainment services, W3C's work on open standards for multimodal interaction should be of interest to companies working on developing the next generation of in-car systems.

  • Multimodal interfaces in the office: Multimodal has benefits for desktops and wall mounted interactive displays, offering a richer user experience and the chance to use speech and pens as alternatives to the mouse and keyboard. W3C's standardization work in this area should be of interest to companies developing browsers and authoring technologies, and who wish to ensure that the resulting standards live up to their needs.

  • Multimodal interfaces in the home: In addition to desktop access to the Web, multimodal interfaces are expected to add value to remote control of home entertainment systems, as well as finding a role for other systems around the home. Companies involved in developing embedded systems and consumer electronics should be interested in W3C's work on multimodal interaction. [Multimodal Interaction Activity]

Robin Cover, Editor