The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: August 09, 2000
Java Speech Markup Language (JSML/JSpeech)

[June 20, 2000] Sun Microsystems recently submitted to the W3C a document on the JSpeech Markup Language. Reference: W3C Note 05-June-2000, by Andrew Hunt (SpeechWorks International). Document abstract: "The JSpeech Markup Language (JSML) is a text format used by applications to annotate text input to speech synthesizers. JSML elements provide a speech synthesizer with detailed information on how to 'speak' text and thus enable improvements in the quality, naturalness and understandability of synthesized speech output. JSML defines elements that describe the structure of a document, provide pronunciations of words and phrases, indicate phrasing, emphasis, pitch and speaking rate, and control other important speech characteristics. JSML is designed to be simple to learn and use, to be portable across different synthesizers and computing platforms, and to [be] applicable to a wide range of languages." Detail: "JSML defines a specific set of elements to mark up text to be spoken, and defines the interpretation of those elements so that there is a common understanding between synthesizers and document producers of how marked up text will be spoken. The JSML element set includes several types of element. First, JSML documents can include structural elements that mark paragraphs and sentences. Second, there are JSML elements to control the production of synthesized speech, including the pronunciation of words and phrases, the emphasis of words (stressing or accenting), the placements of boundaries and pauses, and the control of pitch and speaking rate. Finally, JSML includes elements that represent markers embedded in text and that enable synthesizer-specific controls."

The submitted document "is derived from the Java Speech API Markup Language specification (Version 0.6, October, 1999), which is available from Sun Microsystems's web site. Sun Microsystems wishes to submit this document for consideration by the W3C Voice Browser Working Group towards the development of internet standards for speech technology. We expect the resulting W3C recommendations to be of great importance to the developer community." See also the W3C Submission Request and the W3C Staff Comment.

In this connection, see the companion Sun specification for a JSpeech Grammar Format, also published as a W3C NOTE. The JSpeech Grammar Format (JSGF) "is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition. Grammars are used by speech recognizers to determine what the recognizer should listen for, and so describe the utterances a user may say. JSGF adopts the style and conventions of the Java Programming Language in addition to use of traditional grammar notations... The JSpeech Grammar Format uses a textual representation that is readable and editable by both developers and computers, and can be included in source code. The other major grammar type, the dictation grammar, is not discussed in this [JSGF] document." Sun has submitted this document for consideration by the W3C Voice Browser Working Group.
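As a rough illustration of how a grammar can "describe the utterances a user may say," the sketch below holds a small JSGF-style grammar in a string and lists every utterance it accepts. The rule names and vocabulary are invented for this example, and the toy parser is an assumption of this note, not part of any Sun or W3C software: it handles only one-line rules built from words, rule references, and "|" alternatives, and ignores grouping, optional items, weights, and imports.

```python
import itertools
import re

# A small grammar written in JSGF style. The rule names and vocabulary are
# invented for illustration; only a tiny subset of JSGF is used here.
GRAMMAR = """
#JSGF V1.0;
grammar commands;
public <command> = <action> <object>;
<action> = open | close;
<object> = the window | the door;
"""

# Matches one-line rule definitions such as: public <name> = body;
RULE = re.compile(r"^(public\s+)?<(\w+)>\s*=\s*(.*?);$")


def parse_rules(text):
    """Return ({rule name: [alternative token lists]}, [public rule names])."""
    rules, public = {}, []
    for line in text.strip().splitlines():
        match = RULE.match(line.strip())
        if not match:  # header line or grammar declaration
            continue
        is_public, name, body = match.groups()
        rules[name] = [alt.split() for alt in body.split("|")]
        if is_public:
            public.append(name)
    return rules, public


def expand(name, rules):
    """List every utterance (as a string) that rule `name` accepts."""
    utterances = []
    for alternative in rules[name]:
        # Each token is either a <rule> reference or a literal word.
        choices = [
            expand(tok[1:-1], rules) if tok.startswith("<") else [tok]
            for tok in alternative
        ]
        utterances += [" ".join(parts) for parts in itertools.product(*choices)]
    return utterances


rules, public = parse_rules(GRAMMAR)
utterances = expand(public[0], rules)
print(utterances)
```

A real recognizer would compile such a grammar into a search network instead of enumerating strings; the enumeration here only makes the set of permitted utterances visible.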

[1999] "The Java Speech Markup Language (JSML) allows applications to annotate text with additional information that can improve the quality and naturalness of synthesized speech. JSML documents can include structural information about paragraphs and sentences. JSML allows control of the production of synthesized speech, including the pronunciation of words and phrases, the emphasis of words (stressing or accenting), the placements of boundaries and pauses, and the control of pitch and speaking rate. Finally, JSML allows markers to be embedded in text and allows synthesizer-specific controls."

"JSML is a subset of XML (Extensible Markup Language), which is a simple dialect of SGML. By being a subset of XML, JSML gains a standardized, extensible syntax that is not tied to the Java Speech API (JSAPI). This means that: (1) JSML is readable and editable by both humans and computers; (2) General XML editors can be used to simplify writing JSML; (3) JSML markup is very regular and easy for a synthesizer to parse; (4) Text containing JSML can be prepared by hand using non-JSAPI-specific editors."

"Sun Microsystems, Inc. has worked in partnership with leading speech technology companies to define the specifications for the Java Speech API and the Java Speech Markup Language (JSML). These companies bring decades of research on speech technology and experience in the development and use of speech applications. Sun is grateful for the contributions of: Apple Computer, Inc., AT&T, Dragon Systems, Inc., IBM Corporation, Novell, Inc., Philips Speech Processing, Texas Instruments Incorporated.

[July 11, 2000] The first public working draft version of a 'Speech Recognition Grammar Specification' has been issued by the W3C Voice Browser Working Group as part of the W3C Voice Browser Activity, viz.: Speech Recognition Grammar Specification for the W3C Speech Interface Framework. Reference: W3C Working Draft 10-July-2000, edited by Andrew Hunt (SpeechWorks International) and Scott McGlashan (PipeBeach). Abstract: "This document defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms. The W3C Voice Browser Working Group is seeking input on whether the final specification should include both forms or be narrowed to a specific form."

Description: "This document defines the syntax for grammar representation. The grammars are intended for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms. The W3C Voice Browser Working Group is seeking input on whether the final specification should include both forms or be narrowed to a specific form. Augmented BNF syntax (ABNF): this is a plain-text (non-XML) representation which is similar to traditional BNF grammar and to many existing BNF-like representations commonly used in the field of speech recognition including the JSpeech Grammar Format from which this specification is derived. Augmented BNF should not be confused with Extended BNF which is used in DTDs for XML and SGML. XML: This syntax uses XML elements to represent the grammar constructs and adapts designs from the PipeBeach grammar (W3C Members only), TalkML and a research XML variant of the JSpeech Grammar Format. Section 5 outlines areas of Future Study around Grammar representations for speech recognition. In addition to the decision about supporting an XML form, the ABNF form or both, the committee is currently considering a proposal for representing statistical language models -- specifically 'n-grams' -- that are used in many speech recognition systems. The W3C Standard is known as the Speech Recognition Grammar Specification and is based upon the JSGF specification, which is owned by Sun Microsystems."
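To give a feel for what "directly mappable" might mean in practice, the sketch below renders one rule (a list of alternatives) in both an ABNF-style and an XML-style form, and converts the XML form back to the ABNF form. The concrete notation used here ($name rules, and rule, one-of, and item elements) follows the later Speech Recognition Grammar Specification 1.0 as the editor recalls it; it is an assumption that may not match the July 2000 working draft, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_abnf(name, alternatives):
    """Render a rule of alternatives in an ABNF-style form (assumed syntax)."""
    return f"${name} = {' | '.join(alternatives)};"


def to_xml(name, alternatives):
    """Render the same rule in an XML-style form (assumed element names)."""
    rule = ET.Element("rule", {"id": name})
    one_of = ET.SubElement(rule, "one-of")
    for alternative in alternatives:
        ET.SubElement(one_of, "item").text = alternative
    return ET.tostring(rule, encoding="unicode")


def xml_to_abnf(xml_text):
    """Transform the XML form back into the ABNF form."""
    rule = ET.fromstring(xml_text)
    items = [item.text for item in rule.iter("item")]
    return to_abnf(rule.get("id"), items)


cities = ["Boston", "Philadelphia", "Fargo"]
abnf_form = to_abnf("city", cities)
xml_form = to_xml("city", cities)

print(abnf_form)
print(xml_form)
print(xml_to_abnf(xml_form) == abnf_form)
```

The round trip works here because both forms carry the same information: a rule name and an ordered list of alternatives. A full converter would also need to cover sequences, rule references, repeats, and weights.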

For related research and development, see (1) "SSML: A Speech Synthesis Markup Language," (2) "SABLE: A Standard for Text-to-Speech Synthesis Markup," and (3) "SpeechML."

Document URI: http://xml.coverpages.org/jspeech.html
Robin Cover, Editor: robin@oasis-open.org