W3C Recommendation: Character Model for the World Wide Web 1.0 - Fundamentals

World Wide Web Consortium Issues Critical Internationalization Recommendation

"Character Model for the World Wide Web 1.0: Fundamentals" Brings Unified Approach to Using Characters on the Web

http://www.w3.org/ — 15-February-2005

The World Wide Web Consortium (W3C) has published "Character Model for the World Wide Web 1.0: Fundamentals" as a W3C Recommendation. It provides a well-defined and well-understood way for Web applications to transmit and process the characters of the world's languages.

This architectural Recommendation gives authors of specifications, software developers, and content developers a common reference, enabling interoperable text manipulation on the World Wide Web. It builds on the Universal Character Set, defined jointly by the Unicode Standard and ISO/IEC 10646. Topics include use of the terms 'character', 'encoding' and 'string', a reference processing model, choice and identification of character encodings, character escaping, and string indexing.

The goal of the Character Model for the World Wide Web is to facilitate use of the Web by all people, regardless of their language, script, writing system, and cultural conventions, in accordance with the W3C goal of universal access.

Unicode Brings the Universal Character Set to the Web

At the core of the character model is the Universal Character Set (UCS). The model allows Web technologies to support text in the world's scripts, on different platforms, so that the text can be exchanged, read, and searched by Web users around the world. Unicode was chosen because it provides a way of referencing characters independent of the encoding of the text, it is being carefully updated and extended, and it is widely accepted and implemented by industry.
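As a minimal illustration of "referencing characters independent of the encoding of the text", the following Python sketch resolves an HTML numeric character reference and then serializes the result in three encodings. The sample string and the choice of encodings are assumptions made for this example, not taken from the Recommendation.

```python
import html

# A numeric character reference names the code point U+00E9,
# not any particular byte sequence.
ref = "caf&#xE9;"
text = html.unescape(ref)

# The same text serialized in three different character encodings.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("iso-8859-1")
as_utf16 = text.encode("utf-16-be")

# The byte sequences differ from one encoding to the next...
print(as_utf8, as_latin1, as_utf16)

# ...but each one decodes back to the same sequence of code points.
decoded = [
    as_utf8.decode("utf-8"),
    as_latin1.decode("iso-8859-1"),
    as_utf16.decode("utf-16-be"),
]
code_points = [hex(ord(c)) for c in text]
print(code_points)
```

The reference `&#xE9;` stays valid whichever encoding the surrounding document uses, because it points at the Unicode code point rather than at bytes.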

W3C adopted Unicode as the document character set for HTML in HTML 4.0. The same approach was later used for Recommendations such as XML 1.0 and CSS Level 2. W3C specifications and applications now use Unicode as the common reference character set.

New Specification Clarifies Character Usage on the Web

As the number of Web applications increases, the need for a shared character model has become more critical. Unicode is the natural choice as the basis for that shared model, especially as applications developers begin to consolidate their encoding options. However, applying Unicode to the Web requires additional specifications; this is the purpose of the W3C Character Model series.

Some aspects particular to the Web that receive more explanation in the series include:

Choice of Unicode encoding forms (UTF-8, UTF-16, UTF-32)
Counting characters, measuring string length in the presence of variable-length character encodings and combining characters
Duplicate encodings of characters (e.g., precomposed vs. decomposed)
Use of escape mechanisms to represent characters
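
The items above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the sample characters, the helper name `units`, and the use of Normalization Form C are choices made for this example and are not requirements quoted from the Character Model documents.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point, U+00E9
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, U+0301
astral = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane


def units(s):
    """Return the length of s measured in several different units (hypothetical helper)."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "utf32_units": len(s.encode("utf-32-le")) // 4,
    }


# The "length" of a string depends on the unit being counted.
for label, s in [("precomposed", precomposed),
                 ("decomposed", decomposed),
                 ("astral", astral)]:
    print(label, units(s))

# Duplicate encodings: both strings render as "é" but compare unequal
# until they are brought to the same normalization form.
same_raw = precomposed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# Escape mechanism: represent a non-ASCII character as a numeric
# character reference when the target encoding cannot carry it directly.
escaped = precomposed.encode("ascii", "xmlcharrefreplace")
print(same_raw, same_nfc, escaped)
```

The sketch shows why a specification that says "string length" or "the n-th character" needs to state which unit it means, and why string comparison is unreliable without an agreed normalization step.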

Series Documents to Be Completed in 2005

Today's Recommendation is the first in a set of three documents. In development are "Character Model for the World Wide Web 1.0: Normalization," specifying early uniform normalization and string identity matching for text manipulation, and "Character Model for the World Wide Web 1.0: Resource Identifiers," specifying IRI conventions.

Industry Leaders Key in Development of Character Model Series

The Character Model was developed by the W3C Internationalization Activity's Working Group (now the W3C Internationalization Core Working Group) with the help of the W3C Internationalization Interest Group. W3C Members participating in the Working Group include BBC, Boeing, Ecole Mohammadia d'Ingénieurs, IBM, Microsoft, Siemens, Sun Microsystems, and webMethods.

About the World Wide Web Consortium (W3C)

The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM), headquartered in France, and Keio University in Japan. Services provided by the Consortium include a repository of information about the World Wide Web for developers and users, and various prototype and sample applications that demonstrate the use of new technology. More than 350 organizations are Members of W3C. To learn more, see http://www.w3.org/

Contacts

Contact Americas and Australia
Janet Daly
Email: <janet@w3.org>
Tel: +1.617.253.5884

Contact Europe, Africa and Middle East
Marie-Claire Forgue
Email: <mcf@w3.org>
Tel: +33.492.38.75.94

Contact Asia
Yasuyuki Hirakawa
Email: <chibao@w3.org>
Tel: +81.466.49.1170

This announcement is also available in French and Japanese.

Prepared by Robin Cover for The XML Cover Pages archive. See details in the news story "W3C and IETF Publish New Standards Supporting the Internationalized Web."
