Content language would offer data model of a site's structure
To borrow from one of Yogi Berra's famous quotes, the Web is in danger of becoming so crowded with data that no one will go there to find information. The problem is exacerbated by brute-force tools that can't extract information from text strings: for instance, a search that doesn't know the difference between the linguist's term for a shared vocabulary (the Dublin Core) and the downtown of a city in Ireland.
Meta Content Framework (MCF) is a Netscape proposal to the World Wide Web Consortium (W3C) designed to end all that.
With a small bootstrap vocabulary (a set of reserved words), MCF is meant to "be its own schema definition language [and therefore] dynamically extensible by author or by application." Practically, this means that site designers will be able to expose a model of their site's structure through the information about the information (metadata) that it contains.
Written by R.V. Guha of Netscape and Tim Bray of Textuality, the submission states, "If information about information can share a common data model and vocabulary, it will be possible to query and manage metadata to some degree, even without fully understanding it."
A bootstrap vocabulary of 15 terms is part of the proposal and is supplemented in an appendix with terms taken from standard sources such as the Dublin Core, a set of metadata elements created at a gathering of computer scientists, text markup specialists, and librarians. The syntax is the eXtensible Markup Language (XML); Bray is both the co-editor of the nascent XML specification and Netscape's representative on the Editorial Review Board.
A major shortfall of earlier schemes for shared metadata has been their lack of extensibility: they required that all applications have foreknowledge of the specific meaning of terms. The core of the MCF proposal is this bootstrap methodology, using a set of common terms that allows an application (such as a browser) to learn the specific definitions of a site's metadata. While MCF has the potential to radically alter the way information is found on the Web, the raw markup itself is not for the timid.
The "X" in XML stands for extensibility, but inventing a new tag when no one else knows what to do with it is not much help. Using Knowledge Representation theory, a discipline that has emerged out of computer science, MCF sets up standard methods to decode the meaning of the new tag, using the same syntax as the rest of the document.
"Even if my browser cannot extract the full measure of the meaning of , it can do a lot with it," Guha said. "When we move from document layout languages to data interchange languages, this becomes very important."
Using MCF, a search engine or browser would be able to extract information about a site or a page by parsing an arbitrary collection of information about it. For instance, an MCF-based site map would be able to expose intrinsic properties of pages and elements within pages to direct a search engine.
In another example, a bookstore site could define areas on a page that relate to information about an author, a summary of a book, the ISBN, the publisher, and so on. The search engine could parse a query for a specific author and know from the MCF data that the author is a property of a book listing that also carries the other properties.
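The bookstore scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not MCF syntax or any part of the proposal: the record layout, the function names `find_by_property` and `other_properties`, and every value in `LISTINGS` are hypothetical placeholders. The property names are taken from the example above.

```python
# Hypothetical metadata a bookstore page might expose. Each listing is a
# record of properties; all values are invented placeholders.
LISTINGS = [
    {"title": "Placeholder Book One", "author": "Author A",
     "isbn": "ISBN-PLACEHOLDER-1", "publisher": "Publisher X",
     "summary": "Placeholder summary."},
    {"title": "Placeholder Book Two", "author": "Author B",
     "isbn": "ISBN-PLACEHOLDER-2", "publisher": "Publisher Y",
     "summary": "Placeholder summary."},
    {"title": "Placeholder Book Three", "author": "Author A",
     "isbn": "ISBN-PLACEHOLDER-3", "publisher": "Publisher X",
     "summary": "Placeholder summary."},
]


def find_by_property(listings, prop, value):
    """Return every listing whose metadata declares prop equal to value."""
    return [item for item in listings if item.get(prop) == value]


def other_properties(listing, prop):
    """Return the remaining properties that travel with the matched one."""
    return {key: val for key, val in listing.items() if key != prop}


if __name__ == "__main__":
    # A query for one author reaches the whole listing, not just a text match.
    for match in find_by_property(LISTINGS, "author", "Author A"):
        print(match["title"], other_properties(match, "author"))
```

The point of the sketch is that the query matches a declared property rather than a raw text string, so the engine can hand back the ISBN, publisher, and summary that belong to the same listing.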
MCF defines an object as a node. Nodes express their properties (size, last revision date, etc.) as labels, although labels can also be nodes themselves. The relationships between nodes are denoted by connections called arcs.
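The node, label, and arc model reads naturally as a labeled directed graph. The following Python sketch is an assumption-laden illustration of that reading, not the data structure defined in the submission: the class name `Node`, the methods `connect` and `neighbors`, the relation name `links_to`, and all page names and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One object in the graph, such as a page or a person."""
    name: str
    # Labels hold properties. A value may be plain data or another Node.
    labels: dict = field(default_factory=dict)
    # Arcs are (relation, target) pairs pointing at other nodes.
    arcs: list = field(default_factory=list)

    def connect(self, relation, target):
        """Add an arc from this node to target under the given relation."""
        self.arcs.append((relation, target))

    def neighbors(self, relation):
        """Return the nodes reachable over arcs with the given relation."""
        return [target for rel, target in self.arcs if rel == relation]


# Illustrative values only: sizes and dates are invented.
editor = Node("editor")
home = Node("home.html", labels={"size": 2048, "last_revised": "1997-06-01"})
catalog = Node("catalog.html", labels={"size": 5120, "maintained_by": editor})
home.connect("links_to", catalog)
```

Here `catalog.labels["maintained_by"]` is itself a node, mirroring the statement that labels can also be nodes, and `home.neighbors("links_to")` walks the arcs of one relation.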
According to Bray, "MCF is the first step in the process of turning networks from messy heaps of books into libraries.... [It] is a framework for meta-content, information about information, that allows sites to publish and share not just content, but guidance in finding and using that content."
According to the proposal, MCF can describe the structure of Web sites, threaded e-mail, PIM functions, distributed annotation and authoring, and commercial information such as prices and dates. MCF segments can point to a site index, describe a site map, and sharpen a broad-based search.
The MCF proposal also cites "sets of channels" as a possible area of application. While this may be seen as a shot across the bow of Microsoft's Channel Definition Format (CDF) submission, it also indicates that the two companies are at least sailing on the same ocean in the same general direction, since both proposals use XML syntax for metadata description.
It is not yet clear how Microsoft will respond. The MCF submission uses the syntax of an unpublished proposal on name spaces jointly submitted to the W3C by Bray and Andrew Layman, a Microsoft senior program manager.
Significantly, while acknowledging the MCF submission, W3C staffer Ralph Swick said that XML is "gaining momentum as a general data-transfer syntax." Swick mentioned the PICS Label Syntax Working Group and the DSig Collections Working Group as two that are looking at XML for their metadata models.
Proposals floated less formally on the Web have included amalgamating XML with Electronic Data Interchange (EDI), or replacing EDI with XML outright, precisely because the widely used EDI format for exchanging business transaction data lacks extensibility.
HTML pages will be able to use to point to associated MCF files. If adopted and implemented, MCF will confirm XML as the extensible HTML it is intended to be.
Until then, the submission by the company whose logo is on 66 percent of the Web browsers in use signals that only the marketplace has yet to put its stamp of approval on the eXtensible Markup Language. With both Netscape and Microsoft backing major XML-based proposals, applications that put XML on every desktop cannot be far behind.