[This local archive copy mirrored from the canonical site: http://www.objs.com/OSA/wom.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]

Towards a Web Object Model

Frank Manola
Object Services and Consulting, Inc. (OBJS)
fmanola@objs.com
10 February 1998

Abstract

Today, the World Wide Web is a global information repository of resources primarily consisting of syntactically-structured HTML documents and MIME-typed files. These relatively unstructured data models do not provide the foundation for command and control situation modeling or enterprise computing, or for a new generation of tools to operate on a more semantically-structured, knowledge-based web. Richer base data model(s) are needed that combine the benefits of emerging Web structuring mechanisms and distributed object service architectures.

A number of ongoing activities are attempting to merge aspects of object models with those of the World Wide Web. This paper describes a number of these activities, with particular emphasis on those which focus on providing enhanced facilities for representing metadata for describing Web (and other) resources. The intent of this paper is to:

describe key examples of existing work from the Web, database, and OMG communities that contribute both ideas and technology toward providing the components of a Web object model
identify some key underlying principles behind this work
identify a framework which allows this work to be unified and extended to support the requirements of advanced Web applications for object technology

1. Introduction

1.1 Background

1.2 Capabilities Provided by an Object Service Architecture

1.3 Increasing the Structuring Power of the Web

2. Relevant Work

2.1 Structured Data Representations and "Lightweight Object Models"

2.1.1 Summary Object Interchange Format (SOIF)

2.1.2 Object Exchange Model (OEM)

2.1.3 Knowledge Interchange Format (KIF)

2.1.4 Extensible Markup Language (XML)

2.2 Higher-Level Models and Metadata

2.2.1 Dublin Core

2.2.2 Warwick Framework

2.2.3 PICS and PICS-NG

2.2.4 XML-Data

2.2.5 Meta Content Framework (MCF)

2.2.6 Resource Description Framework (RDF)

2.3 Adding Behavior to Web Pages

2.3.1 Document Object Model (DOM)

2.3.2 Embedded Objects

2.3.3 Web Interface Definition Language

2.4 Related OMG Technologies

2.4.1 OMG Property Service

2.4.2 Tagged Data Facility

3. Building a Web Object Model

3.1 Integration Approach

3.2 Discussion

3.3 Formal Principles

3.3.1 Logic Basis

3.3.2 Representation of Higher Level Semantics

3.3.3 Object Logics

4. Conclusions

References

1. Introduction

1.1 Background

Many business and governmental organizations are planning or developing enterprise-wide, open distributed computing architectures to support their operational information processing requirements. Such architectures generally employ distributed object middleware technology, such as the Object Management Group's (OMG's) Common Object Request Broker Architecture (CORBA) [OMG95], as a basic infrastructure.

The use of objects in such architectures reflects the fact that advanced software development increasingly involves the use of object technology. This includes the use of object-oriented programming languages, class libraries and application development frameworks, application integration technology such as Microsoft's OLE, as well as distributed object middleware such as CORBA. It also involves the use of object analysis and design methodologies and associated tools.

This use of object technology is driven by a number of factors, including:

the desire to build software from reusable components
the desire for software to more directly and more completely reflect enterprise concepts, rather than information technology concepts
the need to support enterprise processes that involve legacy information systems
the inclusion of object concepts and facilities in key software products by major software vendors

The first two factors reflect requirements for business systems to be rapidly and cheaply developed or adapted to reflect changes in the enterprise environment, such as new services, altered internal processes, or altered customer, supplier, or other partner relationships. Object technology provides mechanisms, such as encapsulation and inheritance, that have the potential to support more rapid and flexible software development, higher levels of reuse, and the definition of software artifacts that more directly model enterprise concepts.

The third factor reflects a situation faced by many large organizations, in which a key issue is not just the development of new software, but the coordination of existing software that supports key internal processes and human activities. Mechanisms provided by object technology can help encapsulate existing systems, and unify them into higher-level processes.

The fourth factor is particularly important. It reflects the fact that, as commercial software vendors incorporate object concepts in key products, it will become more and more difficult to avoid using object technology. This is illustrated by the rapid pace at which object technology is being included in software such as DBMSs (including relational DBMSs) and other middleware, and client/server development environments. Due to this factor, organizations may be influenced to begin adopting object technology before they would ordinarily consider doing so.

At the same time, the Internet is becoming an increasingly important factor in planning for enterprise distributed computing environments. For example, companies are providing information via World Wide Web pages, as well as customer access via the Internet to such enterprise computing services as on-line ordering or order/service tracking facilities. Companies are also using Internet technology to create private Intranets, providing access to enterprise data (and, potentially, services) from throughout the enterprise in a way that is convenient and avoids proprietary network technology. Following this trend, software vendors are developing software to allow Web browsers to act as user interfaces to enterprise computing systems, e.g., to act as clients in workflow or general client/server systems. Products have also been developed that link mainframes to Web pages (e.g., translating conventional terminal sessions into HTML pages).

Organizations perceive a number of advantages in using the Web in enterprise computing. For example, Web browser software is widely available for most client platforms, and is cheaper than most alternative client applications. Web pages generally work reasonably well with a variety of browsers, and maintenance is simpler since the browser and associated software can reduce the amount of distributed software to be managed. In addition, the Web provides a representation for information which

supports interlinking of all kinds of content (text, voice, video, etc.)
is easy for end-users to access
makes content easy to create using widely-available tools

However, as organizations have attempted to employ the Web in increasingly-sophisticated applications, these applications have begun to overlap, in complexity, with the sorts of distributed applications for which architectures such as OMG's CORBA, and its surrounding Object Management Architecture (OMA) [OMG97], were originally intended. Since the Web was not originally designed to support such applications, Web application development efforts increasingly run into limitations of the basic Web infrastructure. As a result, numerous efforts are being made to enhance Web capabilities, to enable them to support these more complex applications. In order to understand the missing elements, it is useful to look at the components of OMG's OMA.

1.2 Capabilities Provided by an Object Service Architecture

There is increasing agreement that modeling a distributed system as a distributed collection of interacting objects provides the appropriate framework for use in integrating heterogeneous, autonomous, and distributed (HAD) computing resources. Objects form a natural model for a distributed system because, like objects, distributed components can only communicate with each other using messages addressed to well-defined interfaces, and components are assumed to have their own locally-defined procedures enabling them to respond to messages sent to them. Objects accommodate the heterogeneous aspects of such systems because messages sent to distributed components depend only on the component interfaces, not on the internals of the components. Objects accommodate the autonomous aspects of such systems because components may change independently and transparently, provided their interfaces are maintained. These characteristics allow objects to be used both in the development of new components, and for encapsulating access to legacy components. In addition, because object-oriented implementations bundle data with related operations in modular units, the use of objects provides the possibility of fine-grained tuning in the computing architecture by moving or copying objects to appropriate nodes of the network (this is becoming increasingly feasible with the development of technology such as Sun's Java).

OMG's Object Management Architecture (OMA) is an example of a distributed object architecture intended to support distributed enterprise computing applications. The OMA includes the following components:

A global object model to define how the heterogeneous resources that make up the system can be modeled as objects. In the OMA, this global object model is defined by the CORBA Interface Definition Language (IDL).
The Object Request Broker (ORB), an object messaging backplane that enables distributed objects to transparently send and receive requests and responses.
Object Services, which support basic functions for using and implementing objects, and are likely to be used in any object-based program. Examples include support for queries, transactions, and event notification.
Common Facilities, which provide end-user oriented capabilities useful across multiple application domains, such as compound document and workflow facilities.
Domain Objects, which are likely to be used only in specific vertical application domains, such as telecommunications or manufacturing.
Application Objects, which are built specifically for a particular application.

These components provide multiple levels of capabilities in support of developing complex distributed applications.

The ORB in the OMA is defined by the CORBA specifications. An ORB does not require that the objects it supports be implemented in an object-oriented programming language. The CORBA architecture defines interfaces for connecting code and data to form object implementations, with interfaces defined by IDL, that are managed by the ORB and its supporting object services. It is this flexibility that enables ORBs to be used in connecting legacy systems and data together as components in enterprise computing architectures.

A distributed enterprise object system must provide functionality beyond that of simply delivering messages between objects. OMG's Object Services have been defined to address some of these requirements. Object Services provide the next level of structure above the basic object messaging support provided by CORBA. The services define specific types of objects (or interfaces) and relationships between them in order to support higher-level capabilities. Object Services currently defined by OMG include, among others:

Concurrency Control Service
Life Cycle Services
Event Notification Service
Query Service
Persistent Object Service
Relationship Service
Naming Service
Transaction Service

Taken together, OMG Object Services provide services for ORB-accessible objects similar to those that an Object DBMS (ODBMS) provides for objects in an object database (queries, transactions, etc.). The Object Services, together with the basic connectivity provided by the ORB, turn the collection of network-accessible objects into a unified shared object space, accessible by any ORB client application. Managing the collection of ORB-accessible objects thus becomes a generalized form of "object database management", with the ORB being part of the internal implementation of what is effectively an ODBMS. Viewed in this way, the OMA provides a powerful object-oriented infrastructure for the development of general-purpose applications, just as an enterprise database and its associated DBMS provide such an infrastructure for the development of general-purpose enterprise applications. Additional levels of organization are also needed. These additional levels are where OMG's Common Facilities, Application, and Domain Objects, as well as still higher level concepts, come into play [MGHH+97].

If the Web is to be used as the basis of complex enterprise applications, it must provide generic capabilities similar to those provided by the OMA, although these may need to be adapted to the more open, flexible nature of the Web. Providing these capabilities involves addressing not only the provision of higher level services and facilities for the Web, but also the suitability of the basic data structuring capabilities provided by the Web (its "object model"). For example, in the case of services, search engines (a form of query service) are becoming indispensable tools, and agent technology can add additional intelligence to the searching process. Similarly, extended facilities to support transactions over the Web are being investigated. However, the ability to define and apply powerful generic services in the Web, and the ability to generally use the Web to support complex applications, depends crucially on the ability of the Web's underlying data structure to support these complex applications and services.

1.3 Increasing the Structuring Power of the Web

The basic data structure of the Web consists of hyperlinked HTML documents. It is generally recognized that HTML is too simple a data structure to support complex enterprise applications. For example, Jon Bosak's XML, Java, and the Future of the Web [Bos97] identifies a number of key limitations of HTML:

Extensibility: HTML does not allow users to specify their own tags or attributes in order to help identify the semantic significance of data (e.g., to identify that a particular text string represents the title of a document, or the customer placing an order).
Structure: HTML does not support the specification of deep structures needed to represent, e.g., database schemas or object-oriented hierarchies.
Validation: HTML does not support the kind of language specification that allows client applications to check data for structural validity on loading the data, e.g., data that represents fixed structured forms or database records.

These limitations severely affect the ability to develop advanced applications using HTML, including:

applications that require the Web client to function as the front-end to enterprise applications or mediate between multiple heterogeneous databases,
applications that require more flexibility in distributing processing load between Web servers and clients, and
applications that require the Web client to present different views of the same data to different users, or in which intelligent Web agents need to tailor information discovery to the needs of individual users.

Proprietary HTML extensions have been developed to address some of these problems, but none deals with all of them, and together they create barriers to interoperability. The same is true of the proprietary data formats used by particular applications. Their use requires specialized helper applications, plug-ins, or Java applets, creating interoperability problems, and difficulty in reusing that data in different applications for new purposes. While use of some specialized formats is necessary in particular applications (e.g., multimedia), in many cases these formats are just used to address the deficiencies of HTML for generalized document and data processing.

A more fundamental direction of efforts to address HTML limitations has been attempts to integrate aspects of object technology with the basic infrastructure of the Web. There are a number of reasons for the interest in integrating Web and object technologies:

The Web, even in its current form, can be viewed as a simple form of distributed object system, with a particularly simple object model. In this model, HTML pages are considered as objects (actually, object state), having identity provided by URLs, and methods defined by, or that are invoked via, HTTP servers. The methods supported by HTTP servers are extensible, and HTTP supports negotiation to find out what they are (even though GET, PUT, and POST are the only methods generally used). The basic resemblance of the Web to a simple object system has created a natural interest in seeing how far the resemblance can be further developed. The World Wide Web Consortium (W3C) HTTP-NG activity <http://www.w3.org/Protocols/HTTP-NG/> is attempting to do this at the protocol level by developing a new architecture for the HTTP protocol based on a simple, extensible, distributed object-oriented model.
Object technology is seen as a particularly-convenient way of adding functionality (e.g., behavior) to the Web, both by adding the behavior provided by objects to the static content of HTML, and by allowing Web clients and servers, through distributed object technology, to access other computing resources. For example:

Web pages can be used as convenient carriers or containers for objects in various models, e.g., Java or ActiveX objects. In this approach, objects are added to the conventional static content of Web pages. The pages provide a vehicle for transmitting the objects between server and client. Once on the client, the objects can then execute. In some cases, the client objects then interact with server objects, possibly using a different protocol, e.g., OMG's IIOP or Java's RMI. While this was originally supported by proprietary extensions, HTML specifications now include support for the <APPLET> tag, and the recently-adopted HTML 4.0 specification includes a more general <OBJECT> tag (see Section 2.3).
Web pages can be treated as objects with methods that execute on HTTP clients. Dynamic HTML developments by Microsoft <http://www.microsoft.com/sitebuilder/workshop/author/dhtml/> and Netscape are examples of this approach. Current work by the W3C on a Document Object Model <http://www.w3.org/TR/WD-DOM/> is attempting to extend these ideas to include even more powerful facilities (see Section 2.3). What is being proposed is an object model that allows the HTML document, together with its contents (its collection of elements and attributes), to be treated as a collection of programmable objects. Client-side code (scripts or code contained in the document, or plug-ins or other code which accesses the document through the client) will be allowed to access these objects, and manipulate them dynamically (e.g., causing immediate changes in the document displayed to the user).
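The idea of treating a document and its elements as programmable objects can be illustrated with a minimal sketch. It assumes Python's standard xml.dom.minidom module as the object interface, and the page content and element names are made up for illustration; neither is prescribed by the Dynamic HTML or W3C work cited above.

```python
from xml.dom.minidom import parseString

# A small, well-formed page (illustrative content only).
page = parseString(
    "<html><body><h1 id='title'>Old title</h1><p>Hello</p></body></html>"
)

# Each element is an object that client-side code can locate and modify.
heading = page.getElementsByTagName("h1")[0]
heading.firstChild.data = "New title"  # change the text node under <h1>

# New elements can be created and attached dynamically.
para = page.createElement("p")
para.appendChild(page.createTextNode("Added by a script"))
page.getElementsByTagName("body")[0].appendChild(para)

# Serialize the modified document tree back to markup.
html = page.documentElement.toxml()
```

In a browser, a change of this kind would be reflected immediately in the displayed document; here the modified tree is simply serialized again.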

Such efforts all contribute toward giving the Web a richer structural base, capable of directly supporting a wider variety of activities, in more flexible and extensible ways. However, up until recently these efforts have still been based on HTML, with its basic structuring limitations, and have generally been pursued as separate, non-integrated activities. There is much other ongoing work within both the Web and database communities on data structure developments to address Web-related enhancements. Work on similar issues is ongoing within the Object Management Group as well (see Section 2.4). This work has contributed valuable ideas, and the various proposals illustrate similar basic concepts: generally, a movement toward some form of simple object model. However, these similarities are often obscured by detailed representational differences, and the work is fragmented and lacks a unifying framework. As a result, individual proposals often lack key capabilities that are in some cases contained in other proposals. Moreover, in many cases these proposals are not well-integrated with key areas of emerging industry consensus on Web data structuring technologies.

If the Internet is to develop to support advanced application requirements, there is a need for both richer individual data structuring mechanisms, and a unifying overall framework which supports heterogeneous representations and extensibility, and provides metalevel concepts for describing and integrating them.

The intent of this paper is to describe how a number of (in some respects) separate "threads" of Web-related development can be combined to form the basis of a Web object model to address these requirements. This combination is based on the observation that the fundamental components of any object model are:

data structures that can represent object state
ways to associate behavior (object methods) with the object state
ways for the object methods to access and operate on that state

As a result, what is needed to progress toward a Web object model is:

a richer base representation than HTML, in order to better represent "object state" (in particular, better support for semantic identification of fields, rather than simply supporting presentation aspects of data)
an API to this state, so that programs can readily access it (without complex parsing)
an enhanced ability to define relationships between this state and specified pieces of code that can serve as object methods
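The three requirements listed above can be made concrete with a small sketch. Everything in it (the order record, get_field, the methods registry, invoke) is a hypothetical illustration of the roles involved, not part of any Web specification.

```python
from typing import Callable, Dict

State = Dict[str, str]

# 1. Object state: semantically tagged fields rather than presentation markup.
order_state: State = {"customer": "ACME Corp", "item": "widget", "quantity": "3"}


# 2. An API to the state: programs read fields by name, without parsing a page.
def get_field(state: State, name: str) -> str:
    return state[name]


# 3. A relationship between state and code: a registry binding method names
#    to functions that operate on the state.
methods: Dict[str, Callable[[State], str]] = {}


def method(name: str):
    def register(fn: Callable[[State], str]) -> Callable[[State], str]:
        methods[name] = fn
        return fn
    return register


@method("summary")
def summary(state: State) -> str:
    return (f"{get_field(state, 'quantity')} x {get_field(state, 'item')} "
            f"for {get_field(state, 'customer')}")


def invoke(state: State, name: str) -> str:
    """Look up the method bound to this name and apply it to the state."""
    return methods[name](state)


result = invoke(order_state, "summary")
```

The registry is kept separate from the state so that new methods can be associated with existing data without changing it, which is one way of keeping the arrangement open.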

At the same time, the openness of the Web compared to conventional object models needs to be preserved, due to the distinct requirements of the Web environment for openness and scalability.

In the following sections, this paper will:

describe key examples of existing work from the Web, database, and OMG communities that contribute both ideas and technology toward providing the components of a Web object model identified above
identify some key underlying principles behind this work
identify a framework which allows this work to be unified and extended to support the requirements of advanced Web applications for object technology

2. Relevant Work

As noted in the Introduction, there has been much ongoing work on enhancements to address Web limitations in supporting richer data structures, and integrating object technology. For example, the Internet and Web communities have developed both additional representations, and a number of "object models" or data structuring principles, to represent richer data structures. The database community has also developed proposals for "lightweight object models," partly driven by attempts to represent the structure of Web resources. All this work has contributed valuable ideas and, taken as a whole, exhibits important common underlying principles. What is required is that this work be integrated, and the best ideas merged.

The Introduction specifically noted that what is needed to progress toward a Web object model is:

a richer base representation than HTML, in order to better represent "object state" (in particular, better support for semantic identification of fields, rather than simply supporting presentation aspects of data)
an API to this state, so that programs can readily access it (without complex parsing)
an enhanced ability to define relationships between this state and specified pieces of code that can serve as object methods

This section describes a number of the key technologies that attempt to address parts of these problems. Several of these technologies will be used as the basis of an approach, described in Section 3, which integrates them to support a Web object model.

Caveats

The following subsections describing the various technologies are in some cases rather long, and include a great deal of text and specific examples taken from the cited references. The purpose in doing this is to provide enough detail in one place to illustrate key concepts and the roles they might play in supporting a Web object model, and to give the reader a feel for how generalizations of the concepts might be developed. Hence, this report makes no claims of originality for most of this material (and readers should refer to the cited sources for further details). The subsections also include some additional commentary highlighting key points, and establishing "forward references" to later material.

Several of the sections describe ongoing activities of the World Wide Web Consortium (W3C), particularly:

the Extensible Markup Language (XML)
the Resource Description Framework (RDF)
the Document Object Model (DOM)

The reader should be aware that in many cases these specifications are works in progress. As a result, some of the details described in this report, as well as the source references, may no longer be completely accurate (or accessible due to changed URLs) by the time the report is read. The latest information on these activities can be obtained through the main W3C Web page <http://www.w3.org/> or W3C's technical report page <http://www.w3.org/TR/>.

2.1 Structured Data Representations and "Lightweight Object Models"

The Introduction briefly described HTML's limitations in supporting the data structure requirements of more complex Web applications. HTML was adequate as long as what applications were generally doing was simply displaying pages to users. However, more complex applications require programs to be able to recognize and process parts of Web pages that have specific semantic meaning within the application. In some cases, applications require data that has a well-defined, fixed format (such as an invoice or other form). Even if applications don't require such fully regular structures, they often need the ability to identify specific pieces of a page's contents. For example, a document may not have a fixed number of authors, but it is still important to be able to identify the strings of text that correspond to authors' names. In some cases, these "pieces" would correspond to specific fields in records, such as "author". In other cases, they would correspond to specific relationships (e.g., a "citation" link to a related paper).

These are the same structuring requirements that apply to object state in object models; i.e., an object's state must be structured in such a way that the object methods can find the parts of the state that they need in order to execute properly. As compared with HTML, whose tags are primarily concerned with how the tagged information is to be presented, satisfying this structuring requirement involves some form of semantic markup, i.e., the ability to tag items with names that can be used to identify items based (at least to some extent) on their semantics.

This section describes a number of developments directed at dealing with the problems of providing richer data structuring capabilities for Web data.

2.1.1 Summary Object Interchange Format (SOIF)

Harvest's Summary Object Interchange Format (SOIF) is a syntax for representing and transmitting descriptions of (metadata about) Internet resources such as files, sites, Web pages, etc., as well as other kinds of structured objects (see Internet draft: CIP Index Object Format for SOIF Objects <http://www.globecom.net/(eng)/ietf/draft/draft-ietf-find-cip-soif-01.shtml>). SOIF is based on a combination of the Internet Anonymous FTP Archives (IAFA) IETF Working Group templates and BibTeX. Each resource description is represented in SOIF as a list of attribute-value pairs (e.g., Company = 'Netscape'). SOIF handles both textual and binary data as values, and, with some minor extensions, multivalued attributes. SOIF also allows bulk transfer of many resource descriptions in a single, efficient stream. A SOIF stream contains one or more SOIF objects, each of which contains the structured content of a resource description. An example SOIF object might be:

@DOCUMENT { http://www.netscape.com:80/
Title{20}: Welcome to Netscape!
Last-Modified{29}: Thu, 16 May 1996 11:45:39 GMT }
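The example above can be read mechanically because each attribute carries the length of its value in braces. The following is a minimal sketch of such a reader; it assumes single-byte (ASCII) text so that character counts equal byte counts, handles only one object per call, and parse_soif is a hypothetical name rather than part of Harvest.

```python
import re

EXAMPLE = (
    "@DOCUMENT { http://www.netscape.com:80/\n"
    "Title{20}: Welcome to Netscape!\n"
    "Last-Modified{29}: Thu, 16 May 1996 11:45:39 GMT }\n"
)

# "@TEMPLATE { url" opens an object.
HEADER = re.compile(r"@(\w+)\s*\{\s*(\S+)\s*")
# "Name{size}:" followed by one space or tab introduces a value of that size.
ATTR = re.compile(r"\s*([^\s{}]+)\{(\d+)\}:[ \t]")


def parse_soif(text):
    """Return (template, url, attributes) for a single SOIF object."""
    header = HEADER.match(text)
    if header is None:
        raise ValueError("not a SOIF object")
    template, url = header.group(1), header.group(2)
    pos = header.end()
    attrs = {}
    while True:
        attr = ATTR.match(text, pos)
        if attr is None:
            break
        name, size = attr.group(1), int(attr.group(2))
        start = attr.end()
        # The declared size, not a delimiter, determines where the value ends.
        attrs[name] = text[start:start + size]
        pos = start + size
    if text[pos:].strip() != "}":
        raise ValueError("missing closing brace")
    return template, url, attrs


template, url, attrs = parse_soif(EXAMPLE)
```

Because values are length-prefixed, a value may itself contain braces or newlines without confusing the reader, which is what allows SOIF to carry binary data.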

Resource Description Messages (RDM) <http://www.w3.org/TR/NOTE-rdm>, a technical specification by Darren Hardy (Netscape) dated 24 July 1996, defines a mechanism to discover and retrieve metadata about network-accessible resources; this metadata takes the form of Resource Descriptions (RDs). RDM is used in Netscape's Catalog Server. A Resource Description consists of a list of attribute-value pairs (e.g., Author = Darren Hardy, Title = RDM) and is associated with a resource via a URL. Agents can generate RDs automatically (e.g., a WWW robot), or people can write RDs manually (e.g., a librarian or author). Once a repository of Resource Descriptions is assembled, the server can export it via RDM as a programmatic way for WWW agents to discover and retrieve the RDs.

RDM uses Harvest's SOIF format to encode the RDs. The data model that SOIF provides is a flat name space for the attributes, and treats all values as blobs. The RDM schema definition language extends the SOIF data model by providing:

Data type and format information for the values (e.g., varchar and application/rfc822-address, or blob and text/html).
Hints to the RDM client as to which attributes should be surfaced to the user-level, and which are included in the default view.
Hints to an indexer as to which attributes should be indexed, and which should be used to suppress duplicates.
A mapping between attribute names and (table name, column name) tuples, which helps an RDM client to place this data into the relational data model to support RDBMS backends.
Other semantic information, such as indexable columns and foreign keys, which helps in mapping the SOIF objects into the relational data model.
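The mapping from attribute names to (table name, column name) tuples can be sketched as follows. The SCHEMA dictionary is a hypothetical Python stand-in for the RDM schema definition language, whose actual syntax is not shown here; the table layout, the example URL, and store_rd are likewise assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: attribute name -> (data type, table name, column name).
SCHEMA = {
    "Title": ("varchar", "documents", "title"),
    "Author": ("varchar", "documents", "author"),
}


def store_rd(conn, url, rd):
    """Place one Resource Description into relational tables via SCHEMA."""
    rows = {}
    for attr, value in rd.items():
        if attr not in SCHEMA:
            continue  # attributes without a mapping are skipped in this sketch
        _type, table, column = SCHEMA[attr]
        rows.setdefault(table, {})[column] = value
    for table, cols in rows.items():
        names = ["url"] + list(cols)
        marks = ", ".join("?" for _ in names)
        # Table and column names come from SCHEMA; values are bound parameters.
        conn.execute(
            f"INSERT INTO {table} ({', '.join(names)}) VALUES ({marks})",
            [url] + list(cols.values()),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (url TEXT, title TEXT, author TEXT)")
store_rd(conn, "http://example.org/rdm",
         {"Author": "Darren Hardy", "Title": "RDM", "Keywords": "metadata"})
row = conn.execute("SELECT url, title, author FROM documents").fetchone()
```

The flat attribute list stays open-ended (the unmapped Keywords attribute is still legal in the RD), while the schema tells a client which attributes have a relational home.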

SOIF illustrates a theme that will be repeated in other Web-related structured data representations discussed here: the representation of data as semantically tagged data items (attribute/value pairs), where the tags or attribute names convey something of the meaning of the associated data value. A key advantage of an approach based on individual attribute/value pairs is that, unlike a database-like "typed record" approach, it is arbitrarily extensible in a federated environment like the Web (without a centralized collection of types or schema). Anyone can record any attributes they feel are necessary, without going through the "overhead" of defining a new type (and, in particular, possibly having to define it as a subtype of an existing type), and distributing that type definition throughout a distributed network.

However, while SOIF supports attribute/value pairs, its structuring capabilities are not sufficiently rich to support the full structuring requirements of the Web. For example, it lacks support for nested structures, and cannot support the functionality of HTML, let alone extensions to it. It is also not well integrated with more advanced developments in Web data representation, such as XML, RDF, and DOM, described later.

2.1.2 Object Exchange Model (OEM)

Stanford's Object Exchange Model (OEM) [PGW95, AQMW+96] is a "lightweight object model" developed to act as a general model capable of representing both database and Web data structures. A similar model, developed at the University of Pennsylvania, is described in [BDHS96, BDFS97]. OEM was introduced in TSIMMIS (The Stanford-IBM Manager of Multiple Information Sources) as a self-describing way of representing metadata. OEM was later modified for use in the Lore (Lightweight Object Repository) system. OEM exists in two main variants. In the original (TSIMMIS) version, OEM defines a set of labeled nodes. Each node has an object identifier (oid), a label, a type, and a value (the type defines the type of the value). The types include primitive types, such as integer, and the type set. If the type is set, the value consists of a set of oids of other nodes. This allows aggregate structures to be defined. These structures are shown in the figure below.

original (TSIMMIS) OEM:

+-----+-------+------+-------+
| oid | label | type | value | type includes "set"
+-----+-------+------+-------+

+-----+-------+------+-----------------+
| oid | label | set  | {oid, oid, ...} |
+-----+-------+------+-----------------+

In the newer (Lore) version of OEM, the structures have been modified so that edges are labeled rather than nodes. In this scheme, a complex object consists of a set of (label,oid) pairs. These effectively represent relationships between the containing object and the target object. That is, a given (label,targetoid) pair contained in object sourceobject represents the relationship

label(sourceobject, targetobject)

This revised structure thus more closely resembles a first order logic (FOL) structuring of data. These structures are shown in the figure below.

new (Lore) OEM:

atomic object
+-----+------+-------+
| oid | type | value |
+-----+------+-------+

complex object
+-----+---------+-------------------------------------------+
| oid | complex | value = {(label, oid), (label, oid), ...} |
+-----+---------+-------------------------------------------+

Since individual objects do not have labels in this scheme, additional labels are introduced so that top-level objects can also have names.

As an example, a simple structure for information on books in a library might have the following structure in the TSIMMIS OEM:

+----+---------+------+---------------+
| &1 | library | set  | {&2, &5, ...} |
+----+---------+------+---------------+
+----+------+------+----------+
| &2 | book | set  | {&3, &4} |
+----+------+------+----------+
+----+--------+--------+-----+
| &3 | author | string | Aho |
+----+--------+--------+-----+
+----+-------+---------+-----------+
| &4 | title | string  | Compilers |
+----+-------+---------+-----------+

Linearly, this might be represented as:

<&1, library, set, {&2,&5,...} >
<&2, book, set, {&3,&4} >
<&3, author, string, Aho >
<&4, title, string, Compilers >

In the Lore OEM, the same structure would be:

         +----+------+-----------------------------+
library: | &1 | set  | {(book,&2), (book,&5), ...} |
         +----+------+-----------------------------+
         +----+------+---------------------------+
         | &2 | set  | {(author,&3), (title,&4)} |
         +----+------+---------------------------+
         +----+--------+-----+
         | &3 | string | Aho |
         +----+--------+-----+
         +----+---------+-----------+
         | &4 | string  | Compilers |
         +----+---------+-----------+
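The relationship between the two variants can be sketched in a few lines of Python. This is only an illustrative encoding under stated assumptions: the dictionary layout and the function name to_edge_labeled are hypothetical and do not come from TSIMMIS or Lore, and the elided members of the library set (&5, ...) are simply left out.

```python
# Hypothetical in-memory encoding of the original (TSIMMIS) OEM:
# each node is oid -> (label, type, value).
tsimmis = {
    "&1": ("library", "set", ["&2"]),
    "&2": ("book", "set", ["&3", "&4"]),
    "&3": ("author", "string", "Aho"),
    "&4": ("title", "string", "Compilers"),
}


def to_edge_labeled(nodes, root_oid):
    """Move each node's label onto the edge that points at it."""
    objects = {}
    for oid, (_label, typ, value) in nodes.items():
        if typ == "set":
            # A set node becomes a complex object holding (label, oid) pairs.
            pairs = [(nodes[child][0], child) for child in value]
            objects[oid] = ("complex", pairs)
        else:
            objects[oid] = (typ, value)
    # The root has no incoming edge, so its label survives as a top-level name.
    names = {nodes[root_oid][0]: root_oid}
    return names, objects


names, lore = to_edge_labeled(tsimmis, "&1")
```

After the conversion, names maps "library" to &1, and the object &2 holds the pairs (author, &3) and (title, &4), mirroring the second figure above.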

OEM can represent complex graph structures, similar to those that exist in the Web. It is a "lightweight" object model in the sense that:

it does not require the definition of classes or types; arbitrary structures with arbitrary attribute names can be included in OEM structures; this enables it to more directly represent the irregular structures found within and among Web resources
it does not support encapsulation; applications can directly access the OEM structures
it does not support object behavior (there are no object methods defined for OEM nodes)

OEM and related models effectively define global models for a federated database system, where the federated components include unstructured or semistructured data sources such as the Web (unlike the more conventional structured database sources usually considered in federated database systems). These models provide a valuable core of ideas for applying database concepts to Web data. As the examples illustrate, OEM is based on the use of attribute/value pairs. This is important in allowing the individual components of Web resources to be recognized and accessed in a meaningful way by applications. In addition, OEM extends the basic attribute/value pair model by providing each pair with its own identifier. This is important in allowing complex nested and graph structures to be defined. It is also potentially important in allowing additional descriptive information (metadata) to be directly associated with the pairs (e.g., to describe an attribute's meaning more fully). However, this latter idea has not been directly followed up in the OEM-related papers reviewed.

While these models are intended to represent data in (or extracted from) Web and other resources, and hence constitute a form of metadata, the capabilities of these models for representing metadata that might already exist about a resource, and for representing their own metadata, are somewhat undeveloped. They do not explicitly consider capturing type and schema information where it exists, or linking that type information to the structures it describes. For example, when OEM is used to capture a database structure, a schema actually exists for this data, unlike Web resources. It should be possible to capture both the data and the schema in OEM, and link them together. This is not really followed up in existing OEM work (although it could be). Related work has been done on a concept called DataGuides [GW97, NUWC97]. A DataGuide resembles a schema, but is derived dynamically as a summary of the structures that have been encountered, and only approximately describes the structures that may actually be encountered. This is appropriate for unstructured and semistructured data, but does not fully represent the semantics of an actual schema.
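The idea of a dynamically derived structural summary can be illustrated with a much-simplified sketch. It is not the DataGuide algorithm of [GW97]; it merely collects the distinct label paths found in an edge-labeled OEM graph. The object encoding, the function name label_paths, and the second book (&5, with author &6) are assumptions made for the example.

```python
# Hypothetical edge-labeled OEM graph: oid -> (type, value).
# The second book (&5) has no title; its contents are invented for illustration.
objects = {
    "&1": ("complex", [("book", "&2"), ("book", "&5")]),
    "&2": ("complex", [("author", "&3"), ("title", "&4")]),
    "&3": ("string", "Aho"),
    "&4": ("string", "Compilers"),
    "&5": ("complex", [("author", "&6")]),
    "&6": ("string", "Ullman"),
}


def label_paths(graph, root_oid, max_depth=10):
    """Collect every distinct label path reachable from root_oid."""
    paths = set()
    stack = [(root_oid, ())]
    while stack:
        oid, path = stack.pop()
        typ, value = graph[oid]
        # max_depth keeps the walk finite if the graph contains cycles.
        if typ != "complex" or len(path) >= max_depth:
            continue
        for label, child in value:
            child_path = path + (label,)
            paths.add(child_path)
            stack.append((child, child_path))
    return paths


summary = label_paths(objects, "&1")
```

The summary contains the path (book, title) even though one of the two books has no title, which shows in miniature why such a summary only approximately describes the data and cannot stand in for a real schema.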

These models as currently implemented are also not well integrated with emerging Web technologies, such as the XML, DOM, and RDF work described below, that are likely to change the basic nature of the Web's representation. The approach taken in work such as OEM has so far assumed that the Web will continue to be largely unstructured or semistructured, based on HTML, and that data from the Web will need to be extracted into separate OEM structures (or interpreted as if it had been) in order to perform database-like manipulations on it. On the other hand, the new Web technologies provide a higher level, more semantic representational structure, which can start with the assumption that information authors themselves have support to provide more semantic structural information. Our work on a Web object model is based on the idea that, with this additional representation support, it makes sense to investigate building more database-like capabilities within the Web infrastructure itself, rather than assuming that almost all of these database capabilities need to be added externally. Since Web structures are unlikely to become as regular as conventional databases, some of the principles developed by work such as OEM will continue to be important (and, in fact, as a model, OEM has many similarities with work such as RDF described later in this report). However, it seems likely that these principles will need to be applied in the context of representations such as XML and DOM, used directly as the basis of an enhanced Web infrastructure.

2.1.3 Knowledge Interchange Format (KIF)

The Knowledge Interchange Format <http://logic.stanford.edu/kif/kif.html> provides a common means of exchanging knowledge between programs with differing internal knowledge representation techniques. It is human-readable, with declarative semantics. It can express first-order logic (FOL) sentences, with some second-order capabilities. Translators exist to convert various knowledge representation languages to and from KIF. A simple example of KIF in representing information about an ontology (from [BBBC+97]) is:

ontology(o_857)
ontology_name(o_857,'healthcare')
ontology_frame(o_857,f_123)
frame(f_123)
frame_name(f_123,'encounter_drg')
slot(s_345)
frame_slot(f_123,s_345)
slot_name(s_345,'patient_age')
constraint(c_674)
slot_constraint(s_345,c_674)
constraint_expression(c_674,[[gt,'patient_age',43],
  [lt,'patient_age',75]])

The example illustrates that the KIF representation of data is based on the use of attribute/value pairs; in fact, this is a direct representation of the way this information might be expressed in first-order logic. This also illustrates the fact that a FOL representation necessarily introduces a number of "intermediate" object identifiers (like o_857 and f_123), in order to assert the identity of distinct concepts, and to represent relationships among the various parts of the description. This is similar to the way that OEM introduces identifiers for the individual parts of a resource description. The KIF example particularly illustrates the use of such identifiers in defining namespaces like frames or ontologies, which qualify contained information.
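To make the role of the intermediate identifiers concrete, the assertions above can be treated as a small set of facts and queried by joining on those identifiers. The tuple encoding and the helper names values and slot_names_of_ontology are assumptions for illustration, not part of KIF or of any KIF tool; the constraint assertions are omitted for brevity.

```python
# The assertions from the example, encoded as (predicate, arg, ...) tuples.
facts = [
    ("ontology", "o_857"),
    ("ontology_name", "o_857", "healthcare"),
    ("ontology_frame", "o_857", "f_123"),
    ("frame", "f_123"),
    ("frame_name", "f_123", "encounter_drg"),
    ("slot", "s_345"),
    ("frame_slot", "f_123", "s_345"),
    ("slot_name", "s_345", "patient_age"),
]


def values(predicate, subject):
    """Return every v such that predicate(subject, v) has been asserted."""
    return [f[2] for f in facts
            if len(f) == 3 and f[0] == predicate and f[1] == subject]


def slot_names_of_ontology(ontology_name):
    """Follow ontology -> frame -> slot links via the shared identifiers."""
    result = []
    for f in facts:
        if f[0] == "ontology_name" and f[2] == ontology_name:
            for frame in values("ontology_frame", f[1]):
                for slot in values("frame_slot", frame):
                    result.extend(values("slot_name", slot))
    return result
```

Asking for the slot names of the 'healthcare' ontology walks from o_857 to f_123 to s_345, which is only possible because each intermediate concept was given its own identifier.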

Like OEM, KIF is capable of representing arbitrary graph structures. Moreover, KIF illustrates the importance of identifying parts of a data structure representation with logical assertions in conveying semantics between applications. Section 3 will describe how this principle serves as the basis of a formal Web object model definition. However, while KIF is widely used for knowledge interchange, it, like OEM, is not well integrated with emerging Web infrastructure technologies.

2.1.4 Extensible Markup Language (XML)

The Extensible Markup Language (XML) <http://www.w3.org/XML/> is an ongoing effort within the World Wide Web Consortium (W3C). XML is a data format for structured document interchange on the Web. More specifically, XML defines a simple subset of SGML (the Standard Generalized Markup Language [ISO86]; see also, e.g., [DeR97]), and is intended to make it easy to use SGML on the Web. XML is extensible because unlike HTML, which defines a fixed set of tags, XML allows the definition of customized markup languages with application-specific tags, e.g., <AUTHOR> or <QTY-ON-HAND>, for exchanging information in particular application domains such as chemistry, electronics, or general business. Hence, XML is really a metalanguage (a language for describing languages).

Because authors and providers can design their own document types using XML, browsers can benefit from improved facilities, and applications can use tailored markup to process data. As a result, XML provides direct support for using application-specific tagged data items (attribute/value pairs) in Web resources, as opposed to the current need to use ad hoc encodings of data items in terms of HTML tags. [KR97] provides a useful overview of the potential benefits of using XML in Web-related applications.

Although XML could eventually completely replace HTML, XML and HTML are expected to coexist for some time. In some cases, applications may wish to define entirely separate XML documents for their own processing, and convert the XML to HTML for display purposes. Alternatively, applications may wish to continue using HTML pages as their primary document format, embedding XML within the HTML for application-specific purposes. For example, [Hop97] describes the use of blocks of XML markup enclosed by <XML> and </XML> tags within an HTML document for this purpose.

XML has considerable industry support, e.g., from Netscape, Microsoft, and Sun. For example, Microsoft has built an XML parser into Internet Explorer 4.0 (which uses XML for several applications), has made available XML parsers in Java and C++, together with links to other XML tools (see http://www.microsoft.com/xml/), and has indicated that it will use XML in future versions of Microsoft Office products. Microsoft has also contributed to a number of proposals to W3C on the use of XML as a base for various purposes (some of which will be discussed in later sections). Netscape has said it will support XML via the Meta Content Framework (described in Section 2.2) in a future version of its Communicator product. Work is also underway on tying XML to Java in a number of ways. Other commercial vendors are also developing XML-related software tools. In addition, a number of XML tools are available for free non-commercial use. A list of some of these tools is available at the W3C XML Web page identified above.

A number of industry groups have defined SGML Document Type Definitions (DTDs) for their documents (e.g., the U.S. Defense Department, which requires much of its documentation to be submitted according to defined SGML DTDs); in many cases these could either be used with XML directly, or converted in a straightforward fashion. Work is already underway to define XML-based data exchange formats in both the chemical and healthcare communities. Work has also been done on other applications of XML, e.g., an Ontology Markup Language (OML) <http://wave.eecs.wsu.edu/WAVE/Ontologies/OML/OML-DTD.html> for representing ontologies in XML.

The W3C XML specification has several parts:

XML (language): specifications for XML documents and Document Type Definitions (DTDs) <http://www.w3.org/TR/REC-xml>; these specifications have the status of a W3C Recommendation, and hence are stable
XLL (XML-Link): draft specifications of constructs that can be inserted in XML documents to describe links between objects and addressing into the internal structures of XML documents <http://www.w3.org/TR/WD-xml-link>
XSL (XML-Style): a submission defining presentation styles for XML documents <http://www.w3.org/TR/NOTE-XSL>

A DTD is usually a file (or several files together) which contains a formal definition of a particular type of document. This acts like a database schema, and defines what names can be used for elements, where they may occur (e.g., <ITEM> might only be meaningful inside <LIST>), and how they all fit together. The DTD lets processors parse a document and identify where each element belongs, so that stylesheets, browsers, search engines, and other applications can be used. The linking of resources with the DTDs that describe them is similar to the association of a database record with its schema type, and to the association of an object with its type or class definition.

An XML document may be either valid or well-formed. A valid XML document is well-formed, and has a DTD. The document begins with a declaration of its DTD. This may include a pointer to an external document (a local file or the URL of a DTD that can be retrieved over the network) that contains a subset of the required markup declarations (called the external subset), and may also include an internal subset of markup declarations contained directly within the document. The external and internal subsets, taken together, constitute the complete DTD of the document. The DTD effectively defines a grammar which defines a class of documents. Normally, the bulk of the markup declarations appear in the external subset, which is referred to by all documents of the same class. If both external and internal subsets are used, the XML processor must read the internal subset first, then the external subset. This allows the entity and attribute declarations in the internal subset to take precedence over those in the external subset (thus allowing local variants in documents of the same class). XML DTDs can also be composed, so that new document types can be created from existing ones.

A well-formed XML document can be used without a DTD, but must follow a number of simple rules to ensure that it can be parsed correctly. These rules require, among other things, that:

all tags must be balanced (elements must have both start and end tags present)
all attribute values must be in quotes
elements must nest inside each other properly (no overlapping markup)
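These rules can be checked mechanically by any XML parser. The following is a minimal sketch using the parser in Python's standard library, which follows the final XML 1.0 rules; the helper name is_well_formed and the sample fragments (built from the <LIST> and <ITEM> tags mentioned earlier, plus invented <B> and <I> tags) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One fragment that obeys the rules, and one violation of each rule.
samples = {
    "balanced": '<LIST><ITEM n="1">a</ITEM></LIST>',
    "missing end tag": "<LIST><ITEM>a</LIST>",
    "unquoted attribute": "<LIST><ITEM n=1>a</ITEM></LIST>",
    "overlapping markup": "<LIST><B><I>a</B></I></LIST>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
```

Only the first fragment is accepted; each of the other three is rejected by the parser without any reference to a DTD.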

The general characteristics of XML can be illustrated using an example of a document that maintains a list of people's electronic business cards (this example is modified from one in [KR97], and is not necessarily consistent with the details of the latest XML specification). Each business card contains the person's first name, last name, company, email address, and Web page address. There is more than one way to represent attribute-value style data in XML. One approach is to specify the attributes as the "attributes" of XML tags. In this case, the document contains only tags annotated with attribute-value pairs, and there is no content in the document other than the tags themselves (which can be parsed and processed by applications). Using this approach, an example document would be:

<!DOCTYPE bCard "http://www.objs.com/schemas/bCard">
<bCard>

<?xml default bCard
      firstname = ""
      lastname  = ""
      company   = ""
      email     = ""
      webpage   = ""
?>

<bCard
      firstname = "Frank"
      lastname  = "Manola"
      company   = "Object Services and Consulting"
      email     = "fmanola@objs.com"
      webpage   = "http://www.objs.com/manola.htm"
/>

<bCard
      firstname = "Craig"
      lastname  = "Thompson"
      company   = "Object Services and Consulting"
      email     = "thompson@objs.com"
      webpage   = "http://www.objs.com/thompson.htm"
/>
</bCard>

The default specification ensures that every tag has the same number of attribute-value pairs.

An alternative representation uses different tags, rather than XML attributes, to identify the meaning of the content. Using this approach, the same content would be represented as:

<bCard>
<FIRSTNAME>Frank</FIRSTNAME>
<LASTNAME>Manola</LASTNAME>
<COMPANY>Object Services and Consulting</COMPANY>
<EMAIL>fmanola@objs.com</EMAIL>
<WEBPAGE>http://www.objs.com/manola.htm</WEBPAGE>
</bCard>

<bCard>
<FIRSTNAME>Craig</FIRSTNAME>
<LASTNAME>Thompson</LASTNAME>
<COMPANY>Object Services and Consulting</COMPANY>
<EMAIL>thompson@objs.com</EMAIL>
<WEBPAGE>http://www.objs.com/thompson.htm</WEBPAGE>
</bCard>
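From an application's point of view the two styles carry the same attribute/value pairs. The sketch below, which uses Python's standard XML parser, reads one card in each style into the same dictionary. It is an illustration under stated assumptions: the function names are hypothetical, only three of the five fields are shown, and the document type declaration and default processing instruction are omitted because they follow an older draft syntax.

```python
import xml.etree.ElementTree as ET

# One card in the attribute style and the same card in the element style.
attribute_style = (
    '<bCard firstname="Frank" lastname="Manola" '
    'email="fmanola@objs.com"/>'
)
element_style = (
    "<bCard>"
    "<FIRSTNAME>Frank</FIRSTNAME>"
    "<LASTNAME>Manola</LASTNAME>"
    "<EMAIL>fmanola@objs.com</EMAIL>"
    "</bCard>"
)


def card_from_attributes(text):
    """Read the pairs from the XML attributes of the bCard tag."""
    return dict(ET.fromstring(text).attrib)


def card_from_elements(text):
    """Read the pairs from child elements, using the tag name as the key."""
    return {child.tag.lower(): (child.text or "").strip()
            for child in ET.fromstring(text)}
```

Both functions return the same mapping for these inputs, so the choice between the two styles is a matter of document design rather than of information content.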

The paper "XML representation of a relational database" <http://www.w3.org/XML/RDB.html> uses a relational database as a simple example of how to represent more complex structured information in XML. A relational database consists of a set of tables, where each table is a set of records. A record in turn is a set of fields, and each field is a field-name/field-value pair. All records in a particular table have the same number of fields with the same field-names. This description suggests that a database could be represented as a hierarchy of depth four: the database consists of a set of tables, which in turn consist of rows, which in turn consist of fields. The following example, taken from the cited paper, describes a possible XML representation of a single database with two tables:

<!doctype mydata "http://www.w3.org/mydata">
<mydata>

<authors>
<author>
<name>Robert Roberts</name>
<address>10 Tenth St, Decapolis</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/rr-10</ms>
<born>1960/05/26</born>
</author>

<author>
<name>Tom Thomas</name>
<address>2 Second Av, Duo-Duo</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/tt-2</ms>
</author>

<author>
<name>Mark Marks</name>
<address>1 Premier, Maintown</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/mm-1</ms>
</author>
</authors>

<editors>
<editor>
<name>Ella Ellis</name>
<telephone>7356</telephone>
</editor>
</editors>

</mydata>

The representation is human-readable, but fairly verbose (since XML in general is verbose). However, it compresses well with standard compression tools. It is also easy to print the database (or a part of it) with standard XML browsers and a simple style sheet.

The database is modeled with an XML document node and its associated element node:

<!doctype name "url">
<name>
table1
table2
...
tablen
</name>

The name is arbitrary. The url is optional, but can be used to point to information about the database. The order of the tables is also arbitrary, since a relational database defines no ordering on them. Each table of the database is represented by an XML element node with the records as its children:

<name>
record1
record2
...
recordm
</name>

The name is the name of the table. The order of the records is arbitrary, since the relational data model defines no ordering on them. A row is also represented by an element node, with its fields as children:

<name>
field1
field2
...
fieldm
</name>

The name is the name of the row type (this was not required in the original relational model, but the current specification allows definition of row types); the name is required in XML anyway. The order of the fields is arbitrary. A field is represented as an element node with a data node as its only child:

<name type="t">
d
</name>

If d is omitted, it means the value of the field is the empty string. The value of t indicates the type of the value (such as string, number, boolean, date). If the type attribute is omitted, the type can be assumed to be 'string'.
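The four-level mapping just described (database, table, row, field) is regular enough to generate mechanically. The following sketch builds such a document with Python's standard library. The function name database_to_xml and the input layout are assumptions made for the example, not part of the cited paper: each table maps to a row name and a list of rows, and a field value is either a plain string or a (type, text) pair.

```python
import xml.etree.ElementTree as ET


def database_to_xml(db_name, tables):
    """Build the database/table/row/field hierarchy as an element tree.

    tables maps table name -> (row name, list of rows); each row maps
    field name -> value, where value is a string or a (type, text) pair.
    """
    root = ET.Element(db_name)
    for table_name, (row_name, rows) in tables.items():
        table = ET.SubElement(root, table_name)
        for row in rows:
            record = ET.SubElement(table, row_name)
            for field_name, value in row.items():
                field = ET.SubElement(record, field_name)
                if isinstance(value, tuple):
                    # An explicit type becomes the type attribute.
                    field_type, text = value
                    field.set("type", field_type)
                else:
                    # No type attribute: the type defaults to string.
                    text = value
                field.text = text
    return root


tables = {
    "authors": ("author", [
        {"name": "Robert Roberts", "ms": ("blob", "ftp://docs/rr-10")},
    ]),
    "editors": ("editor", [
        {"name": "Ella Ellis", "telephone": "7356"},
    ]),
}
root = database_to_xml("mydata", tables)
xml_text = ET.tostring(root, encoding="unicode")
```

The resulting xml_text is a compact, single-line form of part of the mydata example above. Nothing in the output says which elements are tables, rows, or fields; that knowledge lives only in the generating code.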

This example illustrates that XML tags can (and will) represent concepts at multiple levels of abstraction. The example defines a specific four-level hierarchy, but does not explicitly define the relational model and indicate the hierarchical relationships among the various relational constructs. In order to do this in a generic way for all relational databases, there would need to be explicit tags such as <SCHEMA>, <TABLE>, <ROW>, etc., and a specification of how they should be nested. This is metalevel information as far as the XML representation is concerned, and could be specified in the DTD. The definition of models, such as the relational model, for organizing data for specific purposes, is independent of XML, and needs to be done separately. The definition of such models (in some cases using XML as their representation) is discussed in the next section.

An XML document consists of text, and is basically a linearization of a tree structure. At every node in the tree there are several character strings. The tree structure and character strings together form the information content of an XML document. Some of the character strings serve to define the tree structure; others are there to define content. In addition to the basic tree structure, there are mechanisms to define connections between arbitrary nodes in the tree. For example, in the following document there is a root node with three children, with one of the children containing a link to one of the other children:

<p>
<q id="x7">The first child of type q</q>
<q id="x8">The second child of type q</q>
<q href="#x7">The third child of type q</q>
</p>

In this case, the third child contains an href attribute which points to the first child, using its id attribute as an identifier.
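A processor can follow such an intra-document link by indexing elements on their id attribute. The sketch below does this with Python's standard XML parser; the names by_id and resolve are hypothetical, and only the simple "#identifier" form of href used in the example is handled.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<p>"
    '<q id="x7">The first child of type q</q>'
    '<q id="x8">The second child of type q</q>'
    '<q href="#x7">The third child of type q</q>'
    "</p>"
)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}


def resolve(element):
    """Return the element that a '#id' href points at, or None."""
    href = element.get("href", "")
    if href.startswith("#"):
        return by_id.get(href[1:])
    return None


target = resolve(doc[2])
```

Here target is the first q element, so the tree of three children has effectively become a graph with one extra edge.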

The XML linking model is described in the XLL draft <http://www.w3.org/TR/WD-xml-link>. The full hypertext linking capabilities of XML are much more powerful than those of HTML, and are based on more powerful hypertext technology such as described in HyTime [ISO92] <http://www.hytime.org/> and the Text Encoding Initiative (TEI) <http://www.uic.edu/orgs/tei/>. The current specification supports both conventional URLs, and TEI extended pointers. The latter provide support for bidirectional and multi-way links, as well as links to a span of text (i.e., a subset of the document) within the same or other documents.

XSL <http://www.w3.org/TR/NOTE-XSL> is a submission defining stylesheet capabilities for XML documents. XML stylesheets enable formatting information to be associated with elements in a source document to produce formatted output. XML stylesheet capabilities are based on a subset of those defined in the ISO standard Document Style Semantics and Specification Language (DSSSL) [ISO96] used in formatting SGML documents. The formatted output is created by formatting a tree of flow objects. A flow object has a class, which represents a kind of formatting task, together with a set of named characteristics, which further specify the formatting. The association of elements in the source document tree to flow objects is defined using construction rules. A construction rule contains a pattern to identify specific elements in the source tree, and an action to specify a resulting subtree of flow objects. The stylesheet processor recursively processes source elements to produce a complete flow object tree which defines how the document is to be presented.
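The recursive application of construction rules can be sketched as follows. This is a deliberately simplified illustration, not XSL syntax and not an XSL processor: patterns are reduced to bare element names, and the rule table, the flow object class names, the characteristics, and the function name build_flow_tree are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical construction rules: element name (the pattern) ->
# (flow object class, named characteristics) (the action).
rules = {
    "LIST": ("display-group", {"space-before": "6pt"}),
    "ITEM": ("paragraph", {"start-indent": "12pt"}),
}
# Fallback used when no rule's pattern matches the element.
DEFAULT_RULE = ("sequence", {})


def build_flow_tree(element):
    """Recursively map a source element tree to a tree of flow objects."""
    flow_class, characteristics = rules.get(element.tag, DEFAULT_RULE)
    return {
        "class": flow_class,
        "characteristics": dict(characteristics),
        "text": (element.text or "").strip(),
        "children": [build_flow_tree(child) for child in element],
    }


source = ET.fromstring("<LIST><ITEM>a</ITEM><NOTE>b</NOTE></LIST>")
flow_tree = build_flow_tree(source)
```

The root of flow_tree is a display-group whose children are a paragraph (for ITEM) and a default sequence (for NOTE, which matches no rule); a formatter would then lay out this tree rather than the source document.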

The XML working group is also currently developing a Namespace facility <http://www.w3.org/TR/1998/NOTE-xml-names> that will allow Generic Identifiers (tag names) to have a prefix which will make them unique and will prevent name clashes when developing documents that mix elements from different schemas. This facility allows a document's prolog to contain a set of Processing Instructions (an SGML concept) of the form:

<?xml:namespace name="some-uri" as="some-abbreviation"?>

for example

<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<?xml:namespace name="http://www.purl.org/DublinCore/schema" as="DC"?>

Elements in the document may then use generic identifiers of the form <RDF:assertions> or <DC:Title>. Those element names would expand to URIs such as http://www.w3.org/schemas/rdf-schema#assertions. This work is still under development, and the details of the final specification may differ from those described here.
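The expansion of prefixed names described here amounts to a table lookup followed by concatenation. A minimal sketch, assuming the draft's behavior as summarized above (the function name expand and the choice to return unprefixed names unchanged are assumptions):

```python
# Prefix table corresponding to the two namespace declarations above.
declared = {
    "RDF": "http://www.w3.org/schemas/rdf-schema",
    "DC": "http://www.purl.org/DublinCore/schema",
}


def expand(generic_identifier, namespaces=declared):
    """Expand 'PREFIX:name' to 'namespace-uri#name'."""
    prefix, separator, local_name = generic_identifier.partition(":")
    if not separator:
        # No prefix: the name is left as written.
        return generic_identifier
    # An undeclared prefix raises KeyError.
    return namespaces[prefix] + "#" + local_name
```

With this table, expand("RDF:assertions") yields http://www.w3.org/schemas/rdf-schema#assertions, so two schemas can both define an element called Title without clashing.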

XML provides basic tagged value support, as well as support for nesting, and enhanced link capabilities. Because the Web community is increasingly targeting XML as its "next generation Web representation", the Web object model described in Section 3 uses XML as its basic representation of object state. However, additional concepts must also be defined to apply XML to extended data and metadata structuring requirements, and particularly the requirements for a Web object model that go beyond a richer state representation. Some of these requirements are illustrated both by the relational database example above, and by the RDF and related efforts described in the next section. These efforts generally involve defining data model concepts for representing specific kinds of data (as the relational model does for database data), and then using the tagged value structures supported by XML as their representation. These models support various ways of using identifier concepts (URLs plus other identifier concepts) to provide support for graph structured data. An additional general requirement, not generally addressed by Web-related activities, is the definition of structured database capabilities (e.g., an algebra or calculus to serve as the basis for database-like query and view facilities for XML data).

2.2 Higher-Level Models and Metadata

Richer representation techniques for Web information, such as XML, are an important component in making the Web an improved basis for enhanced applications of all kinds. However, additional structure must also be defined. For example, XML provides support for the representation of data in terms of attribute/value pairs, with user-defined tags. However, this alone will not provide easy interchange of information or interoperability among components, since different users could use XML to define their own ways of using attribute/value pairs to represent the same (or the same type of) information. Thus, there is also a need to define additional characteristics of what to represent using representations such as XML.

A data model defines one level of "what to represent". For example, the relational data model defines structuring concepts such as rows and tables, and provides one basic organizational framework for representing data. The example from the previous section of how to represent relational data in XML illustrated how using the relational model imposed additional structure on the XML representation. Defining a data model for data represented in XML both suggests specific structuring concepts for using XML to organize data, and may also involve the specification of certain standard tags or attributes (like <TABLE>) to reflect those concepts. Use of particular data models (represented using techniques such as XML) regularizes the structures that may be encountered, and potentially simplifies the task of applications that process those structures.

An additional level of "what to represent" is provided by standardizing the use of domain-specific attribute/value pairs and document structures (e.g., standards for specific kinds of reports or forms). SGML and XML DTDs constitute one way to specify such standards, and there are already numerous SGML DTDs in use for this purpose (these could, in most cases, be easily adapted for use with XML).

An important source of efforts to develop such higher-level model specifications for use on the Web has been work on developing representation techniques for Web metadata, i.e., data available via the Web that describes or helps interpret either other Web resources or non-Web resources. This metadata is used both to facilitate Web searches for relevant data, and to either provide direct access to it (if it is Web-accessible) or at least indicate its existence and possibly describe how to obtain it. The reason why the development of metadata representations has driven the development of higher-level models is that the metadata is intended to support indexing, searching, and other automated processes that require more structure than may be present in the data itself. Metadata requirements have also driven the development of structured representations themselves. For example, the SOIF format described in Section 2.1.1 was developed to represent Web metadata.

Efforts to develop enhanced metadata capabilities have involved several types of activity (a given effort may bundle more than one of them):

The definition of an abstract metadata model, i.e., the definition of the basic constructs and operations of the model, and their semantics, or the definition of the principles behind such models. Specific models may add additional mechanisms, such as predefined attributes and types, and inheritance. Among the work described later in this section, the Dublin Core and Warwick Framework are examples of work on the basic principles of metadata models. The Resource Description Framework (RDF) and Meta Content Framework (MCF) are examples of specific metadata models.
The definition of one or more representations of these models in terms of specific syntactic formats such as HTML, or XML (equivalently in some cases, the definitions of how various popular representations, such as HTML pages, are to be viewed in these models). Examples of such definitions are described in subsequent sections, e.g., the representation of RDF in XML.
The definition of requirements, and specific sets of attributes and their associated value types, for defining specific types of metadata for specific application areas. The Dublin Core is an example of work on metadata intended to be descriptive of resources of all types. Other examples of metadata definitions for specific types of resources (or data which could be used as such metadata) include:

Federal metadata standards for geospatial data <http://www.fgdc.gov/Metadata/Metadata.html>
work on metadata to support searching for software resources [IK96]
the PICS specifications for describing ratings information (see Section 2.2.3)

Web data/metadata models defined "on top of" representations such as XML are relevant to the development of a Web object model in helping to further define an adequate basis for representing object state. In addition, these models are also relevant in helping to identify ways to establish relationships between the object state and the specified pieces of code that serve as object methods. This is based on the idea that an "object" is basically a piece of state with some attached (or associated) programs (methods). For example, a Smalltalk object consists of a set of state variables (data), together with a pointer (link) to a class object which contains its methods. The link between an object and its class is essentially a metadata link, since the class methods are used to help interpret the data. In the Web environment, the idea is that objects can be constructed by enhancing Web resources with additional metadata that allows the resources to be considered as objects in some object model. This concept will be developed further in Section 3, but is mentioned here to further explain the role that metadata structuring principles will play in the development of a Web object model.

2.2.1 Dublin Core

The Dublin Core <http://purl.oclc.org/metadata/dublin_core/> is a set of specific metadata attributes originally developed at the March 1995 Metadata Workshop in Dublin, Ohio. The set has subsequently been modified on the basis of later Dublin Core Metadata Workshops. The goal of the Dublin Core is to define a minimal set of descriptive elements that facilitate the description and the automated indexing of document-like networked objects. The Core metadata set is intended to be suitable for use by resource discovery tools on the Internet, such as the "WebCrawlers" employed by popular World Wide Web search engines (e.g., Lycos and Alta Vista). In addition, the core is meant to be sufficiently simple to be understood and used by the wide range of authors and casual publishers who contribute information to the Internet. The Dublin Core reflects input from a wide range of communities interested in metadata, including both the Internet and Digital Library communities. The elements of the Dublin Core (as of November 1997) are given below. The Dublin Core Reference Description <http://purl.org/metadata/dublin_core_elements> contains the current definition.

TITLE: The name given to the resource by the CREATOR or PUBLISHER.

CREATOR: The person(s) or organization(s) primarily responsible for the intellectual content of the resource.

SUBJECT: Keywords or phrases that describe the subject or content of the resource. The intent is to use controlled vocabularies and keywords, so the element might include scheme-qualified classification data (for example, Library of Congress Classification Numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings).

DESCRIPTION: A textual description of the content of the resource, such as document abstracts or content descriptions of visual resources. This could be extended to include computational content description (e.g., spectral analysis of a visual resource). In this case this field might contain a link to the description rather than the description itself.

PUBLISHER: The entity responsible for making the resource available in its present form.

CONTRIBUTORS: Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource.

DATE: The date the resource was made available in its present form.

TYPE: The category of the resource, such as home page, novel, poem, working paper, etc. It is expected that TYPE will be chosen from an enumerated list of types that is under development. See http://sunsite.berkeley.edu/Metadata/types.html for current thinking on the application of this element.

FORMAT: The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image (as well as non-electronic media). FORMAT will be assigned from an enumerated list that is under development.

IDENTIFIER: String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, would also be candidates for this element.

SOURCE: The work, either print or electronic, from which this resource is derived, if applicable.

LANGUAGE: Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with RFC 1766. See: http://ds.internic.net/rfc/rfc1766.txt.

RELATION: Relationship to other resources, for example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development.

COVERAGE: The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development.

RIGHTS: A link (e.g., a URL or other suitable URI as appropriate) to terms and conditions, copyright statements, or similar information. A formal specification is currently under development.

In addition to enumerating these data elements, the Dublin Workshop report specified a number of underlying principles that apply to the entire core metadata set.

The core metadata set should be extensible to permit site-specific or domain-specific data elements.
All elements in the Core metadata set should be optional.
All elements should be repeatable allowing, for example, multiple author elements.
The semantics of each element should be modifiable by either:

the use of qualifiers, borrowed from other existing metadata schemes, which allow the use of more detailed or specific semantics from those schemes. For example, a Subject element might be specified as Subject (scheme=LCSH), indicating that the subject terms are taken from the Library of Congress Subject Headings.
ad-hoc specializations and extensions developed specifically for use with the Core so as to refine the normal meanings of the core data elements.

These principles illustrate a number of requirements in a general metadata model, including:

the need for structural flexibility, e.g., for repeating or missing elements, and for adding local extensions
the need to be able to refer to additional levels of metadata. This is illustrated here by the example Subject (scheme=LCSH), which identifies the source definition of the Subject attribute name, and constitutes metadata about the metadata (the subject information) being recorded for a particular document. In this case, the reference to the additional level of metadata is by name (LCSH). Later in the paper, there will be examples where these references are via explicit metalevel pointers (e.g., URLs) that link a metadata element directly to its definition. These capabilities allow instances of metadata to refer to specific ontologies, where these are defined.
the need to be able to provide metadata about metadata at multiple levels of granularity. For example, Subject (scheme=LCSH) illustrates the need to associate additional metadata with individual attribute names. This permits, for example, the use of attribute names whose definitions come from different sources.

These same principles are illustrated in a number of the specific metadata models described later in this section, such as MCF and RDF.
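The structural requirements above (optional elements, repeatable elements, local extensions, and scheme qualifiers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` and `Value` classes, the `X-LOCAL-SHELF` element, and the sample values are all hypothetical and are not defined by the Dublin Core.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Value:
    """One occurrence of an element, optionally qualified by a scheme."""
    text: str
    scheme: Optional[str] = None  # e.g. "LCSH"; metadata about the metadata


class Record:
    """Every element is optional and repeatable; unknown names are allowed."""

    def __init__(self):
        self._elements = defaultdict(list)

    def add(self, element, text, scheme=None):
        self._elements[element].append(Value(text, scheme))

    def get(self, element):
        # A missing element yields an empty list rather than an error.
        return list(self._elements.get(element, []))


record = Record()
record.add("CREATOR", "Frank Manola")
record.add("SUBJECT", "Metadata", scheme="LCSH")  # scheme-qualified (illustrative value)
record.add("SUBJECT", "object models")            # unqualified repeat
record.add("X-LOCAL-SHELF", "B-12")               # hypothetical local extension

print(len(record.get("SUBJECT")))  # 2
print(record.get("DATE"))          # []
```

Attaching the scheme to each value, rather than to the record as a whole, reflects the granularity point made above: different occurrences of the same element may draw their definitions from different sources.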

2.2.2 Warwick Framework

The Warwick Framework <http://cs-tr.cs.cornell.edu:80/Dienst/UI/2.0/Describe/ncstrl.cornell/TR96-1593> defines a container architecture that builds on the Dublin Core results. It is a mechanism for aggregating distinct packages of metadata, allowing multiple, separately-managed metadata sets to be defined, managed, and associated with the resources they describe. The report also describes proposals for representing Warwick Framework structures using HTML, MIME, SGML, and a distributed object architecture. (See also the overview papers at http://www.dlib.org/dlib/july96/07weibel.html and http://www.dlib.org/dlib/july96/lagoze/07lagoze.html.)

The Warwick Framework has two fundamental components: packages, which are typed metadata sets, and containers, which are the units for aggregating packages.

A container may be either transient or persistent. In its transient form, it exists as a transport object between and among repositories, clients, and agents. In its persistent form, it exists as a first-class object in the information infrastructure. That is, it is stored on one or more servers and is accessible from these servers using a globally accessible identifier (URI). A container may also be wrapped within another object (i.e., one that is a wrapper for both data and metadata). In this case the "wrapper" object will have a URI rather than, or in addition to, the metadata container itself.

Independent of the implementation, the only operation defined for a container is one that returns a sequence of packages in the container. There is no provision in this operation for ordering the members of this sequence and thus no way for a client to assume that one package is more significant or "better" than another.

Each package is a typed object; its type may be determined after access by a client or agent. Packages are of three types:

metadata set - These are packages that contain actual metadata. Examples are packages that are MARC records, Dublin Core records, and encoded terms and conditions (MARC stands for MAchine-Readable Cataloging, a standard for the representation and communication of bibliographic and related information in machine-readable form, used extensively in the Library community--see, e.g., http://lcweb.loc.gov/marc/). A potential problem is the ability of clients and agents to recognize and process the semantics of the many metadata sets. In addition, clients and agents will need to adapt to new metadata types as they are introduced, at least to the extent of ignoring them gracefully, or perhaps copying them for downstream applications that may know how to process them. Initial implementations of the Warwick Framework will probably include a set of well-known metadata sets, in the same manner that most Web browsers have native handlers for a set of well-known MIME types. Extending the Framework implementations to handle an extensible collection of metadata sets is expected to rely on a type registry scheme.
indirect - This is a package that is an indirect reference to another object in the information infrastructure. While the indirection could be done using URLs, the existence of a reliable URN implementation is necessary to avoid the problems of dangling references that currently exist in the Web. It is important to note that the target of the indirect package is a first-class object, and thus may have its own metadata and, significantly, its own terms and conditions for access. Further, the target of the indirect package may also be indirectly referenced by other containers (i.e., sharing of metadata objects). Finally, the target of the indirection may be in a different repository or server than the container that references it.
container - This is a package that is itself a container. There is no defined limit for this recursion.

+--------------------+
|      container     |
|                    |
|  +---------------+ |
|  |   package     | |
|  | (Dublin Core) | |
|  +---------------+ |
|  +---------------+ |
|  |   package     | |
|  | (MARC Record) | |
|  +---------------+ |       +------------------------+
|  +---------------+ | URI   |       package          |
|  |   package     |-+------>| (terms and conditions) |
|  |  (indirect)   | |       +------------------------+
|  +---------------+ |
+--------------------+

Figure 1 - Metadata container with three packages (one indirect)

Figure 1 illustrates a simple example of a Warwick Framework container. The container in this example contains three logical packages of metadata. The first two, a Dublin Core record and a MARC record, are contained within the container as a pair of packages. The third metadata set, which defines the terms and conditions for access to the content object, is referenced indirectly via a URI in the container (the syntax for terms and conditions metadata and administrative metadata is not yet defined).
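The container in Figure 1 can be approximated by the following Python sketch. The class names, the field layout, and the example.org URI are assumptions made for illustration; the Warwick Framework report defines the concepts (containers, the three package types, and a single operation returning the packages), not this code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MetadataSet:
    """A package holding actual metadata of some named type."""
    kind: str                 # e.g. "Dublin Core", "MARC"
    content: Dict[str, str]


@dataclass
class Indirect:
    """A package that refers to another first-class object by URI."""
    uri: str


@dataclass
class Container:
    """A container is itself a package type, so containers can nest."""
    _packages: List["Package"] = field(default_factory=list)

    def packages(self):
        # The only operation: return the packages. Their order carries no meaning.
        return tuple(self._packages)


Package = Union[MetadataSet, Indirect, Container]

figure1 = Container([
    MetadataSet("Dublin Core", {"TITLE": "An example document"}),
    MetadataSet("MARC", {"245": "An example document"}),
    Indirect("http://example.org/terms-and-conditions"),  # hypothetical URI
])


def count_packages(container):
    """Count non-container packages, descending into nested containers."""
    total = 0
    for p in container.packages():
        total += count_packages(p) if isinstance(p, Container) else 1
    return total


print(count_packages(figure1))  # 3
```

A client that does not recognize a given `kind` can skip that package and continue, which corresponds to the "ignoring them gracefully" behavior described for metadata sets.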

The mechanisms for associating a Warwick Framework container with a content object (i.e., a document) depend on the implementation of the Framework. The proposed implementations discussed in the cited reference illustrate some of the options. For example, a simple Warwick Framework container may be embedded in a document, as illustrated in the HTML implementation proposal; or an HTML document can include a link to a container stored as a separate file. On the other hand, as illustrated in the distributed object proposal, a container may be a logical component of a so-called digital object, which is a data structure for representing networked objects.

The reverse linkage, which ties a container to a piece of intellectual content, is also relevant, since anyone can create descriptive data for a networked resource, without permission or knowledge of the owner or manager of that resource. This metadata is fundamentally different from the metadata that the owner of a resource chooses to link to or embed with the resource. As a result, an informal distinction is made between two categories of metadata containers, which both have the same implementation:

An internally-referenced metadata container is the metadata that the author or maintainer of a content object has selected as describing the object. This metadata is associated with the content by either embedding it as part of the structure that holds the content or referencing it via a URI. An internally-referenced metadata container referenced via a URI is, by nature, a first-class networked object, and may have its own metadata container associated with it. In addition, an internally-referenced metadata container may back-reference the content that it describes via a linkage metadata element within the container.
An externally-referenced metadata container is metadata that may be created and maintained by an authority separate from the creator or maintainer of the content object. In fact, the creator of the object may not even be aware of this metadata. There may be an unlimited number of such externally-referenced metadata containers. For example, libraries, indexing services, ratings services, and the like may compose sets of metadata for content objects that exist on the net. As stated earlier, these externally-referenced metadata containers are themselves first-class network objects, accessible through a URI and having some associated metadata. The linkage to the content that one of these externally-referenced containers purports to describe will be via a linkage metadata element within the container. There is no requirement, nor is it expected, that the content object will reference these externally-referenced containers in any way.

One of the motivations for the development of the Warwick Framework was a recognition that, even if attention is restricted to metadata for descriptive cataloging (the subject of the Dublin Core), many different formats for such metadata have been defined (including specialized forms for particular kinds of data, such as geospatial data), and techniques must be defined for organizing the metadata about an object that may appear in these multiple forms.

Another motivation was the recognition that there are many other kinds of metadata besides that used for descriptive cataloging that may need to be recorded and organized. These kinds of metadata include, among others:

terms and conditions - metadata that describes the rules for use of an object, such as an access list of who can view the object, a set of prices and fees for use of the object, or a definition of permitted uses of an object (viewing, printing, copying, etc.).
administrative data - metadata that relates to the management of an object in a particular server or repository, such as date of last modification, or the administrator's identity.
content ratings - a description of attributes of an object within a multidimensional scaled rating scheme as assigned by some rating authority, e.g., using the PICS mechanism.
provenance - data defining the source or origin of some content object, for example a description of some physical artifact from which the content was scanned, or a summary of algorithmic transformations that have been applied to the object (filtering, decimation, etc.).
linkage or relationship data - data describing relationships to other objects, such as the relationship of a set of journal articles to the containing journal, the relationship of a translation to the work in its original language, or the relationships among the components of a multimedia work (including synchronization information between images and a soundtrack, for example). Relationships should be defined using some unique persistent identifier such as an ISBN, ISSN, or URN.
structural data - data defining the logical components of complex or compound objects and how to access those components, such as a table of contents, or the definition of the different source files, subroutines, and data definitions in a software suite.

The Warwick Framework illustrates a number of very basic structural requirements and options that must be supported in representing metadata and linking it with the data it describes. Like the principles reflected in the Dublin Core, the Warwick Framework principles are illustrated in a number of the specific metadata models described later in this section, such as RDF. For example, RDF assertions (see below) correspond closely to Warwick Framework packages, and the various means provided for associating RDF assertions with the resources they describe support options identified in the Warwick Framework.

2.2.3 PICS and PICS-NG

PICS (Platform for Internet Content Selection) <http://www.w3.org/PICS/> is an infrastructure for associating labels (metadata) with Internet content. It was originally designed to help parents and teachers control what children access on the Internet, but it also facilitates other uses for labels, including code signing, privacy, and intellectual property rights management. PICS currently defines the following recommendations:

Rating Services and Rating Systems (and Their Machine Readable Descriptions) <http://www.w3.org/TR/REC-PICS-services> (earlier version in World Wide Web Journal, 1(4), Fall 1996, 23-43) defines a language for describing rating services and systems. Software programs will read service descriptions written in this language, in order to interpret content labels and assist end-users in configuring selection software.
PICS Label Distribution -- Label Syntax and Communication Protocols <http://www.w3.org/TR/REC-PICS-labels> (earlier version in World Wide Web Journal, 1(4), Fall 1996, 45-69) specifies the syntax and semantics of content labels and HTTP-related protocol(s) for distributing labels as part of PICS.
PICSRules 1.1 <http://www.w3.org/TR/REC-PICSRules> defines a language for writing profiles, which are filtering rules that allow or block access to URLs based on PICS labels that describe those URLs.

In PICS, a rating service is an individual or organization that provides content labels for resources on the Internet. The labels it provides are based on a rating system. Each rating service must describe itself using a PICS-defined MIME type application/pics-service. Selection software that relies on ratings from a PICS rating service can first load the application/pics-service description. This description allows the software to tailor its user interface to reflect the details of a particular rating service.

Each rating service picks a URL as its unique identifier, and includes this unique identifier in all content labels the service produces. It is intended that the URL, in addition to simply being a unique identifier, also refer to an HTML document which describes both the rating service and the rating system used by the service (possibly via a link to a separate document).

A rating system specifies the dimensions used for labeling, the scale of allowable values for each dimension, and a description of the criteria used in assigning values. For example, the MPAA rates movies in the U.S. based on a single dimension with allowable values G, PG, PG-13, R, and NC-17. The current PICS specification allows only floating point values.

Each rating system is identified by a URL. This allows multiple services to use the same rating system, and refer to it by its identifier. The URL identifying a rating system can be accessed to obtain a human-readable description of the rating system.

A content label, or rating, contains information about a document. The format of a content label is defined in the Label Format document referenced above, and has three parts:

the URL identifying the rating service that produced the label
a set of PICS-defined (and extensible) attribute-value pairs which provide information about the rating, such as the date the rating was assigned.
a set of rating-system-defined attribute-value pairs that actually rate the document along various dimensions or categories (chosen by the rating system)

A new MIME type application/pics-labels is also defined for transmitting one or more content labels.

When an end-user attempts to access a particular URL, a software filter built into the Web client (browser) fetches the document. The client also accesses the document's content label(s) based on rating systems that the client has been told to pay attention to. The client then compares the content label to the rating-system-specified values that the client has been told to base access decisions on, and either allows or denies access to the document.
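The access decision described above might be sketched as follows. The profile format, the category name `level`, the example.org service URL, and the policy of denying access when no acceptable label is found are all assumptions of this sketch rather than requirements of PICS.

```python
# Hypothetical client-side check; not taken from the PICS specifications.

def allow_access(labels, profile):
    """Return True if some label from the chosen service meets every limit.

    labels:  list of dicts {"service": url, "ratings": {category: number}}
    profile: {"service": url, "max": {category: highest acceptable value}}
    """
    for label in labels:
        if label["service"] != profile["service"]:
            continue  # ignore labels from services the user has not chosen
        ratings = label["ratings"]
        if all(cat in ratings and ratings[cat] <= limit
               for cat, limit in profile["max"].items()):
            return True
    return False  # no acceptable label found


profile = {"service": "http://example.org/ratings/v1", "max": {"level": 2}}
labels = [{"service": "http://example.org/ratings/v1", "ratings": {"level": 1}}]

print(allow_access(labels, profile))  # True
print(allow_access([], profile))      # False
```

Whether an unlabeled document should be blocked or allowed is a policy choice left to the selection software; the sketch blocks it only to keep the example short.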

Content labels may be:

embedded in the document (using a PICS-specified mechanism based on the HTML META tag)
located separately from the document on the same server, and retrieved along with the document via a protocol that uses RFC-822 headers
retrieved separately from the document from a "label bureau"

The following application/pics-service document (taken from the PICS specification) describes a simple rating service.

((PICS-version 1.1)
 (rating-system "http://www.gcf.org/ratings")
 (rating-service "http://www.gcf.org/v1.0/")
 (icon "icons/gcf.gif")
 (name "The Good Clean Fun Rating System")
 (description "Everything you ever wanted to know about soap,
cleaners, and related products")

 (category
  (transmit-as "suds")
  (name "Soapsuds Index")
  (min 0.0)
  (max 1.0))

 (category
  (transmit-as "density")
  (name "suds density")
  (label (name "none") (value 0) (icon "icons/none.gif"))
  (label (name "lots") (value 1) (icon "icons/lots.gif")))

 (category
  (transmit-as "subject")
  (name "document subject")
  (multivalue true)
  (unordered true)
  (label (name "soap") (value 0))
  (label (name "water") (value 1))
  (label (name "soapdish") (value 2))
  (label-only))

 (category
  (transmit-as "color")
  (name "picture color")
  (integer)

  (category
   (transmit-as "hue")
   (label (name "blue")  (value 0))
   (label (name "red")   (value 1))
   (label (name "green") (value 2)))

  (category
   (transmit-as "intensity")
   (min 0)
   (max 255))))

There are four top-level categories in this rating system. Each category has a short transmission name to be used in labels (e.g., "suds"); some also have longer names that are more easily understood (e.g., "Soapsuds Index"). The "Soapsuds Index" category rates soapsuds on a scale between 0.0 and 1.0 inclusive. The "suds density" category can have ratings from negative to positive infinity, but there are two values that have names and icons associated with them. The name "none" is the same as 0, and "lots" is the same as 1. The "document subject" category only allows the values 0, 1, and 2, but a single document can have any combination of these values. The "picture color" category has two sub-categories.
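As a rough illustration of how selection software might use such a description, the following Python sketch checks rating values against the constraints just described. The dictionary layout and key names are assumptions of this sketch; only the category names and limits come from the example, and the `color/hue` sub-category is omitted for brevity.

```python
# Each entry mirrors one category from the service description above.
CATEGORIES = {
    "suds":            {"min": 0.0, "max": 1.0},
    "density":         {},                                   # unbounded
    "subject":         {"allowed": {0, 1, 2}, "multivalue": True},
    "color/intensity": {"min": 0, "max": 255},
}


def is_valid(category, value):
    """Check one rating value (or a list of values) against its category."""
    spec = CATEGORIES.get(category)
    if spec is None:
        return False                      # unknown transmission name
    values = value if isinstance(value, list) else [value]
    if len(values) > 1 and not spec.get("multivalue", False):
        return False                      # only multivalue categories take several values
    for v in values:
        if "allowed" in spec and v not in spec["allowed"]:
            return False
        if "min" in spec and v < spec["min"]:
            return False
        if "max" in spec and v > spec["max"]:
            return False
    return True


print(is_valid("suds", 0.5))        # True
print(is_valid("suds", 1.5))        # False
print(is_valid("subject", [0, 2]))  # True
print(is_valid("density", -40))     # True
```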

A label list is used to transmit a set of PICS labels. The following is a label list for two documents rated using the above rating system.

(PICS-1.1 "http://www.gcf.org/v2.5"
  by "John Doe"
  labels on "1994.11.05T08:15-0500"
         until "1995.12.31T23:59-0000"
         for "http://www.w3.org/PICS/Overview.html"
         ratings (suds 0.5 density 0 color/hue 1)
         for "http://www.w3.org/PICS/Underview.html"
         by "Jane Doe"
         ratings (subject 2 density 1 color/hue 1))

PICS-NG (Next Generation) was a W3C effort based on the observation that the PICS infrastructure could be generalized to support arbitrary Web metadata, with PICS categories serving as metadata attributes, having meanings defined by the rating system. The W3C paper Catalogs: Resource Description and Discovery <http://www.w3.org/pub/WWW/Search/catalogs.html> also observes that the structure of a PICS label is similar to:

a row in a relational database table (a rating system is analogous to the schema)
the set of header names and values in an email message (the rating system in this case is RFC822)
an SOIF record
a BibTeX bibliography entry
an HTML form data set

The PICS-NG effort defined a Metadata Object Model, and its encodings in XML and as S-expressions, in the note PICS-NG Metadata Model and Label Syntax <http://www.w3.org/TR/NOTE-pics-ng-metadata>. This model includes a number of extensions to the original PICS representation scheme, in order to support more general forms of metadata. These extensions include such things as:

additional primitive types such as strings, URLs, lists, and symbols (sequences of characters acting as unique identifiers)
the ability for attributes to describe objects, things referred to by objects, and other attributes
an inheritance mechanism to allow collections of metadata to be shared and reused
the ability to define schemata that define the meanings and other metadata about attributes (the draft notes that the mechanism for doing this may take on features of metaobject protocols)
a metamodel for the model itself, including a small set of attributes that are available in any label

Other papers related to this effort include:

an extension of PICS to support text-based metadata based on the Dublin Core <http://www.dstc.edu.au/RDU/PICS/proposal03.html> (March 24, 1997)
related documents for supporting text-based metadata, but updated to use W3C's Resource Description Framework instead of PICS <http://www.dstc.edu.au/RDU/PICS/>
PICS-SE: A Proposal for the Annotation of Internet Documents using a String Extension to PICS <http://www.kulturbox.de/aid/pics-se/>

The PICS-NG effort has been merged with other work to become W3C's Resource Description Framework activity (see Section 2.2.6).

PICS illustrates a number of important ideas in data modeling and metadata representation. One such idea is the definition of specific required data items (e.g., category, label) having predefined meanings in the model. Such specifications are important in supporting interoperability among applications that use PICS ratings. PICS also illustrates the use of metalevel pointers. The URLs that identify rating services and rating systems in PICS point to information that describes PICS metadata (i.e., to metametadata). These illustrate the idea that a given piece of data on the Web, no matter what its intended purpose (e.g., whether it is intended to represent data or metadata), can itself point to (or be related in some other way to) data that can be used to help interpret it. Finally, PICS illustrates the use of a metalevel (or reflective) architecture. PICS requires that ordinary requests for data on the Web be interrupted or intercepted, so that rating information about the requested resource can be retrieved, and a decision made about whether to return the requested data or not. This same basic idea can be used to enhance individual requests with other types of additional processing, often transparently to users. For example, such processing could be used to bracket a collection of individual requests to form a database-like transaction, by adding interactions with a transaction processor to these requests. Examples of such processing are described in [CM93, Man93, SW96]. These same ideas are the basis for current OBJS work on an Intermediary Architecture <http://www.objs.com/workshops/ws9801/papers/paper103.html> for the Web.
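The interception idea can be sketched in Python as a wrapper around an ordinary request function. Everything here is hypothetical: `fetch` is a stand-in that performs no network access, and the two interceptors are made-up examples of additional processing, not part of PICS or of the Intermediary Architecture.

```python
# Hypothetical sketch of request interception; no real network access.

def fetch(url):
    """Stand-in for an ordinary Web request."""
    return f"<contents of {url}>"


def make_intercepted(fetch_fn, interceptors):
    """Wrap fetch_fn so every interceptor sees the request first.

    Each interceptor takes a URL and returns True to let the request
    proceed or False to block it.
    """
    def intercepted(url):
        for allow in interceptors:
            if not allow(url):
                return None           # blocked before reaching fetch_fn
        return fetch_fn(url)
    return intercepted


log = []


def record_request(url):
    log.append(url)                   # extra processing, transparent to the caller
    return True


def block_unrated(url):
    return not url.endswith("/unrated")  # made-up rule standing in for a label check


guarded_fetch = make_intercepted(fetch, [record_request, block_unrated])

print(guarded_fetch("http://example.org/page"))     # <contents of http://example.org/page>
print(guarded_fetch("http://example.org/unrated"))  # None
```

The caller uses `guarded_fetch` exactly as it would use `fetch`, which is the sense in which the additional processing can be transparent to users.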

As illustrated by the existence of a PICS-NG effort, PICS itself requires extensions to deal with more general metadata requirements. Some of these are described further in the discussion of the Resource Description Framework (Section 2.2.6). In addition, in order to provide a complete Web object model, PICS and similar ideas must be augmented with an API providing applications with easy access to the state, and with mechanisms to link code to the state represented using models such as PICS. These aspects will be discussed in subsequent sections.

2.2.4 XML-Data

XML-Data <http://www.w3.org/TR/1998/NOTE-XML-data/> is a submission to W3C by Microsoft, ArborText, DataChannel, and INSO. XML-Data defines an XML vocabulary for schemas, that is, for defining and documenting object classes. It can be used either for classes which are strictly syntactic (for example, XML), or which indicate concepts and relations among concepts (as used in relational databases, knowledge representation graphs, and RDF). The former are called "syntactic schemas;" the latter "conceptual schemas."

For example, an XML document might contain a "book" element which lexically contains an "author" element and a "title" element. An XML-Data schema can describe such syntax. However, in another context, it may simply be necessary to represent more abstractly that books have titles and authors, irrespective of any syntax. XML-Data schemas can also describe such conceptual relationships. Further, the information about books, titles and authors might be stored in a relational database, in which case XML-Data schemas can describe the database row types and key relationships.

One immediate implication of the ideas in XML-Data is that, using XML-Data, XML document types can be described using XML itself, rather than DTD syntax. Another is that XML-Data schemas provide a common vocabulary for ideas which overlap between syntactic, database and conceptual schemas. All features can be used together as appropriate.

Schemas in XML-Data are composed principally of declarations for:

Concepts
Classes of objects
    Class hierarchies
    Properties
    Constraints
    Relationships
        Indicated by primary key to foreign key matching
        Indicated by URI
XML DTD Grammars and Compatibility
    grammatical rules governing the valid nesting of the elements and attributes
    attributes of elements
    internal and external entities
    notations
Data types giving parsing rules and implementation formats.
Mapping rules allowing abbreviated grammars to map to a conceptual data model.

The following simple example taken from the XML-Data submission shows some data about books and authors, and the XML-Data schema which describes it (note the use of the XML Namespace facility, described in Section 2.1.4, for qualifying names).

Some data:

<?xml:namespace name="http://company.com/schemas/books/" as="bk"?>
<?xml:namespace name="http://www.ecom.org/schemas/dc/" as="ecom" ?>

<bk:booksAndAuthors>
    <Person>
        <name>Henry Ford</name>
        <birthday>1863</birthday>
    </Person>

    <Person>
        <name>Harvey S. Firestone</name>
    </Person>

    <Person>
        <name>Samuel Crowther</name>
    </Person>

    <Book>
        <author>Henry Ford</author>
        <author>Samuel Crowther</author>
        <title>My Life and Work</title>
    </Book>

    <Book>
        <author>Harvey S. Firestone</author>
        <author>Samuel Crowther</author>
        <title>Men and Rubber</title>
        <ecom:price>23.95</ecom:price>
    </Book>
</bk:booksAndAuthors>

The schema for http://company.com/schemas/books:

<?xml:namespace name="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/" as="s"/?>
<?xml:namespace href="http://www.ecom.org/schemas/ecom/" as="ecom" ?>

<s:schema>

    <elementType id="name">
        <string/>
    </elementType>

    <elementType id="birthday">
        <string/>
        <dataType dt="date.ISO8601"/>
    </elementType>

    <elementType id="Person">
        <element type="#name" id="p1"/>
        <element type="#birthday" occurs="OPTIONAL">
            <min>1700-01-01</min><max>2100-01-01</max>
        </element>
        <key id="k1"><keyPart href="#p1" /></key>
    </elementType>

    <elementType id="author">
        <string/>
        <domain type="#Book"/>
        <foreignKey range="#Person" key="#k1"/>
    </elementType>

    <elementType id="writtenWork">
        <element type="#author" occurs="ONEORMORE"/>
    </elementType>

    <elementType id="Book" >
        <genus type="#writtenWork"/>
        <superType href="http://www.ecom.org/schemas/ecom/commercialItem"/>
        <superType href="http://www.ecom.org/schemas/ecom/inventoryItem"/>
        <group groupOrder="SEQ" occurs="OPTIONAL">
            <element type="#preface"/>
            <element type="#introduction"/>
        </group>
        <element href="http://www.ecom.org/schemas/ecom/price"/>
        <element href="ecom:quantityOnHand"/>
    </elementType>

    <elementTypeEquivalent id="livre" type="#Book"/>
    <elementTypeEquivalent id="auteur" type="#author"/>

</s:schema>

While this example does not illustrate all of the capabilities of XML-Data, it does show how to declare such things as:

the names and data types of data elements and groups
required or optional data elements
constraints on values (e.g., minimum and maximum)
data elements which act as keys
referential integrity constraints between keys in one group and foreign keys in another
class hierarchies (supertype relationships)
mixing declarations from multiple schemas

The submission should be referenced for further details and additional examples.
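
As an informal illustration (not part of the XML-Data submission), the key and foreign-key declarations in the schema above can be read as checks that a program might apply to the sample data. The following Python sketch assumes a simplified copy of the data with the namespace prefixes removed, so that a standard XML parser accepts it. The function name check_books is hypothetical. It verifies only that each Person name is present and unique, that each Book has at least one author, and that every author names a known Person.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the sample data. Namespace prefixes are dropped
# (an assumption of this sketch) so that a standard parser accepts it.
DATA = """
<booksAndAuthors>
  <Person><name>Henry Ford</name><birthday>1863</birthday></Person>
  <Person><name>Harvey S. Firestone</name></Person>
  <Person><name>Samuel Crowther</name></Person>
  <Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
  </Book>
  <Book>
    <author>Harvey S. Firestone</author>
    <author>Samuel Crowther</author>
    <title>Men and Rubber</title>
  </Book>
</booksAndAuthors>
"""


def check_books(xml_text):
    """Return a list of constraint violations (empty if the data conforms)."""
    root = ET.fromstring(xml_text)
    problems = []

    # Key constraint: Person/name must be present and unique.
    known = set()
    for person in root.findall("Person"):
        name = person.findtext("name")
        if name is None:
            problems.append("Person without a name")
        elif name in known:
            problems.append("duplicate key: " + name)
        else:
            known.add(name)

    # occurs="ONEORMORE" plus the foreign key: every Book needs at least
    # one author, and each author must match the key of some Person.
    for book in root.findall("Book"):
        title = book.findtext("title", default="(untitled)")
        authors = [a.text for a in book.findall("author")]
        if not authors:
            problems.append("no author: " + title)
        for author in authors:
            if author not in known:
                problems.append("unknown author %r in %s" % (author, title))
    return problems
```

Calling check_books(DATA) on the data as given reports no violations; removing one of the Person elements would cause each Book that cites that person to be reported.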

XML-Data is another example of a higher-level model built using XML as its representation. It is not yet clear how the overlap in metadata capabilities between such representations as DTDs, RDF, and XML-Data will work out. The XML-Data approach may prove to be better than DTDs in supporting some types of processing, such as database-like operations, since it makes no distinctions between data and metadata representations. Like the other data models described in this section, XML-Data is not sufficient to form a complete Web object model. In particular, it requires integration with an API facility and a mechanism to access associated code.

2.2.5 Meta Content Framework (MCF)

Netscape's Meta Content Framework (MCF) <http://www.w3.org/TR/NOTE-MCF-XML/> [GB97] is a proposal for a metadata model based on the increasing need for machine-readable descriptions of distributed information. MCF is based on the following principles:

There is no useful distinction between data and metadata; these are simply roles that data may play with respect to some application or requirement, and hence there should be no special syntax reserved just for "metadata"
For interoperability and efficiency, descriptive information should share a common data model and vocabulary (e.g., attribute set) as much as possible

The latter point is particularly important. If all applications save their data in XML format, this would be somewhat more open than the use of proprietary formats, since any application could access the resulting documents. However, in order for applications to meaningfully process those documents, it would be necessary for the applications to recognize the various labels and structures used in those documents, and their associated semantics. Agreements on data models and vocabularies allow this sort of mutual recognition of labels and structure among applications, thus supporting interoperability.

MCF is essentially a structure description language. The basic information structure used is the Directed Labeled Graph (DLG). An MCF database is a set of DLGs, consisting of:

a set of labels (property types)
a set of nodes
a set of arcs, where each arc is a triple consisting of two nodes (the origin and destination) and a label. Arcs are also referred to as properties.

Nodes represent things like web pages, images, subject categories, and sites. The labels are nodes that correspond to properties such as size or lastRevisionDate used to describe web pages, subject categories, etc., and also to define relations such as hyperlinks, authorship, or parenthood, between these things.

Each label/property type, such as pageSize, is a node (but not all nodes are property types). Since labels are nodes, they can participate in relationships that, e.g., define their semantics. For example, a pageSize node could have properties that specify its domain (e.g., Document), its range (sizeInBytes), that a Document has only one pageSize, and that provide human-readable documentation of the intended semantics.
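
The structure just described can be sketched in a few lines of Python. This is only an illustration of a directed labeled graph in which labels are themselves nodes; the class name DLG, the arc labels "domain" and "range", and the example.org URL are assumptions introduced here and are not taken from [GB97].

```python
class DLG:
    """A directed labeled graph in which every label is itself a node."""

    def __init__(self):
        self.nodes = set()
        self.labels = set()   # property types; always a subset of nodes
        self.arcs = set()     # (origin, label, destination) triples

    def add_arc(self, origin, label, destination):
        # Registering the label as a node is what lets it be described later.
        self.nodes.update((origin, label, destination))
        self.labels.add(label)
        self.arcs.add((origin, label, destination))

    def values(self, origin, label):
        """Destinations of all arcs leaving `origin` with the given label."""
        return {d for (o, l, d) in self.arcs if o == origin and l == label}


g = DLG()
# An ordinary property of a (hypothetical) web page.
g.add_arc("http://example.org/index.html", "pageSize", 2048)
# Because pageSize is a node, it can carry properties of its own.
g.add_arc("pageSize", "domain", "Document")
g.add_arc("pageSize", "range", "sizeInBytes")
```

Here g.values("pageSize", "domain") retrieves the description of the label in exactly the same way as g.values(page, "pageSize") retrieves a property of a page, which is the sense in which MCF draws no distinction between data and metadata.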

An MCF node can be either a primitive data type or a "Unit". The primitive data types are the same as the Java primitive types. In addition, a DATE type should be supported by the low-level MCF machinery. The concept of a "Unit" corresponds loosely to the Java concept of "Object".

MCF defines a small set of units with predefined semantics in order to "bootstrap" the type system. These include, among others:

typeOf: the PropertyType used to specify that a given object is of a certain type. Every unit has a typeOf property
Category: corresponds to an object Class. The destination of a typeOf property has a typeOf property which ends at Category
Unit: the most general Category
PropertyType: the typeOf value of all properties
superType: a PropertyType used to indicate that one Category is the superType of another (i.e., MCF supports type/subtype hierarchies as found in many object models).
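
A minimal Python sketch of how these bootstrap units might be used to answer a type query follows. The units WebPage, Document, and page1 are hypothetical, and the direction of the superType arc (from a category to its supertype) is an assumption of this sketch rather than something stated above.

```python
# Arcs are (origin, label, destination) triples, as in an MCF database.
arcs = {
    ("Document", "typeOf", "Category"),
    ("WebPage", "typeOf", "Category"),
    ("WebPage", "superType", "Document"),   # assumed direction: sub -> super
    ("Document", "superType", "Unit"),
    ("typeOf", "typeOf", "PropertyType"),
    ("superType", "typeOf", "PropertyType"),
    ("page1", "typeOf", "WebPage"),
}


def targets(node, label):
    """Destinations of arcs with the given origin and label."""
    return {d for (o, l, d) in arcs if o == node and l == label}


def categories_of(node):
    """All categories a unit belongs to, following superType arcs upward."""
    found = set()
    pending = list(targets(node, "typeOf"))
    while pending:
        category = pending.pop()
        if category not in found:
            found.add(category)
            pending.extend(targets(category, "superType"))
    return found
```

With these arcs, categories_of("page1") yields WebPage, Document, and Unit, reflecting the type/subtype hierarchy.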

MCF recognizes that, for purposes of interoperability, it would be good to standardize the vocabulary for commonly-used terms. [GB97] proposes some items for this vocabulary (largely derived from existing standards such as the Dublin Core) for describing Web content. [GB97] also defines an XML-based syntax for representing MCF. This essentially defines a type system for XML.

Like PICS, MCF illustrates a number of important ideas in data modeling and metadata representation. For example, MCF illustrates both the use of specific required data items having predefined meanings in the model, and metalevel pointers. Unlike PICS, MCF represents a data model that can be used for more general purposes than content labeling. For example, it includes a type hierarchy, a richer set of base types, and other aspects of a full data model. In addition to required data items representing aspects of the model structure, the MCF reference identifies a list of suggestions for standard application-specific item names borrowed from the Dublin Core and elsewhere. MCF "units" are similar to the individual elements of the OEM model. Many MCF concepts have been incorporated into W3C's RDF (described in the next section). However, as noted in connection with other models in this section, these concepts must be combined with an API and a mechanism for integrating behavior to provide full object model support.

2.2.6 Resource Description Framework (RDF)

The World Wide Web Consortium's Resource Description Framework (RDF) effort <http://www.w3.org/Metadata/RDF/> is currently developing a mechanism designed for exchanging machine-understandable metadata describing Web resources. This type of metadata can be used, e.g.:

in resource discovery, to provide better search engine capabilities
for cataloging, in describing the content and content relationships available at a particular Web site, page, or digital library
by intelligent software agents, to facilitate knowledge sharing and exchange
in content rating
in describing collections of resources that represent a single logical "document"
in describing intellectual property rights
combined with digital signatures, in electronic commerce, collaboration, and similar applications

The work combines extensions of the PICS technology to support more general metadata requirements with work on metadata models such as Netscape's Meta Content Framework (MCF) and Microsoft's Web Collections [Hop97]. The current RDF draft specification <http://www.w3.org/TR/WD-rdf-syntax/> defines both a data model for representing RDF metadata, and an XML-based syntax for expressing and transporting the metadata.

The basis of RDF is a model for representing named properties and their values. These properties serve both to represent attributes of resources (and in this sense correspond to attribute/value pairs) and to represent relationships between resources. The RDF data model is a syntax-independent way of representing RDF statements.

The core RDF data model is defined in terms of:

a set of Nodes (N)
a set of PropertyTypes (P), a subset of N
a set of 3-tuples T, whose elements are informally known as properties. The first item of each tuple is an element of P, the second item is an element of N and the third item is either an element of N or an atomic value (e.g. a Unicode string).

(thus resembling MCF).

In this data model both the resources being described and the values describing them are nodes in a directed labeled graph (values may themselves be resources). The arcs connecting pairs of nodes correspond to the names of the property types. This is represented pictorially as:

     [resource R] ---propertyType P---> [value V]

and can be read "V is the value of the property P for resource R" or, left-to-right, "R has property P with value V". For example, the statement "John Smith is the Author of the Web page http://www.bar.com/some.doc" would be represented as:

   [http://www.bar.com/some.doc] ---author---> "John Smith"

where the notation [URI] denotes the instance of the resource identified by URI and "..." denotes a simple Unicode string.

According to the above definition, the property "author" (i.e., the arc labeled "author" plus its source and target nodes) is the triple (3-tuple):

   {author, [http://www.bar.com/some.doc], "John Smith"}

where "author" denotes a node used for labeling this arc. The triple composed of a resource, a property type, and a value is an RDF statement.
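
The triple form can be written down directly in a programming language. The following Python sketch is one possible in-memory representation, not an API defined by RDF; the type names Node and Triple and the helper describe are assumptions made for illustration. The field order follows the definition above: property type first, then resource, then value.

```python
from typing import NamedTuple, Union


class Node(NamedTuple):
    uri: str                       # identifier of a node, written [URI] in the text


class Triple(NamedTuple):
    property_type: Node            # an element of P
    resource: Node                 # an element of N
    value: Union[Node, str]        # an element of N, or an atomic string


AUTHOR = Node("author")
DOC = Node("http://www.bar.com/some.doc")

# The set T for the single statement used in the example above.
T = {Triple(AUTHOR, DOC, "John Smith")}


def describe(t):
    """Render a triple in the left-to-right reading used in the text."""
    return "%s has property %s with value %r" % (
        t.resource.uri, t.property_type.uri, t.value)
```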

A collection of these triples with the same second item is called an assertions. Assertions are particularly useful when describing a number of properties of the same resource. Assertions are diagramed as follows:

[resource R]-+---property P1----> [value Vp1]
             |
             +---property P2----> [value Vp2]

An RDF assertions can be a resource itself and can therefore be described by properties; that is, an assertions can itself be used as the source node of an arc. The name assertions is suggestive of the fact that the properties specified in it are effectively (logical) assertions about the resource being described. This establishes a relationship between RDF and a logic-based interpretation of the data structure which will be further developed in Section 3.
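
Since an assertions is simply the set of triples sharing a second item, it can be recovered from a flat list of triples by grouping. The Python sketch below illustrates this under the assumption that triples are plain (property type, resource, value) tuples; the function name group_assertions and the bib:title property are hypothetical.

```python
from collections import defaultdict

triples = [
    ("bib:author", "http://www.bar.com/some.doc", "John Smith"),
    ("bib:title", "http://www.bar.com/some.doc", "Some Document"),  # hypothetical
    ("bib:name", "#John_Smith", "John Smith"),
]


def group_assertions(triples):
    """Map each resource (second item) to its properties and their values."""
    grouped = defaultdict(dict)
    for prop, resource, value in triples:
        # A property may occur more than once, so values are kept in a list.
        grouped[resource].setdefault(prop, []).append(value)
    return dict(grouped)
```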

Assertions may be associated with the resource they describe in one of four ways:

the assertions may be contained within the resource (embedded)
the assertions may be external to the resource but supplied by the transfer mechanism in the same retrieval transaction as that which returns the resource (along-with)
the assertions may be retrieved independently from the resource, including from a different source (service bureau)
the assertions may contain the resource (wrapped)

Not all resources will support all association methods (e.g., many resource types will not support embedding).

The set of properties in a given assertions, as well as any characteristics or restrictions of the property values themselves, are defined by one or more schemas. Schemas are identified by a URL. An assertions may contain properties from more than one schema. RDF uses the XML namespace mechanism to associate the schema with the properties in the assertions. The schema URL may be treated merely as an identifier, or it may refer to a machine-readable description of the schema. By definition, an application that understands a particular schema used by an assertions understands the semantics of each of the contained properties. An application that has no knowledge of a particular schema will minimally be able to parse the assertions into the property and property value components, and will be able to transport the assertions intact (e.g., to a cache or to another application).

A human- or machine-readable description of an RDF schema may be accessed through content negotiation by dereferencing the schema URL. If the schema is machine-readable, it may be possible for an application to dynamically learn some of the semantics of the properties named in the schema.

An RDF statement can itself be the target node of an arc (i.e. the value of some other property) or the source node of an arc (i.e. it can have properties). In these cases, the original property (i.e., the statement) must be reified; that is, converted into nodes and arcs. RDF defines a reification mechanism for doing this. Reified properties are drawn as a single node with several arcs emanating from it representing the resource, property name, and value:

   [property P1]-+---PropName---> ["name"]
                 |
                 +---PropObj----> [resource R]
                 |
                 +---PropValue--> [value Vp1]

This allows RDF to be used to make statements about other statements; for example, the statement "Joe believes that the document 'The Origin of Species' was authored by Charles Darwin" would be diagramed as:

[Joe]--believes-->[stmnt1]+--InstanceOf-> RDF:Property
                          |
                          +--PropName->"author"
                          |
                          +--PropObj->[http://loc.gov/Books/Species]
                          |
                          +--PropValue->"Charles Darwin"

To help in reifying properties, RDF defines the InstanceOf relation (property) to provide primitive typing, as shown in the example.

To reify a property, a single node (with a generated label) is added to the data model, together with three triples whose second item is that node. The first items (arc labels) of these triples are the predefined names RDF:PropName, RDF:PropObj, and RDF:PropValue, and their third items are, respectively, the property type, the resource node, and the value node of the original property. In the above example, the three added triples would be:

   {PropName, stmnt1, "author"}
   {PropObj, stmnt1, [http://loc.gov/Books/Species]}
   {PropValue, stmnt1, "Charles Darwin"}

(The use of the "RDF:" prefix in names illustrates the use of the XML namespace mechanism to qualify names to indicate the schema in which they are defined.)
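
The reification step is mechanical, as the following Python sketch suggests. It assumes triples are plain (property type, resource, value) tuples and generates node labels of the form stmntN; the function name reify is hypothetical. The sketch also adds the RDF:InstanceOf triple shown in the diagram above.

```python
import itertools

_labels = itertools.count(1)   # source of generated node labels


def reify(triple):
    """Return (node, triples), where the triples describe `triple` as a node."""
    prop, resource, value = triple
    node = "stmnt%d" % next(_labels)
    return node, [
        ("RDF:InstanceOf", node, "RDF:Property"),
        ("RDF:PropName", node, prop),
        ("RDF:PropObj", node, resource),
        ("RDF:PropValue", node, value),
    ]


# "Joe believes that 'The Origin of Species' was authored by Charles Darwin"
stmt, added = reify(("author", "http://loc.gov/Books/Species", "Charles Darwin"))
belief = ("believes", "Joe", stmt)
```

Once the statement has a node of its own, the belief is an ordinary triple whose value is that node.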

Frequently it is necessary to create a collection of nodes; e.g. to state that a property has multiple values. RDF defines three kinds of collections: ordered lists of nodes, called sequences; unordered lists of nodes, called bags; and lists that represent alternatives for the (single) value of a property, called alternatives. To create collections of nodes, a new node is created that is an RDF:InstanceOf one of the three node types RDF:Seq, RDF:Bag, or RDF:Alternatives. The remaining arcs from that new node point to each of the members of the collection and are uniquely labeled using the elements from Ord, the set of ordinal labels (RDF:1, RDF:2, and so on). For the RDF:Alternatives, there must be at least one member whose arc label is RDF:1, and that is the default value for the Alternatives node.
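
A short Python sketch of this construction follows. It assumes triples are (property type, resource, value) tuples and that the ordinal labels take the form RDF:1, RDF:2, and so on; the function name make_collection is hypothetical.

```python
KINDS = ("RDF:Seq", "RDF:Bag", "RDF:Alternatives")


def make_collection(node, kind, members):
    """Return the triples describing a collection node and its members."""
    if kind not in KINDS:
        raise ValueError("unknown collection type: %s" % kind)
    if kind == "RDF:Alternatives" and not members:
        # The member labeled RDF:1 is the default, so one must exist.
        raise ValueError("RDF:Alternatives needs at least one member")
    triples = [("RDF:InstanceOf", node, kind)]
    for position, member in enumerate(members, start=1):
        triples.append(("RDF:%d" % position, node, member))
    return triples
```

For a bag the ordinal labels serve only to keep the arcs distinct, whereas for a sequence they carry the ordering.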

The RDF data model provides an abstract, conceptual framework for defining and using metadata. A concrete syntax is also needed for the purpose of authoring and exchanging this metadata. The syntax does not add to the model, and APIs could be provided to manipulate RDF metadata without reference to a concrete syntax. RDF uses XML encoding as its syntax. However, RDF does not require an XML DTD for the contents of assertion blocks (and RDF schemas are not required to be XML DTDs). In this respect, RDF requires at most that its XML representations be well-formed.

RDF defines several XML elements for its XML encoding. The RDF:serialization element is a simple wrapper that marks the boundaries, within an XML document, of content that is explicitly intended to be mappable into an RDF data model instance. RDF:assertions and RDF:resource contain the remaining elements that instantiate properties in the model instance. Each XML element E contained by an RDF:assertions or an RDF:resource results in the creation of a property (a triple that is an element of the formal set T defined earlier).

With these basic principles defined, directed graph models of arbitrary complexity can be constructed and exchanged. A simple example would be "John Smith is the Author of the document whose URL is http://www.bar.com/some.doc" (all these examples are taken from the RDF paper cited above, but updated to use more recent XML namespace syntax). This assertion can be modeled with the directed graph:

   [http://www.bar.com/some.doc] ---bib:author---> "John Smith"

(This report uses a notation where Nodes are represented by items in square brackets, arcs are represented as arrows, and strings are represented by quoted items.) This small graph can be exchanged in the serialization syntax as:

<?xml:namespace name="http://docs.r.us.com/bibliography-info" as="bib"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="http://www.bar.com/some.doc">
    <bib:author>John Smith</bib:author>
  </RDF:assertions>
</RDF:serialization>

This example illustrates how the resource, property name, and value are translated into XML.
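
The translation from triples to this serialization syntax can be sketched in Python as follows. This is an illustrative generator only: it handles simple string values, emits the namespace declarations in the form used in this report, and uses a hypothetical function name, serialize. It is not a tool defined by the RDF specification.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize(triples, namespaces):
    """Render (property, resource, string value) triples as RDF/XML text.

    `namespaces` is a list of (prefix, schema URI) pairs.
    """
    lines = ["<?xml:namespace name=%s as=%s?>" % (quoteattr(uri), quoteattr(prefix))
             for prefix, uri in namespaces]
    lines.append("<RDF:serialization>")

    # Group properties by the resource they describe (one assertions each).
    by_resource = {}
    for prop, resource, value in triples:
        by_resource.setdefault(resource, []).append((prop, value))

    for resource, props in by_resource.items():
        lines.append("  <RDF:assertions href=%s>" % quoteattr(resource))
        for prop, value in props:
            lines.append("    <%s>%s</%s>" % (prop, escape(value), prop))
        lines.append("  </RDF:assertions>")
    lines.append("</RDF:serialization>")
    return "\n".join(lines)


text = serialize(
    [("bib:author", "http://www.bar.com/some.doc", "John Smith")],
    [("bib", "http://docs.r.us.com/bibliography-info"),
     ("RDF", "http://www.w3.org/schemas/rdf-schema")],
)
```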

A more elaborate model could be created in order to say additional things about John Smith, such as his contact information, as in the model:

[http://www.bar.com/some.doc]
      |
   bib:author
      |
      V
[John Smith]-+---bib:name----> "John Smith"
             |
             +---bib:email----> "john@smith.com"
             |
             +---bib:phone----> "+1 (555) 123-4567"

which could be exchanged using the XML serialization representation:

<?xml:namespace name="http://docs.r.us.com/bibliography-info" as="bib"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="http://www.bar.com/some.doc">
    <bib:author>
      <RDF:resource>
        <bib:name>John Smith</bib:name>
        <bib:email>john@smith.com</bib:email>
        <bib:phone>+1 (555) 123-4567</bib:phone>
      </RDF:resource>
    </bib:author>
  </RDF:assertions>
</RDF:serialization>

The serialization above is equivalent to this second serialization:

<?xml:namespace name="http://docs.r.us.com/bibliography-info" as="bib"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="http://www.bar.com/some.doc">
    <bib:author href="#John_Smith"/>
  </RDF:assertions>
</RDF:serialization>

<RDF:resource id="John_Smith">
  <bib:name>John Smith</bib:name>
  <bib:email>john@smith.com</bib:email>
  <bib:phone>+1 (555) 123-4567</bib:phone>
</RDF:resource>

In these representations, the RDF:resource element creates an in-line resource. Typically such a resource will be a surrogate, or proxy, for some other real resource that does not have a recognizable URI. The id= attribute in the second representation provides a name for the resource element so that the resource may be referred to elsewhere.

As an example of making a statement about a statement, consider the case of computing a digital signature on an RDF assertion. (It is assumed that the signature is computed over a concrete XML representation of the assertion rather than over an internal representation. The figure below shows a box containing a small graph. This is a convention to indicate that the XML content whose ID is foo is a concrete representation of the graph it contains.) What is to be specified in the model is expressed by the pair of graphs below - that there is an XML encoding of some assertion, and that there is some other XML content that is a digital signature over that encoding.

+---------------------------------------------------------------+
| ID=foo                                                        |
|                                                               |
|  [http://www.bar.com/some.doc] ---DC:creator---> "John Smith" |
|                                                               |
+---------------------------------------------------------------+

[foo]------DSIG:Signature------>"AKGJOERGHJWEJ348GH4GHEIGH4ROI4"

The details could be expressed in the model below:

       "AKGJOERGHJWEJ348GH4GHEIGH4ROI4"<--RDF:PropValue----+
                                                           |
                     [DSIG:Signature]<----RDF:PropName-----+
                                                           |
      +--RDF:InstanceOf-->[RDF:Property]<--RDF:InstanceOf--+
      |                                                    |
      |                                                    |
    [foo]<----------------RDF:PropObj-----------------[prop-001]
      |
      |
      +---------------------------------------------+
      |                                             |
      +-----------------------------+               |
      |                             |               |
   RDF:PropObj                  RDF:PropName   RDF:PropValue
      |                             |               |
      V                             V               V
[http://www.bar.com/some.doc] ---DC:creator---> "John Smith"

These models could also be expressed as:

<?xml:namespace name="http://purl.org/DublinCore/RDFschema" as="DC"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<?xml:namespace name="http://www.w3.org/schemas/DSig-schema" as="DSIG"?>
<RDF:serialization>
  <RDF:assertions href="http://www.bar.com/some.doc" id="foo">
    <DC:creator>John Smith</DC:creator>
  </RDF:assertions>
  <RDF:assertions href="#foo">
    <DSIG:Signature>AKGJOERGHJWEJ348GH4GHEIGH4ROI4</DSIG:Signature>
  </RDF:assertions>
</RDF:serialization>

(Note that node labels such as "RDF:Property" are shorthand for a full URI such as "http://www.w3.org/schemas/rdf-schema#Property").

The RDF data model intrinsically only supports binary relations. However, higher arity relations can also be represented, using just binary relations. As an example, consider the subject of one of John Smith's recent articles - library science. The Dewey Decimal Code for library science could be used to categorize that article. While the numeric code is the true Dewey value, few people can understand those codes. Therefore, the description of the Dewey categories has been translated into several different languages. In fact, Dewey Decimal codes are far from the only subject categorization scheme. So, it might be desirable to define a "Subject" node that not only specified the subject of a paper, but also indicated the language and categorization scheme it came from. That might look like:

[http://www.webnuts.net/Jan97.html]
      |
   DC:subject
      |
      V
[subject_001]-+---DC:scheme----> "Dewey Decimal Code"
              |
              +---DC:lang----> "English"
              |
              +---RDF:PropValue----> "020 - Library Science"

which could be exchanged as:

<?xml:namespace name="http://purl.org/DublinCore/RDFschema" as="DC"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="http://www.webnuts.net/Jan97.html">
    <DC:subject>
      <RDF:resource id="subject_001">
        <DC:scheme>Dewey Decimal Code</DC:scheme>
        <DC:lang>English</DC:lang>
        <RDF:PropValue>020 - Library Science</RDF:PropValue>
      </RDF:resource>
    </DC:subject>
  </RDF:assertions>
</RDF:serialization>

A common use of this higher-arity capability is when dealing with units of measure. A person's weight is not just a number like 94; it also requires a specification of the units for that number. In this case either pounds or kilograms might be used. A relationship with an additional arc might be used to record the fact that John Smith is a rather strapping gentleman:

                                          +--NIST:units--> "pounds"
                                          |
[John Smith]--NIST:weight-->[weight_001]-+
                                          |
                                          +--RDF:PropValue--> "200"

which can be exchanged as:

<?xml:namespace name="http://www.nist.gov/RDFschema" as="NIST"?>
<?xml:namespace name="http://www.w3.org/schemas/rdf-schema" as="RDF"?>
<RDF:serialization>
  <RDF:assertions href="John_Smith">
    <NIST:weight>
      <RDF:resource id="weight_001">
        <NIST:units href="#pounds"/>
        <RDF:PropValue>200</RDF:PropValue>
      </RDF:resource>
    </NIST:weight>
  </RDF:assertions>
</RDF:serialization>

assuming the node "pounds" was defined elsewhere.
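
The pattern used in the last two examples, an intermediate node that carries the value together with its qualifiers, can be expressed as a small helper. The Python sketch below assumes triples are (property type, resource, value) tuples; the function name qualified_property is hypothetical.

```python
def qualified_property(resource, prop, node, value, qualifiers):
    """Express an n-ary relation using only binary triples.

    `node` is the intermediate node; `qualifiers` maps additional property
    types (such as units) to their values.
    """
    triples = [
        (prop, resource, node),            # resource --prop--> intermediate node
        ("RDF:PropValue", node, value),    # the principal value
    ]
    for qualifier, qualifier_value in qualifiers.items():
        triples.append((qualifier, node, qualifier_value))
    return triples


weight = qualified_property(
    "John_Smith", "NIST:weight", "weight_001", "200",
    {"NIST:units": "pounds"},
)
```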

The RDF effort is attempting to define a very general abstract metadata architecture and associated support facilities. RDF, like MCF, illustrates how a higher level model can be used together with XML to support specific types of application requirements, and illustrates a number of the same metadata modeling ideas as MCF. The RDF examples above specifically illustrate a requirement for metalevel pointers to explicitly link tags to attribute definitions (by an explicit pointer, not by looking up the name in a dictionary). The more powerful facilities of XML for defining hyperlinks will improve the ability to define very general relationships between data and metadata that describes (and can help interpret) it. For example, the advanced XML linking facilities defined in XLL would allow assertions to refer to parts of referenced documents. It seems likely that RDF will also investigate mechanisms to automatically provide access to RDF metadata at runtime (implementing the various association modes such as along-with), similar to the mechanisms provided by PICS for content labels. In implementing a Web object model, these techniques will be required to gain access to the object methods (which may be either embedded in the Web page, or located as separate resources).

Because of its generality in representing metadata, and the likelihood that it will be the basis of future Web developments in representing metadata, the Web object model described in Section 3 uses RDF (and its XML representation) as part of its structural base (although RDF is currently incomplete, and will be developed further). Additional aspects of MCF may be used as well, depending on more detailed analysis to be performed later. Section 3 will describe further decisions about the nature of the object model, based on RDF as a starting point.

However, RDF and MCF themselves are not sufficient to support all requirements of a Web object model. For example, the object model requires an API to its state representation, and thus RDF and MCF must be integrated with parallel work on a Document Object Model (see below), which is not currently the case. Also, mechanisms for linking code to RDF and MCF structures must be further developed. Finally, structured database capabilities do not exist for these structures, and must be worked out.

2.3 Adding Behavior to Web Pages

Previous sections have noted that what is needed to progress toward a Web object model is:

a richer base representation than HTML, in order to better represent "object state" (in particular, better support for semantic identification of fields, rather than simply supporting presentation aspects of data)
an API to this state, so that programs can readily access it (without complex parsing)
an enhanced ability to define relationships between this state and specified pieces of code that can serve as object methods

Section 2.1 described work toward providing the Web with a richer base representation (e.g., XML). The metadata and model work described in Section 2.2 presented approaches for adding additional structure to this representation. In addition, as noted in the introduction to Section 2.2, these techniques for representing metadata and linking it to Web resources provide a conceptual framework for linking behavior to Web resources, by treating the code implementing that behavior as a form of metadata. Code resources are already being stored on the Web, e.g., in program libraries supporting reuse, and it is already possible to create links between Web documents and such resources. However, in using code resources to create objects, it is necessary to reflect the special semantics associated with these links. These semantics somewhat resemble those of metadata such as content labels, in the sense that rather than the user explicitly following the links to retrieve the associated "metadata", some of the "metadata" is automatically retrieved during access to the original resource, in order to support some special processing. In the case of content labels, the special processing involves checking the content labels against user-specified requirements in order to determine whether to allow access to the original resource. In the case of object methods, the special processing involves invoking the retrieved code in order to perform some operation. This particular approach to representing and invoking object methods will be discussed further in Section 3.

This section describes several mechanisms developed within the Web community for defining relationships between state and code, and for providing an API to state (the second and third bullets above). Specifically, techniques developed for embedding objects and scripts in Web documents represent one way of associating behavior with the state represented by a Web document. The W3C's Document Object Model (DOM) effort represents another way of addressing this issue, as well as the issue of providing an API to this state. These two issues are closely related.

A program must gain access to data in order to process it, and so an object method must have access to the object's state. It is always possible to pass data as a value to a program. However, the program must understand the structure of this data in order to access it efficiently. Conventional object models provide what is in effect a special API for object methods to use when accessing state for this purpose. This is also necessary in a Web object model. However, the need for such an API becomes especially important when the state has a rich, complex structure, such as an XML document. Without an API to this state (and its implementation), each program would have to implement a considerable amount of code simply to parse the structure, in order to locate the parts of the document required for specific purposes. An API providing access to the various parts of a document, together with an implementation of this API as part of the general representation of this state's "data type", provides this code as a pre-existing component, allowing the program to concentrate on application-related processing. The DOM provides such an API. At the same time, it provides part of a general mechanism (albeit a very unconstrained one) for linking code and state, since it provides a straightforward mechanism for code (currently, programs such as plug-ins or external applications) to access the state it needs.

Finally, the Web Interface Definition Language (described in Section 2.3.3) is commercial technology that represents another mechanism for providing an API to state (as well as to Web-based services).

2.3.1 Document Object Model (DOM)

W3C's Document Object Model (DOM) <http://www.w3.org/DOM/> effort provides a mechanism for scripts or programs to access and manipulate parsed HTML and XML content (including all markup and any Document Type Definitions) as a collection of objects. Specifically, DOM defines an object-oriented API of an HTML or XML document that a Web client can present to programs (applications or scripts) that need to process the document. The client (at least conceptually) operates off this collection of objects in displaying the document. Thus, by operating on the collection of objects representing a Web page, scripts or programs can change styles and attributes of page elements, or even replace existing elements with new ones, resulting in an immediate change to the data displayed to the user. As a result, DOM makes it easy to implement dynamic content on the client, rather than forcing all such content to be implemented on the server, and provides a basic way to integrate a document's data with code. For example, a client might implement a JavaScript DOM interface, so that scripts in this language could be used within the page itself to manipulate the page. The client could also provide a DOM interface to external applications such as plug-ins allowing them to access the document via the client. Similarly, an editor might implement a Java DOM interface to allow programs written in Java to interact with the editor to manipulate the page.
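
As a rough illustration of the kind of manipulation described above, the following Python sketch uses the xml.dom.minidom module from the Python standard library, a later language binding of the DOM interfaces that is not discussed in this report. The sample markup is hypothetical, and because the sketch runs outside a Web client, nothing is redisplayed; it shows only how a program operates on the collection of objects rather than on the source text.

```python
from xml.dom.minidom import parseString

# Parse a small (hypothetical) page into a tree of DOM objects.
doc = parseString("<html><body><p id='msg'>Hello</p></body></html>")

# Locate an element and change one of its attributes and its text.
p = doc.getElementsByTagName("p")[0]
p.setAttribute("class", "highlight")
p.firstChild.data = "Hello, DOM"

# Create a new element and attach it to the document.
note = doc.createElement("em")
note.appendChild(doc.createTextNode("added by a script"))
p.parentNode.appendChild(note)

# Serialize the modified tree back to markup.
markup = doc.documentElement.toxml()
```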

DOM is a generalization of Dynamic HTML facilities defined by Microsoft and Netscape. Functionality equivalent to the Dynamic HTML support provided by Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0 is referred to as "DOM level 0". DOM level 1 extends these capabilities to, for example, allow creation "from scratch" of entire Web documents in memory by creating the appropriate objects. The DOM Working Draft specification <http://www.w3.org/TR/WD-DOM> includes level 1 Core specifications which apply to both HTML and XML documents, and level 1 specializations for HTML and XML documents. The DOM object class definitions in these specifications have their interfaces defined using OMG IDL. Java interface specifications are also defined (see the specifications for details).

DOM represents a document as a hierarchy of objects, called nodes, which are derived (by parsing) from a source representation of the document (HTML or XML). The DOM object classes represent generic components of a document, and hence define a document object metamodel. The DOM Level 1 working draft defines a set of object classes (and their inheritance relationships) for representing documents. The major classes are:

Node
  |
  +--Document
  |    |
  |    +--HTMLDocument
  |
  +--Element
  |    |
  |    +--HTMLElement
  |         |
  |         +--specific HTML elements
  |
  +--Attribute
  |
  +--Text
  |
  +--PI [Processing Instruction, an XML concept from SGML]
  |
  +--Comment

The Node object is the base type for all objects in the DOM. It may have an arbitrary number (including zero) of sequentially-ordered child nodes. It usually has a parent Node, the exception being that the root Node in a document tree has no parent.

Element objects represent the elements in HTML and XML documents. Elements contain, as child nodes, all of the content between the start tag and the corresponding end tag of an element. Aside from Text nodes, the vast majority of node types that applications will encounter when traversing a document structure will be Element nodes. Element objects also have a list of Attribute objects which represent the set of attributes explicitly defined as part of the element, and those defined in the DTD that have default values.

Text objects are used to represent any non-markup values, whether the values are intended to represent an integer, date, or some other type of value. For XML documents, all whitespace between markup results in Text objects being created.

The Document object is the root node of a document object tree, and represents the entire HTML or XML document. The HTMLDocument subtype represents a specialization of the generic Document type for the specific requirements of HTML documents.

Additional object classes are defined in the working draft for representing XML Document Type Definitions, and auxiliary data structures (e.g., lists of nodes).

Normally, a DOM-compliant implementation will make the main Document instance available to the application through some implementation-defined mechanism. For example, a typical implementation would give the application a reference to a DocumentContext object. This object describes the source of the document, as well as related information such as the date and time the document was last changed. From the DocumentContext, the application may access the Document object, which is the root of the document object hierarchy. From the Document object, the application can use the methods provided for accessing individual nodes, selecting specific node types (such as all images), and so on. For XML documents, the DTD is available through the documentType method (which returns null for HTML documents and XML documents without DTDs). Document also defines a getElementsByTagName method. This produces an enumerator that iterates over all Element nodes within the document whose tagName matches the input name provided. (The DOM working draft indicates that a future version of the DOM will provide a more generalized querying mechanism for nodes.)
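
As a rough illustration of this style of access, the following sketch uses Python's standard xml.dom.minidom module. This is an assumption made for illustration: minidom follows later W3C DOM recommendations, so its names (doctype, getElementsByTagName returning a node list) differ in detail from the working draft described here, and it has no DocumentContext object. The sample document and its element names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the element names are illustrative only.
SOURCE = """<catalog>
  <image src="a.gif"/>
  <section><image src="b.gif"/></section>
  <note>no images here</note>
</catalog>"""

# The Document node is the root of the object hierarchy.
document = parseString(SOURCE)

# SOURCE has no DOCTYPE declaration, so no DTD is available.
has_dtd = document.doctype is not None

# Collect every Element whose tag name matches, at any depth,
# in document order.
images = document.getElementsByTagName("image")
image_sources = [img.getAttribute("src") for img in images]

print(has_dtd)         # False
print(image_sources)   # ['a.gif', 'b.gif']
```

The application never parses markup itself; it asks the Document object for the nodes it needs.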

As an example generally illustrating how an XML document might be presented to an application in the DOM, consider the example described in Section 2.1.4 of a simple relational database represented in XML. The DOM for XML would present the XML document to an application as a collection (actually, a tree) of objects. Most of these objects would be of type Node, and specifically of its subtypes Element (representing the individual elements) and Text (representing the content). More precisely:

<!doctype mydata "http://www.w3.org/mydata">
<mydata>
...
</mydata>

(the outer markup) would be presented as an object of type Document (a subtype of Node). The children of this node would be objects representing the Table elements (and, indirectly, their contained rows and fields). Type Node provides a method getChildren() to access the children. The table delimited by

<authors>
...
</authors>

would be presented as an object of type Element (another subtype of Node) representing the Authors table. Type Element provides a method getTagName() to provide access to the actual tag name (authors in this case). The children of this node would be objects representing Row elements of type Author (and, indirectly, the contained fields). Similarly,

<editors>
...
</editors>

would be presented as another object of type Element representing the Editors table.

Each element delimited by

<author>
...
</author>

would be presented as an object of type Element representing a particular Author row. The children of this node would be objects representing the fields contained in the row. Elements delimited by

<editor>
...
</editor>

would similarly be presented as objects of type Element representing Editor rows.

Fields would similarly be presented as Element objects. For example, each element delimited by

<name>
...
</name>

would be presented as an object of type Element representing that particular field. Each of these elements would have a child node of type Text (Text is not a subtype of Element) representing the text value of the field (e.g., "Robert Roberts"). The data() method of the Text object type returns the actual string representation. In this case, this would end the nesting.
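
The nesting just described can be made visible with a short traversal. The sketch below again assumes Python's xml.dom.minidom (whose childNodes, tagName, and data attributes play the roles of the working draft's getChildren(), getTagName(), and data() methods). The document is an assumed reconstruction of the Section 2.1.4 example: only the element names and the value "Robert Roberts" come from the text, and the editor's name is hypothetical. It is written without whitespace between tags so that no whitespace-only Text nodes are created.

```python
from xml.dom.minidom import parseString, Node

# Assumed reconstruction of the Section 2.1.4 example. "Amy Editor" is a
# hypothetical value added so that the editors table is not empty.
SOURCE = (
    "<mydata>"
    "<authors><author><name>Robert Roberts</name></author></authors>"
    "<editors><editor><name>Amy Editor</name></editor></editors>"
    "</mydata>"
)

def outline(node, depth=0, lines=None):
    """Return one line per node: Elements by tag name, Text nodes by value."""
    if lines is None:
        lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + "Element " + node.tagName)
    elif node.nodeType == Node.TEXT_NODE:
        lines.append("  " * depth + "Text " + repr(node.data))
    for child in node.childNodes:
        outline(child, depth + 1, lines)
    return lines

document = parseString(SOURCE)
tree = outline(document.documentElement)
print("\n".join(tree))
```

Each field Element ends in a single Text child, which is where the nesting stops.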

The representation of a Web page in terms of objects makes it easy to associate code with the various subcomponents of the page. The DOM requirements also identify the need for an event model, to provide a way to schedule the execution of the code associated with particular parts of a Web page at appropriate times. This event model (not yet specified) would extend the current event capabilities provided by most Web clients. The requirements specify that:

all elements will be capable of generating events
there will be interaction events, update events, and change events
the event model will allow responses to user interactions
events will bubble through the structural hierarchy of the document
events are synchronous
events will be defined in a platform independent and language neutral way
there will be an interface for binding to events
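
Since the event model is not yet specified, any code can only be speculative. The following sketch merely illustrates three of the listed requirements (a binding interface, synchronous handling, and bubbling through the structural hierarchy). The class EventNode and its methods bind and dispatch are hypothetical names, not part of any DOM draft.

```python
class EventNode:
    """Hypothetical document node that can have handlers bound to it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}            # event type -> list of callables

    def bind(self, event_type, handler):
        """Binding interface: register a handler for one event type."""
        self.handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event_type):
        """Run handlers synchronously, bubbling from this node to the root."""
        node = self
        while node is not None:
            for handler in node.handlers.get(event_type, []):
                # Arguments: originating node, node currently handling.
                handler(self, node)
            node = node.parent


log = []
page = EventNode("html")
body = EventNode("body", page)
button = EventNode("button", body)

def record(target, current):
    log.append((target.name, current.name))

page.bind("click", record)
button.bind("click", record)

button.dispatch("click")
print(log)   # [('button', 'button'), ('button', 'html')]
```

The handler bound to the page root sees the event even though it originated on a nested element, which is the essence of bubbling.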

As noted at the beginning of Section 2.3, the development of the DOM recognizes the fact that, in enhancing the data structuring capabilities of the Web, more is needed than just more complex representations. There also must be built-in (and widely-available) capabilities for processing these representations. The DOM interface (and its implementation by clients and other tools) provides a general means for applications to access and traverse these representations without having themselves to perform complex parsing. The more complex the representation can become, the more important this capability becomes (and, hence, it is particularly important if XML is the representation). DOM's support for dynamic documents (documents mutable on the client) also causes these documents to more closely resemble the state of general objects. The integration of DOM and XML will provide a powerful basis for enriched Web applications.

The DOM remains under development, and further work is required to integrate it both with other Web technology developments, and with capabilities required to provide full Web object model support. For example, SGML's DSSSL (described briefly in the XML section) defines a very general object model for SGML documents, called groves, which resembles the DOM to some extent. Groves are intended to provide a runtime object model for use while processing SGML documents. However, it is not clear to what extent DOM and grove capabilities will be integrated. Groves are extremely general (e.g., using groves it is possible to define each character in a document as a separate element), and it is not clear that the same level of generality is required for DOM. Moreover, groves define an object model for static documents. DOM, on the other hand, is designed to deal with dynamic documents, which can be modified by processing applications (via the DOM interface) at runtime. However, the XML stylesheet proposals are based to some extent on DSSSL (and hence presumably on the use of some aspects of groves). Another interesting aspect of this integration is that DSSSL defines a query language called SDQL for accessing parts of SGML documents for use in stylesheet processing. The provision of a query language (or aspects of one) for XML would provide an important base for the development of full-fledged database-like processing capabilities for Web documents represented in XML. This issue is being explored further in a companion OBJS technical report in progress.

The DOM defines its API at a generic level, i.e., at the level of components of a document metamodel. Additional work would be required to define "application level" object interfaces. For example, in the relational database example defined above, DOM provides objects of types node, element, and so on, rather than objects of type author or editor (or even objects of type table or row). Using DOM, an application could effectively create such types from the information given, but it would have to "know what to look for", and would have to traverse the various element objects to find that information. It would be desirable to have a capability for creating DOM-like, but application-oriented, APIs. This could involve using additional metadata (e.g., the DTD, or an XML-Data-like schema) to generate a default API automatically (which the document's author could then customize). It might then be possible to attach specific methods to this API to define application-specific object behavior. An integration of DOM and the embedded OBJECT elements described below would be one way to support this. This would effectively permit the creation of objects in the classic object-oriented programming sense.
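
One very simple way to layer an application-oriented interface over the generic one is sketched below, assuming Python's xml.dom.minidom. The class Row and the function rows are hypothetical. Here the field names are discovered from the document itself rather than generated from a DTD or schema, so this is only a first step toward the capability described above.

```python
from xml.dom.minidom import parseString

# Assumed fragment of the relational example from Section 2.1.4.
SOURCE = (
    "<mydata><authors>"
    "<author><name>Robert Roberts</name></author>"
    "</authors></mydata>"
)

class Row:
    """Hypothetical application-level view of one row Element."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, field):
        # Treat each child Element as a field and expose its text content
        # as an ordinary attribute of the row object.
        for child in self._element.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == field:
                return "".join(t.data for t in child.childNodes
                               if t.nodeType == t.TEXT_NODE)
        raise AttributeError(field)

def rows(document, row_tag):
    """Wrap every Element with the given tag name as a Row object."""
    return [Row(e) for e in document.getElementsByTagName(row_tag)]

authors = rows(parseString(SOURCE), "author")
print(authors[0].name)   # Robert Roberts
```

The application now works with author objects and their name attribute, and no longer needs to "know what to look for" among generic nodes.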

The DOM work also needs to be integrated with the work on higher-level models described in Section 2.2. One effect of this would be to provide a way to add object behavior to documents without the need for references to the associated programs to be embedded in the page, as with OBJECT elements. These models might also provide additional support for generating application-specific object APIs.

2.3.2 Embedded Objects

Web clients generally contain mechanisms for rendering common data types such as text, GIF images, colors, fonts, and some graphic elements. To render data types that do not have built-in support, clients generally run external applications (plug-ins or helpers). In addition, Web clients currently support mechanisms for including specialized types of "objects" in the rendering process that are not physically located in the document, e.g.:

the <IMG> tag is used to specify a reference to an image located in a separate file that is to be included as part of the rendering of the page
the <APPLET> tag is used to specify a reference to a Java (or other) applet that is to be executed as part of the rendering of the page

The recently-adopted HTML 4.0 Specification <http://www.w3.org/TR/REC-html40/> defines an OBJECT element (and an associated <OBJECT> tag) which subsumes these specialized tags (the <OBJECT> tag is already supported in some Web clients). In general, its purpose is to define an inserted rendering mechanism, in order to allow authors to control whether included objects are handled by Web clients internally or externally.

In the most general case, an inserted rendering mechanism specifies three types of information (although in specific cases not all this information may need to be explicitly specified):

the rendering mechanism's implementation
the data to be rendered
additional values required by the rendering mechanisms at run-time

(Not surprisingly, this is a variant of the information needed for an object invocation in an object-oriented programming language).

In HTML 4.0, the OBJECT element specifies the location of a rendering mechanism and the location of data required by the rendering mechanism. This information is specified by the attributes of the OBJECT element. The PARAM element specifies a set of run-time values.

A client interprets an OBJECT element by first trying to render the mechanism specified by the element's attributes. If this cannot be done for some reason (e.g., the client is configured not to, or the client platform cannot support that mechanism), the client must try to render the element's contents. This provides a way to specify alternate object renderings, since the contents of an OBJECT element can be another OBJECT element specifying an alternative mechanism. The contents of the most deeply embedded element should be text. Data to be rendered can be supplied either inline, or from an external resource. An HTML document can be included in another document by using an OBJECT element with the data attribute specifying the file to be included.
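
The fallback rule can be sketched as a small recursive function. This is a simplification under stated assumptions: a real client decides whether it can render an object on more grounds than the media type alone, the fragment is written as well-formed XML so that Python's xml.dom.minidom can parse it, and the file names, media types, and the function choose_rendering are all hypothetical.

```python
from xml.dom.minidom import parseString, Node

# Hypothetical page fragment with one alternative nested inside another.
SOURCE = """<OBJECT data="movie.mpeg" type="video/mpeg">
<OBJECT data="still.gif" type="image/gif">
A picture of the launch.
</OBJECT>
</OBJECT>"""

def choose_rendering(element, supported_types):
    """Return what a client would render for an OBJECT element.

    Try the element's own mechanism first; if its media type is not
    supported, fall back to the contents (a nested OBJECT, or text).
    """
    if element.getAttribute("type") in supported_types:
        return ("object", element.getAttribute("data"))
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == "OBJECT":
            return choose_rendering(child, supported_types)
    text = "".join(c.data for c in element.childNodes
                   if c.nodeType == Node.TEXT_NODE).strip()
    return ("text", text)

outer = parseString(SOURCE).documentElement
print(choose_rendering(outer, {"video/mpeg"}))  # ('object', 'movie.mpeg')
print(choose_rendering(outer, {"image/gif"}))   # ('object', 'still.gif')
print(choose_rendering(outer, set()))           # ('text', 'A picture of the launch.')
```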

The following simple Java applet:

<APPLET code="AudioItem" width="15" height="15">
<PARAM name="snd" value="Hello.au|Welcome.au">
Java applet that plays a welcoming sound.
</APPLET>

may be rewritten as follows using OBJECT:

<OBJECT codetype="application/octet-stream"
        code="AudioItem"
        width="15" height="15">
<PARAM name="snd" value="Hello.au|Welcome.au">
Java applet that plays a welcoming sound.
</OBJECT>

The OBJECT element includes, among others, the following attributes:

codebase: the path used to resolve relative URLs specified by classid, specified as a URL
classid: the location of a rendering mechanism, specified as a URL
codetype: the Internet Media Type of data expected by the rendering mechanism specified by classid
data: the location of the data to be rendered, specified as a URL
type: the Internet Media Type for the data specified by data

The HTML OBJECT element illustrates an example of a capability for Web clients to automatically invoke behavior associated with a document when the behavior is encountered. The approach to a Web object model described in Section 3 must both generalize this capability, and integrate it with the XML, RDF, and DOM technologies described earlier. In particular, the OBJECT element only deals with references to external code that have been embedded in the document (i.e., the relationship between the code and the document is represented physically in the document). A generalization of this capability (and an integration of it with PICS/RDF metadata access concepts) would allow relationships between code and documents to be specified separately from the code and documents that are interrelated (just as PICS content ratings may be specified separately from the content they rate), and accessed automatically during the processing of the document. This would permit a more flexible integration of data and code to form Web objects.

The OBJECT element is also the basis of current capabilities that link Web pages into CORBA distributed object architectures. This is done by using Java applets (referenced from OBJECT elements on Web pages) which define CORBA objects, and can interact with other CORBA objects (not necessarily written in Java) via CORBA's Internet Inter-ORB Protocol (IIOP), using an ORB contained in the Web client (Netscape Communicator supports such an ORB). This is an important capability in merging Web and object technologies, particularly the object service capabilities provided by CORBA architectures. Combining this capability with the facilities of our Web object model would provide a deeper integration of Web and object technology, and an improved ability to apply object services to Web resources. This is discussed further in Section 3.

2.3.3 Web Interface Definition Language

The Web Interface Definition Language (WIDL) <http://www.w3.org/TR/NOTE-widl> is commercial technology from webMethods, Inc. (information on WIDL is made available at W3C's Web site as a service by W3C, but WIDL is not W3C technology; WIDL is also described in [KR97]). WIDL is an application of XML which allows interactions with Web servers to be defined as functional interfaces. These interfaces can be accessed by remote systems using standard Web protocols, and provide the structure necessary for generating client code in languages such as Java, C/C++, COBOL, and Visual Basic.

A central feature of WIDL is that programmatic interfaces can be defined and managed for Web resources such as:

Static documents (HTML, XML, and plain text files)
Dynamically generated documents (HTML, XML, and plain text files)
HTML forms
URL directory structures

These resources need not be under the direct control of programs that require such access. WIDL definitions can be co-located with client programs, centrally managed in a client/server architecture, or referenced directly from HTML/XML documents.

WIDL definitions provide a mapping between such Web resources and applications written in conventional programming languages such as C/C++, COBOL, Visual Basic, Java, JavaScript, etc., enabling automatic and structured Web access by compatible client programs, including mainstream business applications, desktop applications, applets, Web agents, and server-side Web programs (CGI, etc.). Using WIDL, programs can request Web data and services by making local calls to functions which encapsulate standard Web access protocols and utilize WIDL definitions to provide naming services, change management, error handling, condition processing and intelligent data binding. A browser is not required to drive Web applications. WIDL requires only that target systems be Web-enabled (there are numerous commercial products which allow existing systems to be Web-enabled).

A service defined by WIDL is equivalent to a function call in standard programming languages. At the highest level, WIDL files describe the locations (URLs) of services, input parameters to be submitted (via Get or Post methods) to each service, conditions for successful processing, and output parameters to be returned by each service. In much the same way that DCE or CORBA IDL is used to generate code fragments, or 'stubs', to be included in application development projects, WIDL provides the structure necessary for generating client code in languages such as C/C++, Java, COBOL, and Visual Basic.

Many of the features of WIDL require a capability to reliably identify and extract specific data elements from Web documents. Various mechanisms for accessing elements of HTML and/or XML documents have been defined, such as the JavaScript Page Object Model, the Document Object Model, and XML-Link. The following capabilities are desirable for accessing elements of Web documents:

HTML Parsing
XML Parsing
Text Pattern Matching

Object referencing mechanisms would ideally support both parsing and pattern matching. Pattern matching extracts data based on regular expressions, and is well suited to raw text files and poorly constructed HTML documents. Parsing, on the other hand, recovers document structure and exposes relationships between document objects, enabling elements of a document to be accessed with an object model. WIDL does not define or determine a mechanism for accessing document data, but rather allows an object model referencing mechanism to be specified on a per-interface basis.

The following example (from the cited reference) illustrates the use of WIDL to define a package tracking service for generic Shipping. By allowing a WIDL definition to reference a 'Template' WIDL definition, a general class of shipping services can be defined. 'FoobarShipping' is one implementation of the 'Shipping' interface.

<WIDL NAME="genericShipping" TEMPLATE="Shipping"
      BASEURL="http://www.shipping.com" VERSION="2.0">

<SERVICE NAME="TrackPackage" METHOD="Get"
         URL="/cgi-bin/track_package"
         INPUT="TrackInput" OUTPUT="TrackOutput" />

<BINDING NAME="TrackInput" TYPE="INPUT">
   <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" />
   <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" />
   <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" />
</BINDING>

<BINDING NAME="TrackOutput" TYPE="OUTPUT">
   <CONDITION TYPE="Failure" REFERENCE="doc.title[0].text"
              MATCH="Warning Form" REASONREF="doc.p[0].text" />
   <CONDITION TYPE="Success" REFERENCE="doc.title[0].text"
              MATCH="Foobar Airbill:*" REASONREF="doc.p[1].value" />
   <VARIABLE NAME="disposition" TYPE="String" REFERENCE="doc.h[3].value" />
   <VARIABLE NAME="deliveredOn" TYPE="String" REFERENCE="doc.h[5].value" />
   <VARIABLE NAME="deliveredTo" TYPE="String" REFERENCE="doc.h[7].value" />
</BINDING>

</WIDL>

In this example, the values defined in the 'TrackInput' binding get passed via HTTP Get as name-value pairs to a service residing at 'http://www.shipping.com/cgi-bin/track_package'. Object References are used in the 'TrackOutput' binding to a) check for successful completion of the service, and b) extract data elements from the document returned by the HTTP request.
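
The first half of this process, turning an input binding into an HTTP Get request, can be sketched as follows. This is not webMethods code: the function build_request_url is hypothetical, the argument values are invented, and the WIDL text is an abbreviated copy of the definition above, parsed here with Python's xml.dom.minidom.

```python
from urllib.parse import urlencode
from xml.dom.minidom import parseString

# Abbreviated copy of the WIDL definition above (output binding omitted).
WIDL = """<WIDL NAME="genericShipping" TEMPLATE="Shipping"
      BASEURL="http://www.shipping.com" VERSION="2.0">
<SERVICE NAME="TrackPackage" METHOD="Get"
         URL="/cgi-bin/track_package"
         INPUT="TrackInput" OUTPUT="TrackOutput" />
<BINDING NAME="TrackInput" TYPE="INPUT">
   <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" />
   <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" />
   <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" />
</BINDING>
</WIDL>"""

def build_request_url(widl_text, service_name, arguments):
    """Map caller-side variable names to form names and build the Get URL."""
    widl = parseString(widl_text).documentElement
    service = next(s for s in widl.getElementsByTagName("SERVICE")
                   if s.getAttribute("NAME") == service_name)
    binding = next(b for b in widl.getElementsByTagName("BINDING")
                   if b.getAttribute("NAME") == service.getAttribute("INPUT"))
    # Each VARIABLE contributes one name-value pair, keyed by its FORMNAME.
    pairs = [(v.getAttribute("FORMNAME"), arguments[v.getAttribute("NAME")])
             for v in binding.getElementsByTagName("VARIABLE")]
    base = widl.getAttribute("BASEURL") + service.getAttribute("URL")
    return base + "?" + urlencode(pairs)

# Hypothetical argument values supplied by a client program.
url = build_request_url(
    WIDL, "TrackPackage",
    {"TrackingNum": "12345", "DestCountry": "US", "ShipDate": "980210"})
print(url)
```

The client program deals only in the names TrackingNum, DestCountry, and ShipDate; the form field names stay hidden in the WIDL definition.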

'Input' and 'Output' bindings specify the input and output variables of a particular service. Input bindings define the name-value pairs to be passed via Get or Post methods to a Web-based application. Output bindings use object references to identify and extract data elements from documents returned by HTTP requests.

Conditions define 'success' and 'failure' states for output bindings, and determine whether a binding attempt should be retried in the case of a 'server busy' error. Conditions can apply to a binding as a whole, or to a specific object reference. Conditions can define error messages to be returned as the value of the service; error messages can be a literal, or can be extracted from the returned document.
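
A sketch of how conditions might be evaluated against a returned document is given below. Several assumptions are made that the text does not confirm: object references are taken to have the form doc.<tag>[<index>].<property>, the text and value properties are both treated as the element's text content, MATCH values are treated as shell-style wildcard patterns, and the reply page is hypothetical and well-formed so that Python's xml.dom.minidom can parse it. The functions resolve and evaluate are hypothetical names.

```python
import re
from fnmatch import fnmatchcase
from xml.dom.minidom import parseString

def resolve(document, reference):
    """Resolve a reference of the assumed form doc.<tag>[<index>].<property>."""
    match = re.fullmatch(r"doc\.(\w+)\[(\d+)\]\.(text|value)", reference)
    if match is None:
        raise ValueError("unsupported reference: " + reference)
    tag, index = match.group(1), int(match.group(2))
    elements = document.getElementsByTagName(tag)
    if index >= len(elements):
        return None
    return "".join(node.data for node in elements[index].childNodes
                   if node.nodeType == node.TEXT_NODE)

def evaluate(document, conditions):
    """Return (state, reason) for the first condition whose pattern matches."""
    for state, reference, pattern, reason_ref in conditions:
        value = resolve(document, reference)
        if value is not None and fnmatchcase(value, pattern):
            return state, resolve(document, reason_ref)
    return None, None

# Conditions copied from the 'TrackOutput' binding above.
CONDITIONS = [
    ("Failure", "doc.title[0].text", "Warning Form", "doc.p[0].text"),
    ("Success", "doc.title[0].text", "Foobar Airbill:*", "doc.p[1].value"),
]

# Hypothetical reply page returned by the tracking service.
REPLY = ("<html><head><title>Foobar Airbill: 12345</title></head>"
         "<body><p>Thank you.</p><p>Package located.</p></body></html>")

result = evaluate(parseString(REPLY), CONDITIONS)
print(result)   # ('Success', 'Package located.')
```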

WIDL is another example of technology that provides an API (an object interface) to state. In addition, it supports the definition of similar interfaces to Web-based services. Facilities for defining such interfaces are helpful tools in integrating Web-based state and behavior.

2.4 Related OMG Technologies

Section 1 briefly described OMG's activities in developing an infrastructure for distributed object computing. Section 1 also noted the resemblance of the Web to a simple distributed object system. Given that commonality, practically any of OMG's work could be considered "relevant" to the creation of a Web Object Model. Information on the wide range of OMG's activities is available at the OMG Web site <http://www.omg.org/>. This activity includes both platform-related work on infrastructure components, and work related to specific vertical industry application domains. While much of this OMG activity is proceeding independently of Internet-related activities, one OMG activity which is directly addressing the integration of Internet and distributed object technology is OMG's Internet Special Interest Group <http://www.objs.com/isig/home.htm>.

While a complete description of OMG activities is outside the scope of this report, several OMG technologies address structured data representation capabilities similar to others described in Section 2, and hence are of direct interest here. Specifically, the OMG has been considering a Tagged Data Facility, and a Mediated Exchange Facility based on it, as part of its Common Facilities Architecture. The Tagged Data Facility involves the use of tagged data items to support semantics-based information exchange between applications, and also supports nesting and the ability to locate objects via tags through layers of nesting. The Mediated Exchange Facility is built on the Tagged Data Facility by adding mediator components and related services. Several submissions to OMG's Business Object Facility RFP describe such capabilities. In addition, the already-approved OMG Property Service provides similar capabilities. These OMG technologies are of interest in showing that there is a recognized need for tagged "data" representations to pass semantically-rich data structures between clients and servers within OMG's distributed object architecture, just as the representations described in Section 2.1 illustrated the need to do the same thing in the Web. However, there is not yet any coordination between these two communities in developing these facilities.

2.4.1 OMG Property Service

The OMG Property Service defines PropertySet objects that act as containers for sets of properties (name/value pairs). Each property has a different name. All property values are defined (and represented) as type any. PropertySet objects provide operations for finding the value of a property given its name, adding and deleting properties, modifying the value of an existing property, and determining whether the object has a property with a given name. PropertySet objects are intended to be a dynamic equivalent of CORBA attributes. When an application finds it necessary to add an attribute to an object, and cannot do so by using the IDL interface of the object (either using an existing attribute, or modifying the interface to add a new one), it can create a PropertySet object with the necessary attribute(s) and associate it with the object. A given object may have zero or more PropertySet objects associated with it. The Property Service does not define how this association is established. It could be done, for example:

by having attributes of type PropertySet in the object interface
by having the object interface inherit from PropertySet
by using the Relationship Service to define associations between the object and PropertySet objects

PropertySet objects do not have "schemas" as such; that is, there is no declaration that restricts a PropertySet to only contain properties with specific names. Nor is there a declaration that specifies that a property with a given name must only have values of a specific type. As a result, in the general case a property with any name/value combination can be contained in a given PropertySet (and there is no guarantee that a given name won't be used inconsistently by multiple applications in different PropertySets the application might define). However, such constraints can be (at least partially) defined operationally through the PropertySetFactory object used to create PropertySet objects (by implementing the appropriate PropertySetFactories to enforce the required constraints).
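
The behavior described in the last two paragraphs can be approximated with an in-memory sketch. This is not the service's IDL: the operation names are modeled loosely on it and should be treated as illustrative, values are ordinary Python objects rather than CORBA values of type any, and the allowed_types constraint table is a hypothetical simplification of what a constraining factory might enforce.

```python
class PropertySet:
    """In-memory sketch of a container of uniquely named properties."""

    def __init__(self, allowed_types=None):
        self._properties = {}
        # Optional constraint table: property name -> required Python type.
        self._allowed_types = allowed_types

    def define_property(self, name, value):
        """Add a property, or modify its value if the name already exists."""
        if self._allowed_types is not None:
            required = self._allowed_types.get(name)
            if required is None:
                raise KeyError("property not allowed: " + name)
            if not isinstance(value, required):
                raise TypeError("wrong type for property: " + name)
        self._properties[name] = value

    def get_property_value(self, name):
        return self._properties[name]

    def delete_property(self, name):
        del self._properties[name]

    def is_property_defined(self, name):
        return name in self._properties


class PropertySetFactory:
    """Factory that imposes the same constraints on every set it creates."""

    def __init__(self, allowed_types=None):
        self._allowed_types = allowed_types

    def create_propertyset(self):
        return PropertySet(self._allowed_types)


# An unconstrained set accepts any name/value combination.
loose = PropertySet()
loose.define_property("colour", "red")
loose.define_property("colour", 7)     # same name, different type: accepted

# A set created by a constraining factory does not.
factory = PropertySetFactory({"priority": int})
strict = factory.create_propertyset()
strict.define_property("priority", 3)
print(strict.get_property_value("priority"))   # 3
```

The constraints live entirely in the factory's implementation, which is what "defined operationally" amounts to: nothing in the PropertySet interface declares them.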

The OMG Property Service essentially provides a simple, dynamic, object-oriented interface to relatively unstructured property/value pairs. Object models (including OMG's) are generally static, in that they require an object class to have a fixed number of attributes and methods. The OMG Property Service addresses this restriction, and thus adds value to the object model. It does not specify an actual representation (this would presumably be specified using object externalization capabilities currently being developed by OMG), is not as rich as XML, and does not provide higher-level modeling capabilities such as those described in Section 2.2. However, in some respects it resembles a very simple DOM, in that it does provide an object interface to an (unspecified) representation.

2.4.2 Tagged Data Facility

The OMG has been considering release of an RFP (Request for Proposal) for a Tagged Data Facility (TDF). The TDF is intended to provide a facility for defining semantically-tagged objects that can be passed as parameters between ordinary CORBA objects. In particular, the TDF is intended to:

support tagged (named) data values of all types
support nesting (of data objects within other data objects), without requiring a preplanned sequence or order
allow the value of a given tag to evolve from being a single value to being nested tagged data
support methods for locating, creating, updating, deleting, etc., contained data objects (in particular, by name)
support a capability to apply name spacing and synonyms to the tag within a data object
support automatic type-conversions of data values on retrieval
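
Because the RFP has not been issued, no interface exists to illustrate. The sketch below only suggests what three of these requirements (nesting, location by name through layers of nesting, and type conversion on retrieval) might look like in practice. The class TaggedData, its methods, and the sample tags and values are all hypothetical.

```python
class TaggedData:
    """Hypothetical nested container of tagged values."""

    def __init__(self, **items):
        self._items = dict(items)

    def set(self, tag, value):
        # The value may be a simple value or another TaggedData object.
        self._items[tag] = value

    def find(self, tag):
        """Locate a tag by name through any number of nesting layers."""
        if tag in self._items:
            return self._items[tag]
        for value in self._items.values():
            if isinstance(value, TaggedData):
                found = value.find(tag)
                if found is not None:
                    return found
        return None

    def get_as(self, tag, convert):
        """Retrieve a value, converting it to the type the caller requests."""
        return convert(self.find(tag))


order = TaggedData(id="42", customer="Acme")
order_id = order.get_as("id", int)
print(order_id)              # 42

# The 'customer' tag evolves from a single value to nested tagged data.
order.set("customer", TaggedData(name="Acme", city="Boston"))
print(order.find("city"))    # Boston
```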

A tagged data object is intended to be an object; unlike a PropertySet object, its interface is not intended to be part of another object. Moreover, TDF objects are not intended to be "network-visible" objects. They are intended to be passed by value when used for information exchange between CORBA objects.

The TDF requirements seem to fit the basic structural capabilities of OEM and MCF to some extent (the draft TDF RFP explicitly references OEM), in the sense that they seem to call for the ability to construct complex graph structures of relatively simple labeled nodes. However, MCF in particular goes much further than TDF in defining the basis of a rather complete object model (which is unnecessary in TDF since TDF objects are already CORBA objects). TDF also specifies some metadata-related requirements, such as dealing with namespace issues and synonyms. However, like the Property Service, TDF is not well-integrated with related Web developments. Of course, as an RFP, the TDF leaves a great deal of detail, both of technology and usage scenarios, to be supplied by specific technology proposals submitted in response. As a result, it may be possible that some technology integrating OMG and Web technology, e.g., combining XML and DOM, could be adopted in response to the TDF RFP, once it is issued.

3. Building a Web Object Model

Section 2 has described a number of the key technologies that address issues in creating a Web object model. In this section, we describe a general approach to integrating these technologies to support a Web object model. Specifically, the key component technologies we propose to integrate are:

XML. XML provides a richer representation for object state than HTML, including:

application-specific tagged data elements and nested structures
more powerful linking facilities

In supporting an object model, XML pages (like HTML pages) can also be used as containers for embedded objects and object methods (e.g., Java applets)

the Document Object Model (DOM). The DOM provides an API for XML documents used as object state, and provides a mechanism for integrating object state and associated code
the OBJECT element from HTML 4.0 for representing and implementing embedded object methods
concepts from the PICS, RDF, and MCF Web data/metadata models to provide standardized attributes, data structures, and infrastructure for representing and implementing basic aspects of the object model, including:

relationships between documents containing state and documents containing metadata (including type information and code implementing object methods)
the framework for accessing and invoking object methods contained in separate Web resources when documents requiring those methods are accessed
type definitions and relationships between types (e.g., inheritance), depending on the details of the object model chosen

In addition to using these emerging Web technologies, we also take advantage of other existing aspects of the Web, e.g.:

Web clients already have the capability to invoke some forms of code associated with Web pages (e.g., Java applets and plug-ins)
Web clients will soon provide support for PICS. This establishes the principle of intercepting requests for Web pages in order to perform intermediate processing on the request (in the case of PICS, checking content ratings prior to displaying the page).
Code libraries currently exist on the Web, with metadata describing them. Thus, it is not a major extension (at least in principle) to allow this code to be associated with "data pages" in order to form objects.

3.1 Integration Approach

The idea behind integrating these technologies to form a Web object model is that an "object" in a conventional object model is basically a piece of state with some attached (or associated) programs (methods). In many object model implementations, this idea is exactly reflected in the physical structure of the objects. For example, a Smalltalk object consists of a set of state variables (data), together with a pointer (link) to a class object which contains the object's methods. The structure is roughly:

    Object (state)                Class object
  +---------------+              +-------------+
  | class pointer |------------->| Class data  |
  +---------------+              +-------------+
  | variable 1    |              | method 1    |
  | variable 2    |              | method 2    |
  |   ...         |              |   ...       |
  | variable n    |              | method m    |
  +---------------+              +-------------+

C++ implementations use similar structures. The state is a collection of programming language variables, which (usually) are not visible to anything but the methods (this is referred to as encapsulation). A typical object model has a tight coupling between the methods and state. All the structures (class objects, internal representation of methods and state, etc.) are determined by the programming language implementation, and are created together as necessary. The class (in particular, the methods it defines) defines the way the state should (and will) be interpreted within the system, and hence is a form of metadata for the state. As a result, the link between an object and its class is essentially a metadata link.
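
The state/class-pointer structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how any Smalltalk or C++ implementation is actually built; the names ClassObject, Instance, send, and the Account example are all hypothetical.

```python
class ClassObject:
    """Holds the methods shared by all instances (the 'class object')."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = dict(methods)  # method name -> function(state, *args)


class Instance:
    """Holds only state, plus a pointer (link) to its class object."""

    def __init__(self, class_pointer, **variables):
        self.class_pointer = class_pointer  # the "metadata link"
        self._state = dict(variables)       # variable 1 .. variable n

    def send(self, selector, *args):
        # Find the method through the class pointer, then run it on the state.
        method = self.class_pointer.methods[selector]
        return method(self._state, *args)


def deposit(state, amount):
    state["balance"] += amount
    return state["balance"]


def balance(state):
    return state["balance"]


account_class = ClassObject("Account", {"deposit": deposit, "balance": balance})
acct = Instance(account_class, balance=100)
acct.send("deposit", 50)
```

Note that the methods interpret the state only through the class pointer; replacing that pointer would change how the same state is interpreted, which is the sense in which the link acts as metadata.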

Extending this idea to the Web environment, Web pages can be considered as state, and objects can be constructed by enhancing those pages with additional metadata that allows the pages to be considered as objects in some object model. In particular, we want to enhance Web pages with metadata consisting of programs that act as object methods with respect to the "state" represented by the Web page. The resulting structure would, at a minimum, conceptually be something like:

                       +----------+
           +---------->| method 1 |
+-------+  |           +----------+
|  Web  |--+              ...
|  page |--+
+-------+  |           +----------+
           +---------->| method n |
                       +----------+

The NCITS Object Model Features Matrix [Man97] identifies many different object models, with widely differing characteristics. Different object models could also be defined for the Web. The details of the structures to be supported in a Web object model depend on the details of the object model we choose to define. For example, many object models are class-based, such as the Smalltalk and C++ models mentioned above. Choosing a class-based model for the Web would require defining separate class objects to define the various classes. Other object models are prototype-based, and do not require a class object (each object essentially defines itself). Either of these forms (plus others) could be supported by the basic mechanism we propose.

In a Web object model, some of the tight coupling that exists in programming language object models would probably be relaxed, and the connection between the state and code would be somewhat "looser". This would allow more flexibility in defining associations between programs and Web pages in the model. For example, unless special constraints prohibited such access, a user would probably be able to directly access the state (and manipulate it as well) using standard Web document viewing and creation tools, without necessarily using any associated methods (just as users today can often usefully access pages containing Java applets even when Java is inactive or unsupported on their browsers). In these cases, encapsulation would be relaxed and access to any methods related to the state would be optional.

Constructing these object model structures requires a number of "pieces" of technology, as we have already observed several times. These pieces are:

a representation for object state; this role is played by XML pages
an API to this state, so that programs can readily access it; this role is played by the DOM
pieces of code to serve as object methods; this role can be played by

OBJECT elements embedded in the state
other pieces of code defined as or within Web resources separate from the state, identified by URLs, that are designed to access the state via its DOM-based interface, and that are associated with the state via the relationship mechanism in the next bullet

a way to define the relationships between the state and the methods (the linkages in the above diagrams)
a way to access and invoke the code as necessary when the state (Web document) is accessed
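
As a rough sketch of how these pieces might fit together, the following Python fragment uses the standard library's xml.dom.minidom as a stand-in for the DOM, and a plain dictionary as a stand-in for the relationship mechanism. The URL, the document content, the WebObject class, and the method names are assumptions made for illustration only; no actual retrieval from the Web takes place.

```python
from xml.dom.minidom import parseString

# Object state: an XML page (a made-up example document).
STATE_XML = "<book><title>Example Title</title><price>12.50</price></book>"


# Object methods: code that reaches the state only through the DOM API.
def get_title(doc):
    return doc.getElementsByTagName("title")[0].firstChild.data


def get_price(doc):
    return float(doc.getElementsByTagName("price")[0].firstChild.data)


# Relationships between state and methods, keyed by a hypothetical URL.
# In the proposal this role would be played by OBJECT elements or RDF.
METHOD_LINKS = {
    "http://example.org/book.xml": {"title": get_title, "price": get_price},
}


class WebObject:
    """State (a parsed XML page) plus the methods linked to its URL."""

    def __init__(self, url, xml_text, links):
        self.url = url
        self.dom = parseString(xml_text)   # API to the state
        self.methods = links.get(url, {})  # methods associated with the state

    def invoke(self, name, *args):
        # Access and invoke the linked code when the state is used.
        return self.methods[name](self.dom, *args)


book = WebObject("http://example.org/book.xml", STATE_XML, METHOD_LINKS)
```

Here book.invoke("title") runs the linked method against the page's state, while book.dom remains directly accessible, reflecting the relaxed encapsulation discussed above.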

Code resources are already being stored on the Web, e.g., in program libraries supporting reuse, and it is already possible to create relationships (links) between Web documents and such resources. However, in using code resources to create objects, it is necessary to not only define the links between the code and its associated state, but also to reflect the special semantics associated with these links. These semantics somewhat resemble those of metadata such as PICS content labels, in the sense that instead of the user explicitly following the links to retrieve the associated "metadata", some of the "metadata" is automatically retrieved during access to the original resource, in order to support some special processing. This processing involves a form of what is variously called a metalevel, reflective, or intermediary architecture, in the sense that the processing requires that ordinary requests for data on the Web be interrupted or intercepted, so that the necessary special processing can be performed. In the case of content labels, the special processing involves checking the content labels against user-specified requirements in order to determine whether to allow access to the original resource. In the case of object methods, the special processing involves accessing the code, and invoking that code, in order to perform some operation.

In the approach we propose, relationships between the state and the methods will be defined in either of two ways:

OBJECT elements referring to the programs can be embedded in the Web pages. This requires that the pages be created with (or modified to contain) the necessary OBJECT elements. However, this capability is available now, and hence requires no enhancements.
The programs can be identified in metadata defined as either embedded or separate RDF resources. For example, an RDF resource associated with a given Web page might contain OBJECT elements that identify the programs that act as the page's methods. Alternatively, the RDF resource might refer to programs defined as Web resources using some mechanism other than the OBJECT element, and also include a reference (possibly as an OBJECT element) to a "loader" mechanism capable of accessing those programs and providing them to the client on request. RDF resources contain explicit references to the Web pages for which they define metadata. However, they do not require that the Web pages they describe themselves be aware of the existence of this metadata, and hence do not require that the pages be created with (or modified to contain) references to the metadata. Thus, using an RDF-based approach would allow Web pages to be associated with object methods without the pages themselves having to contain references to the methods.

In order to define relationships between Web pages and methods without these relationships being explicitly contained in the Web pages, it is necessary to have a way to determine the existence of these relationships at runtime, so that the client can download those methods, and invoke them to provide object behavior. PICS provides a mechanism for doing this. PICS defines metadata (content labels) that need not be embedded in the page described by that metadata. In PICS, the client specifies the sources and types of content labels it wants to use to evaluate the Web pages it accesses. Whenever an attempt is made to access a page, content labels from those sources are implicitly accessed (either from the site supplying the page, or from a separate rating service), and evaluated to determine whether access to the page should be allowed. It seems likely that RDF will define a similar (but possibly more general) mechanism for transparently accessing metadata about a given page when the page is accessed, and providing that metadata to the Web client. This mechanism would provide the basis of our metadata access mechanism as well. (If such a mechanism is not defined in RDF, we would define one as an extension. This would probably be relatively straightforward, given the existence of the PICS mechanism already mentioned.) In our case, however, the metadata will contain methods that can operate on the data in the page, and perform various functions based on that data.
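
The interception pattern described here can be sketched as follows. This is not the PICS protocol or any RDF mechanism; it only models the control flow, with in-memory dictionaries standing in for the Web and for the metadata sources. MetadataSource, InterceptingClient, the "rating" and "methods" keys, and all URLs are hypothetical.

```python
class MetadataSource:
    """A hypothetical label/metadata source: maps page URLs to metadata."""

    def __init__(self, records):
        self._records = records

    def lookup(self, url):
        return self._records.get(url, {})


class InterceptingClient:
    """Fetches pages, implicitly consulting the client's chosen sources."""

    def __init__(self, pages, sources, max_rating):
        self._pages = pages          # stand-in for the Web: url -> content
        self._sources = sources      # sources selected by the client
        self._max_rating = max_rating

    def get(self, url):
        # Intercept the request and gather metadata before delivery.
        metadata = {}
        for source in self._sources:
            metadata.update(source.lookup(url))
        # PICS-style step: evaluate the label against user requirements.
        if metadata.get("rating", 0) > self._max_rating:
            return None, []
        # Object-model step: report the method URLs linked to the page.
        return self._pages[url], metadata.get("methods", [])


pages = {"http://example.org/report.xml": "<report/>"}
ratings = MetadataSource({"http://example.org/report.xml": {"rating": 1}})
methods = MetadataSource(
    {"http://example.org/report.xml": {"methods": ["http://example.org/code/summarize"]}}
)
client = InterceptingClient(pages, [ratings, methods], max_rating=2)
content, method_urls = client.get("http://example.org/report.xml")
```

The page itself contains no reference to either metadata source; the association exists only in the sources the client has chosen to consult.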

The two mechanisms identified above (embedded OBJECT elements and RDF resources associated with the page) potentially provide a way to access the methods when the state is accessed. In addition, a mechanism is required to invoke the code as it is needed. The OBJECT element already provides such a mechanism which can be used in some cases (for example, this is used to invoke Java applets embedded in pages). A more general mechanism would be necessary for methods defined in RDF resources. There may be a way to do this provided within a general RDF-supported metadata access mechanism (this is currently not clear, since RDF is still under development). Alternatively, it may be necessary to define this as an extension. Again, this would probably be relatively straightforward.

Many details of this technology integration must still be worked out (partially because some of the key technologies we have identified are still under development). Nevertheless, we feel that the capabilities inherent in these technologies provide the necessary support for the object model integration we propose.

3.2 Discussion

A number of projects have investigated developing object capabilities for the Web, e.g., the Harvest Object System [CHHM+94], W3Objects [ILCS95], and ANSAWeb [REMB+95]. A thorough review of such projects has been undertaken, and the descriptions of W3Objects and ANSAWeb below are taken from a forthcoming technical report "Web + Object Integration", by Gil Hansen (OBJS), resulting from that review.

The Harvest Object System (HOS) [CHHM+94] modified the Mosaic browser to include a Harvest Object Broker, allowing users to interact with remote objects via a special Harvest Object Protocol (HOP). HOS defines objects from existing files and programs by recording metadata roughly of the form:

     user-defined type name
          URL --> file data
          URL --> method (program)
          URL --> method
          URL --> method
          ...
          URL --> method

using SOIF to hold that metadata. The HOP is used for retrieving IDL information, moving object code and data, and invoking objects. A command such as GETOBJS hop://URL/some.obj (where URL/some.obj designates a file) returns the object data for some.obj along with its metadata, including a set of methods.

ANSAWeb <http://www.ansa.co.uk/ANSA/ISF/overview.html> provides a strategy for interoperability between the Web and CORBA using HTTP-IIOP gateways -- the I2H gateway converts IIOP requests to HTTP, and H2I converts HTTP requests to IIOP. The H2I gateway allows WWW clients to access CORBA services; the I2H gateway allows CORBA clients to access Web resources. The pair of gateways together behave like an HTTP proxy to the client and server. A CORBA IDL mapping of HTTP represents HTTP operations as methods and headers as parameters. An IDL compiler generates client stubs and server skeletons for the gateways. H2I is both a gateway to IIOP and a full HTTP proxy so a client can access resources from a server that does not have an I2H gateway. A locator service decides when to use IIOP or HTTP. If the locator can find an interface reference to an I2H server-side gateway, IIOP is used; otherwise, the H2I gateway passes the request via HTTP.

The W3Objects <http://arjuna.ncl.ac.uk/w3objects/> project at the University of Newcastle upon Tyne provides facilities for transforming standard Web resources (HTML documents, GIF images, PostScript files, audio files, and the like) from file-based resources into objects called W3Objects, i.e., encapsulated resources possessing internal state and well-defined behaviors. The motivating notion is that the current Web can be viewed as an object-based system with a single class of object -- all objects are accessed via an HTTP daemon. W3Objects are responsible for managing their own security, persistence, and concurrency control. These common capabilities are made available to derived application classes from system base classes. A W3Objects server supports multiple protocols by which client objects can access server objects. When using HTTP, the URL binds to the server object and the permitted object operations are defined by the HTTP protocol. Alternatively, the RPC protocol can be used to pass operation invocations to a client stub generated from a description of the server object interface. W3Objects uses C++ as the interface definition language, although CORBA IDL and ILU ISL can be used. W3Objects can also be accessed through a gateway, implemented as a plug-in module for an extensible Web server, such as Apache <http://www.apache.org/>. URLs beginning with /w3o/ are passed by the server to the gateway; the remainder of the URL identifies the requested service and its parameters. Using a Name Server, the appropriate HTTP method is invoked on the requested service.

These projects have identified a number of important ideas in supporting objects on the Web (in particular, objects constructed in the HOS resemble in many respects those that would be constructed using the approach described in Section 3.1). However, they based their attempts to develop object capabilities for the Web on the existing Web infrastructure. As a result, they had to use a number of non-standard Web extensions (e.g., special protocols referenced in URLs to trigger the loading of object methods), which limit their widespread usability. Dependence on the existing Web infrastructure also limits the ability of the resulting objects to support more complex Web applications. Our work, on the other hand, is based on what will likely be the next-generation Web infrastructure. This infrastructure is still evolving, and hence some extensions to it may yet be necessary. However, based on our analysis, these new Web technologies seem likely to provide a much better basis for providing powerful Web object facilities that are at the same time based on standard (hence, widely accessible) Web protocols and components.

An approach similar to that provided by ANSAWeb is becoming increasingly popular, and is potentially very powerful. This involves placing Java applets on Web pages (using the APPLET or OBJECT elements in HTML). Once on the Web client, these objects then communicate with other objects on remote servers using various protocols. A particularly important variant of this approach is to use it to combine Java and CORBA. In this variant, Java applets downloaded to the client communicate with other CORBA objects over the Internet via CORBA's IIOP (Internet Inter-ORB Protocol), which is supported by all CORBA Object Request Brokers. This approach is, for example, supported by Netscape Communicator, which includes Visigenic's Java ORB. Using this approach, the advantages of CORBA's object services are potentially available to Internet objects. This also allows non-Java objects to be integrated into the Internet, since CORBA objects can be written in many languages. Java has also been the basis of proposals to improve Web capabilities by representing more and more Web content directly as Java objects, using the existing Web largely as a transport mechanism for these objects.

Such approaches provide important new mechanisms for supporting more powerful Web capabilities, and integrating enterprise distributed object systems (which are likely to be CORBA-based) with the Internet. However, these approaches suffer from a number of disadvantages when used by themselves, e.g.:

much of the Web remains outside the scope of CORBA-based services unless additional facilities are provided to wrap Web resources as objects
the current easy construction of Web content is replaced by the need to use Java objects
there is no smooth integration of Web page content with Java
Java programs must still grapple with the problem of processing syntactically tagged HTML content

What we are proposing is a general way to merge objects and the Web. Our approach subsumes these Java-based approaches, since all these mechanisms for integrating Java (and CORBA) objects with Web pages are still available. However, our approach goes beyond these approaches in providing richer Web content that is more amenable to application processing (XML pages accessible via DOM), together with a more general way to link non-embedded methods with that Web content.

There are a number of potential ways to use the "objects" constructed using the mechanism we are proposing. One approach would be to use the methods associated with a document in the same way that Java applets are used now. The difference would be that the code would not need to be embedded in the document. (In fact, depending on the exact details of the DOM, if the methods were separately-located OBJECT elements, they could presumably be embedded dynamically in the document at the client using the DOM interface, and act just the way embedded OBJECTs would act.) A more conventional "object-like" use would be to allow the associated methods to be invoked via an enhanced DOM interface by programs acting through the client. That is, the DOM effectively implements a generic interface of a type something like XML-document (for XML documents). Application-specific subtypes of this generic type could be created which included the application-specific methods associated with the document as parts of the interfaces defined for those subtypes. Programs acting through the client could then invoke these methods through the new interfaces just as they invoke the methods of other objects.

The mechanism defined here provides a form of "component-oriented" development, in that it allows the arbitrary composition of objects from data and code resources found on the Internet. Using this approach, a client could have multiple "object views" of the same base data (e.g., access the same data resources using different classes), by simply changing the collection of methods it uses when accessing the data (this would be like using different annotation sets or PICS-like labels in accessing a document).
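
The idea of multiple "object views" over the same base data can be illustrated with a small sketch. The order document, the two method collections, and the view helper below are hypothetical; xml.dom.minidom again stands in for the DOM.

```python
from functools import partial
from xml.dom.minidom import parseString

# One base data resource (a made-up order document).
doc = parseString(
    "<order><item qty='2' price='3.00'/><item qty='1' price='4.50'/></order>"
)


def total(d):
    return sum(
        int(i.getAttribute("qty")) * float(i.getAttribute("price"))
        for i in d.getElementsByTagName("item")
    )


def item_count(d):
    return sum(int(i.getAttribute("qty")) for i in d.getElementsByTagName("item"))


# Two different collections of methods for the same data.
billing_methods = {"total": total}
shipping_methods = {"item_count": item_count}


def view(document, methods):
    """Bind a collection of methods to a document, yielding one 'object view'."""
    return {name: partial(fn, document) for name, fn in methods.items()}


billing = view(doc, billing_methods)
shipping = view(doc, shipping_methods)
```

Both views share the same underlying document; only the collection of methods chosen by the client differs.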

The approach may appear somewhat "heavyweight", in the sense that it involves additional mechanism, and may involve delays in accessing the code that implements object methods. However:

if object methods are implemented as embedded OBJECT elements, there is no change from the current way that Java applets are downloaded with pages
if object methods are implemented as separate resources but are co-located with the documents they operate on (at the same server), they can potentially be downloaded with the document (within the same response by the server) using PICS/RDF-defined mechanisms
using the additional methods is (at least potentially) a choice that the user can make
it is important to provide the basic capability for doing this, at which point the efficiency issues can be addressed. For example, these could be addressed by caching frequently used methods at the client, or by other mechanisms.

In this connection, it is useful to compare the architecture that results from using this approach to that of an Object DBMS (ODBMS). In most current ODBMS client/server architectures, methods typically reside in class libraries located on the client, rather than being stored as complete objects on the server. Only object state resides on the server. When objects are needed by the client, the state is accessed from the server, moved to the client, and complete objects are created locally using the client-based class libraries. In our approach, both the methods and the state (at least conceptually) reside remotely; the client only contains references to the objects. The Web delivers the state to the client just the way an ODBMS server does, and delivers the methods as well.

So far, our work has focused on identifying new Web technologies to serve as a base, analyzing their capabilities, and developing the basic principles for integrating them. Further work needs to be done to work out the additional details required to build a prototype implementation. For example, we have already noted that there are many object models that could be supported using the principles we have identified. It will be necessary to choose a particular object model (or possibly more than one) to use for our Web object model. This, in turn, will affect the structure of the metadata that must be supported. For example, if a class-based model is chosen, additional metadata will need to be defined to support the class objects (these could be recorded as Web objects too, using RDF, possibly together with techniques from MCF or XML-Data). Further work will be necessary to determine an appropriate type of object model for use on the Web.

Additional work is also required to define the mechanism that invokes the object methods once they are returned to the client. This will depend on the details of how the RDF standard evolves. As noted at the end of Section 3.1, the general RDF-supported metadata access mechanism may provide a way to insert this method invocation mechanism. Alternatively, it may be necessary to define this as an extension to the RDF mechanism.

Finally, as noted already, the DOM currently defines its API at a generic level, i.e., at the level of components of a document metamodel. Additional work is required to define "application level" object interfaces which include interfaces to the methods associated with the objects. For example, in the relational database example described in Section 2.3.1, DOM provides objects of types node, element, and so on, rather than objects of type author or editor (or even objects of type table or row). Using DOM, an application could effectively create such interfaces from the information given, but it would have to "know what to look for", and would have to traverse the various element objects to find that information. It would be desirable to have a capability for creating DOM-like, but application-oriented, APIs. This could involve using additional metadata (e.g., the DTD, or an XML-Data-like schema) to generate a default API automatically (it might then be possible for the document's author to customize this API or, alternatively, define the API explicitly). It might then be possible to attach specific methods to this API to define application-specific object behavior. An integration of DOM and embedded OBJECT elements would be one way to support this. This would effectively permit the creation of objects in the classic object-oriented programming sense.
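
One simple way to picture a DOM-like but application-oriented API is a thin wrapper that exposes child elements by name, so that an application can write book.author.name rather than traversing generic node and element objects. The sketch below is an assumption-laden illustration, not a proposal for the actual mechanism: ElementView and the sample document are hypothetical, no DTD or schema is consulted, and only the first matching child is returned.

```python
from xml.dom.minidom import parseString


class ElementView:
    """Wraps a DOM element so child elements can be read as attributes."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Called only for names not found normally, e.g. view.title.
        children = [
            c for c in self._element.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name
        ]
        if not children:
            raise AttributeError(name)
        child = children[0]
        # Text-only elements yield their text; others yield a nested view.
        if all(n.nodeType == n.TEXT_NODE for n in child.childNodes):
            return "".join(n.data for n in child.childNodes)
        return ElementView(child)


SAMPLE = "<book><author><name>A. Writer</name></author><title>Some Title</title></book>"
book = ElementView(parseString(SAMPLE).documentElement)
```

A generated API based on a DTD or schema could go further, for example by distinguishing repeated elements from single-valued ones, which this wrapper does not attempt.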

3.3 Formal Principles

The approach to creating a Web object model described in the previous sections provides the basis for creating genuine objects, having both state and behavior, on the Web. This would greatly increase the structuring power of the Web, enabling it to support increasingly complex applications. However, as noted in Section 1, it is also important to have higher level object services available for these objects, such as those provided for CORBA objects in OMG's Object Management Architecture. In providing this additional support, it is important to have a formal foundation for the object model and its operations. For example, such a formal foundation is essential as a basis for defining query processing and view facilities (just as the formal foundation of the relational database model is essential for defining query processing and view facilities for relational databases). A formal foundation is also helpful as a basis for defining extensions to the model, and generally understanding its capabilities.

In this section, we describe some basic ideas behind work on a formal definition for our Web object model. The ideas are derived from work on the foundations of Web metadata concepts, work on object-oriented logics, and our own prior work on object model formalization. Many of these same ideas are currently being reflected in W3C's ongoing RDF activity.

3.3.1 Logic Basis

Section 2 described a number of different representation techniques and models for Web-related data. While these models have individual variations, in most cases these models are basically the same model: graphs, with labeled edges (although some models are based on a tree structure, they generally provide graph capabilities through the use of pointers of one form or another, usually URLs). This is essentially a model of the Web itself: Web resources, identified by URLs, which point to each other by including the URLs of related resources as hyperlinks. Papers describing these models often acknowledge their similarity to each other.

Common features of these representational models are:

the basic "objects" consist of either individual fields (attribute/value pairs, tagged values), or simple aggregates of these fields (e.g., sets of fields, nested fields)
support for some form of identity (such as URLs, or specifically generated object identifiers or identifier fields); fields can have as values either individual identifiers, or sets of identifiers; this allows tree and graph structures to be defined
no encapsulation; applications accessing the objects obtain direct access to the field values
no behavior in the form of object methods
loose typing (although generally a set of primitive base types is defined for field values, e.g., integer, character, etc.)--there is often no schema; in some models an arbitrarily-defined field can appear in any object, and can appear an arbitrary number of times (or not at all); sometimes the value of a field may be an actual value in one instance, and a reference to a complex object in another (e.g., an address field may be a string in one object, and a reference to a collection of street, city, and state fields in another). (Languages for querying these models must thus support pattern-matching and various forms of implicit path traversal to deal with these types of irregularities.)
no inheritance

There are a number of reasons for adopting this form of model to deal with Web data:

An approach based on attribute/value pairs, unlike a database-like "typed record" approach, is arbitrarily extensible in a federated environment (without a centralized collection of types or schema). Anyone can record any attributes they feel are necessary, without going through the "overhead" of defining a new type (and, in particular, possibly having to define it as a subtype of an existing type), and distributing that type definition throughout a distributed network. Also, anyone can appropriately use those attributes, provided that they understand the intended semantics of the attributes.
The basic structure of the Web is built up from individual tagged items (currently, in HTML), i.e., it identifies individual things down to what is essentially the attribute level. As a result, an "object model" sufficient to describe Web information must contain individual identifiable constructs down to this level (with higher-level groupings, like pages, being built up as aggregates of these primitive units).
It is still possible to combine the representation flexibility of these models with the efficiency and other benefits of typed models by using the "less typeful" model as a base, and adding type-like structuring as additional constraints. This is consistent with the idea of "retrofitting" type and other information to resources that are "discovered" dynamically. In this approach, attribute names are explicitly defined in the base model; then, "smarter components" (e.g., knowledge-based "mediators" in OEM) add more complex structures or semantics, using the attribute names to identify the relevant material.
Describing a resource by a set of attribute/value pairs is equivalent to describing the resource by a set of assertions (essentially binary relationships) in predicate logic. The resource is assigned an identity (say, a URL), and the attribute name serves as the name of the predicate/relationship, as in author(url1,"Oscar Wilde") or title(url1, "The Importance of Being Earnest"). This basis in predicate logic, like the logical basis of the relational data model, provides a solid foundation for these models, guaranteeing the generality of the model, and that familiar formal mechanisms can be employed in connection with it.

The relationship identified in the last bullet between these representational models and logic-based formalisms is very important, and is explicitly called out in a number of papers introducing or analyzing these models. The relationship is, as noted above, important in establishing a formal framework in which to understand these models, as well as in suggesting possible extensions. The relationship is also important in establishing a way to add more "intelligence", through the use of knowledge-based components such as "mediators" or "intelligent agents". Such components, for example, will need to have a formal way of interpreting the data they will be dealing with. The ability to understand Web representations in terms of logic provides a basis for applying KIF-based technologies, for example. In addition, the fact that these models can be understood in a common way (have a common semantics expressible in terms of logic) is important in providing a basis for defining translations/conversions (in terms of logic-based rules) between apparently different representations. This is similar to the use of logic-based formalisms to define translations in federated database systems (see, e.g., [FR97]).

As an example of work within the W3C addressing the relationship of logic and metadata, Describing and Linking Web Resources is an early W3C note which discusses general ideas and issues for describing and linking Web resources. It references work such as PICS, SOIF, and MCF, and notes that, though these different formats exhibit a range of syntactic variations, semantically they attempt to convey similar information. The architectural model that is common to them is the basic structure of the web: a directed graph with labeled arcs. The nodes (or points, or vertices) of the graph are URLs--anchor or resource addresses. The arcs are links. The labels are link relationships. Associated with each node is a set of attributes, or slots, or fields. Each attribute has a name and a value. Values are defined in a media-type specific manner.

The note also identifies the relationship of these attribute/value-based schemes to basic concepts in propositional logic. This allows the identification of the basic principles of the model independently of particular representations. R(S, T) can be used to denote a link from S to T with relationship R. The same notation can be used for attributes, writing N(S, V) for an attribute named N on an anchor at S with value V. For example, both the SOIF description

@FILE {"http://www.shoes.com"
        Author{4}: Fred
        Supersedes{30}: http://www.provider.com/shoes }

and the HTML

<about href="http://www.shoes.com">
       <meta name=author content="Fred">
       <link rel=Supersedes href="http://www.provider.com/shoes">
</about>

can be interpreted as:

Author(http://www.shoes.com, "Fred")
Supersedes(http://www.shoes.com, http://www.provider.com/shoes)

Link semantics can be modeled by observing that anything can be considered a point in the web--including people, organizations, dates, and subject categories--by giving it a URL. A link or attribute in the web can be interpreted as an assertion, given an understanding of the semantics of the link relationship or attribute name. For example, given the definitions:

Author(S, V) means "The Author of S is V"
Supersedes(S, T) means "S supersedes T"

the HTML or SOIF data above can be interpreted as the assertions:

The Author of http://www.shoes.com is Fred.
http://www.shoes.com supersedes http://www.provider.com/shoes.
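
The reading of links and attributes as assertions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Assertion` type, the `TEMPLATES` table, and the `read_assertion` helper are hypothetical names introduced here, not part of the W3C note, SOIF, or HTML.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """One R(S, T) or N(S, V) fact (hypothetical helper type)."""
    relation: str  # link relationship or attribute name, e.g. "Author"
    subject: str   # the anchor S, given as a URL
    value: str     # the link target T or the attribute value V


# The two assertions carried by both the SOIF and the HTML descriptions above.
ASSERTIONS = [
    Assertion("Author", "http://www.shoes.com", "Fred"),
    Assertion("Supersedes", "http://www.shoes.com", "http://www.provider.com/shoes"),
]

# Assumed table holding the agreed meaning of each relationship name.
TEMPLATES = {
    "Author": "The Author of {s} is {v}.",
    "Supersedes": "{s} supersedes {v}.",
}


def read_assertion(a: Assertion) -> str:
    """Render an assertion using the agreed meaning of its relationship."""
    return TEMPLATES[a.relation].format(s=a.subject, v=a.value)


SENTENCES = [read_assertion(a) for a in ASSERTIONS]
```

The point of the sketch is that the sentences depend only on the (relation, subject, value) triples and the shared definitions, not on whether the triples were originally written in SOIF or HTML.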

A straightforward application of this approach permits the description of a set of assertions about an individual concept, identified by an identifier. Tim Berners-Lee's paper Metadata Architecture [Ber97] carries these ideas further, and this approach is being reflected in the W3C's RDF specifications.

In addition to the description of simple, flat sets of attribute/value pairs describing individual entities, it is necessary for these structural models to be able to handle more complex structures, such as trees (e.g., repeating groups) and networks (directed graphs). In defining these more complex structures, the ability to assign identifiers both to resources and to individual (or groups of) attribute/value pairs is important. This allows a given (sub)structure to be assigned an identity, and then referenced from multiple places within a data structure. In actual representations, such substructures are indicated not by assigning them separate identifiers, but by some distinct representation technique (e.g., by nesting them within a larger tag). Such substructures need to be understood as being "flattened", with separate identifiers defined, in interpreting them within a logic-based framework (just as, in the relational data model, data must at least be represented in unnested "first normal form"). Techniques for factoring nested parts of a hierarchical structure into a "flat" logical form, and the need for both AND and OR logical operators, are illustrated and discussed in On Information Factoring in Dublin Metadata Records <http://www.uic.edu/~cmsmcq/tech/metadata.factoring.html>.
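
As a rough illustration of this "flattening", the following Python sketch turns a nested description into flat (attribute, subject, value) assertions by generating an identifier for each nested substructure. It is a minimal sketch, not the technique of the cited factoring paper: the `flatten` function, the `_:n1`-style generated identifiers, and the sample record (including the name "Provider Inc.") are all assumptions made up for the example.

```python
from itertools import count


def flatten(node_id, record, out, fresh):
    """Append flat (attribute, subject, value) triples for `record` to `out`.

    Nested dicts are given generated identifiers so that they can be
    referenced like any other node; lists model repeating groups.
    """
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                sub_id = f"_:n{next(fresh)}"  # assumed identifier scheme
                out.append((name, node_id, sub_id))
                flatten(sub_id, v, out, fresh)
            else:
                out.append((name, node_id, v))
    return out


# Hypothetical nested description with a repeating "Author" group.
nested = {
    "Title": "Shoes",
    "Author": [
        {"Name": "Fred", "Affiliation": "Provider Inc."},
        {"Name": "Wilma"},
    ],
}

flat = flatten("http://www.shoes.com", nested, [], count(1))
```

Each nested author record becomes a node of its own (`_:n1`, `_:n2`), so the result is a set of unnested assertions comparable to relational first normal form.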

Various specific representation techniques for metadata, such as RDF, MCF, SOIF, OEM, etc., can be understood in the context of these observations as simply involving different encodings of the basic logic-based structures. Each encoding selects specific attributes, identifiers, etc. to cluster together in specific data representations, and selects others to represent as separate entities. Each also selects some relationships to represent explicitly by using identifiers as pointers, and some to represent implicitly by grouping related constructs in the same data structure. This interpretation of attribute/value pairs (and associated structures) as logical assertions is a key element in the development of a formal basis for our Web object model, and is explicitly reflected in RDF as well.

3.3.2 Representation of Higher Level Semantics

What is metadata to one application is often data to another, and vice-versa. Hence, it is often important to be able to define metadata which describes other metadata descriptions, or parts of them. For example, it is important to be able to define the semantics of the individual attributes used in metadata descriptions, and to define the characteristics of the values that may be assigned to them (e.g., their types, their units, what they signify). Discussions of structural or "lightweight" models often refer to tagged values as "self-describing", as allowing arbitrary attribute names to be introduced, and as not requiring the use of centralized attribute or type registration. However, this is only true to a certain extent. These representations are really "self-describing" in a truly useful way only if there is a common understanding of the meaning of the attribute names (and their associated values) by accessing applications. To support general interoperability, the definitions of attribute names and types must either be actually distributed, or distributed access must be provided to them.

A number of abstract models for Web metadata describe the ability to link metadata individually to tagged items (attributes). For example, the Dublin Core describes the ability to access the definition of an individual attribute. This allows the attributes used in a particular description to be linked to an ontology that defines both the attributes and the set of concepts used in the context that the attributes are intended to describe. (In a sense, a resource pointing to its ontology is similar to an object pointing to its methods: each provides an interpretation of the data. The methods are a "procedural specification" of the meaning/behavior appropriate to the data, while an ontology is human-readable. Work by groups such as the Stanford knowledge group is intended to merge these ideas and make the ontology readable/usable by knowledge-based software, the idea being that one could have a logic-based or other semantic specification which is both declarative and machine-interpretable.) The relationship between attribute/value pairs and formal logic described above also provides a basis for representing these additional kinds of links.

Describing and Linking Web Resources discusses how higher level information (such as beliefs), and information about the attributes or relationships themselves, can also be encoded using predicate logic. The basic approach is to assign each relationship (or attribute) its own URL (object identity), thus reifying the relationship (or attribute). Once a relationship has a URL (or other unique identifier), it can have its own metadata, by recording additional assertions about that identifier. If the relationship is identified with a URL, dereferencing the URL should access a definition of the link relationship, in either human-readable or machine-readable form. In addition, information about the association between a given attribute or assertion and a given resource can also be recorded. For example, in addition to recording an assertion like cost(o1, $26.95), information as to who made that assertion, and when, can also be recorded, e.g.:

who( (o1,cost), "fred")
when( (o1,cost), "04/07/97")

In this case, (o1,cost) acts as a new unique identifier which is the identity of the use within (or for) o1 of the attribute "cost" (this is a form of identifier construction mechanism supported by object logics, such as F-logic, described below).
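
A minimal Python sketch of this idea follows, assuming assertions are stored as (relation, subject, value) tuples. The helper names `use_of` and `describe` are hypothetical; the constructed tuple plays the role of the (o1,cost) identifier.

```python
# Base-level assertion: cost(o1, $26.95)
facts = {("cost", "o1", "$26.95")}


def use_of(subject, attribute):
    """Construct the identity of the use of `attribute` within `subject`.

    The identifier is functionally dependent on its two arguments, so the
    same pair always yields the same identity.
    """
    return (subject, attribute)


# Assertions about the assertion, recorded against the constructed identifier.
meta = {
    ("who", use_of("o1", "cost"), "fred"),
    ("when", use_of("o1", "cost"), "04/07/97"),
}


def describe(subject, attribute):
    """Collect everything recorded about one attribute use."""
    key = use_of(subject, attribute)
    return {rel: val for rel, ident, val in meta if ident == key}
```

Here `describe("o1", "cost")` gathers the "who" and "when" information without changing the base-level `cost` assertion itself.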

Metadata Architecture [Ber97] observes that the URL space is an appropriate space for the definition of attribute names in the Web because it effectively provides for a federated name space, within which users can freely define attribute names without necessarily "registering" them with a central authority. However, the URLs that identify relationships or attributes need not necessarily be used locally (within a given resource). Instead, local names from a namespace defined by the resource can be used as abbreviations. However, it should always be possible to translate from a local name to the global URL that represents the actual definition of the relationship or attribute. Relationships such as the following could be defined to represent these concepts:

global(S, T)--The anchor S, which represents a link relationship locally to a resource, is defined globally at T.
implies(S, T)--S implies T; that is, from any link/assertion S(X, Y), deduce T(X, Y) [this could be used as the basis for defining a subtype relationship for the base level relationships.]
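
The two relationships can be sketched in Python as follows. This is an illustrative sketch only: the `resolve` and `closure` functions and the `http://example.org/rel/...` URLs are assumptions introduced for the example and do not come from [Ber97] or any W3C specification.

```python
def resolve(local_name, global_map):
    """Apply global(S, T): map a local name to its global definition URL."""
    return global_map.get(local_name, local_name)


def closure(facts, implies):
    """Return `facts` plus every T(X, Y) deducible via implies(S, T)."""
    derived = set(facts)
    changed = True
    while changed:  # repeat until no new assertion can be deduced
        changed = False
        for rel, x, y in list(derived):
            for s, t in implies:
                if s == rel and (t, x, y) not in derived:
                    derived.add((t, x, y))
                    changed = True
    return derived


# Hypothetical definitions: a local abbreviation and a subtype-like rule.
GLOBAL = {"author": "http://example.org/rel/Author"}
IMPLIES = {("http://example.org/rel/Author", "http://example.org/rel/Contributor")}

local_facts = [("author", "http://www.shoes.com", "Fred")]
global_facts = {(resolve(r, GLOBAL), s, v) for r, s, v in local_facts}
deduced = closure(global_facts, IMPLIES)
```

After resolution and deduction, the set contains both the Author assertion and the implied Contributor assertion about the same resource.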

These ideas are being reflected in the RDF, XML, and other W3C specifications. Such reification of attributes and relationships (and also of types and methods) is also a key element in the development of a formal basis for our Web object model.

3.3.3 Object Logics

Along with the development of object technology, a number of attempts have been made to extend logical formalisms to represent the specific characteristics of objects. A particular goal in the development of object logics has been to provide the same type of solid theoretical foundation for object-oriented database systems that the relational model provides for relational database systems. The foundation of the relational model (specifically relational calculus) is a restricted subset of conventional predicate logic. The reasoning was thus that, in order to have the same sort of theoretical foundations for object-oriented database systems, it would be necessary to have a logic analogous to predicate calculus, but one that would incorporate object concepts such as objects, classes, methods, inheritance, etc. A number of object logics have been introduced, one of the more thoroughly-developed of which is F-logic (Frame Logic) [KL89, KLW95].

A full exposition of F-logic is outside the scope of this paper (and in any case can be obtained from the cited references). However, F-logic includes a number of capabilities that are relevant to this discussion. For example, F-logic supports operations on both flat data structures (along the lines of the conventional relational model) and nested data structures (path traversal). F-logic also supports id-terms representing object identities. These are logical terms which use object constructor functions that can be interpreted as constructing object identities that are functionally dependent on their arguments. These terms are used to represent derived objects (e.g., objects to be constructed on the left-hand sides of rules), with the arguments of the function indicating the base objects from which the new objects were derived (effectively, the derived identity can be considered as the labeled tuple of the base identities). The ability to construct derived objects is crucial in describing the semantics of queries which produce new objects from existing ones (as a relational join operation does) and of views.
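
The role of id-terms in constructing derived objects can be approximated in Python, although this is only an analogy and not F-logic itself. In the sketch below, the `IdTerm` type, the `f` constructor helper, the `works_in` label, and the employee and department data are all hypothetical.

```python
from typing import NamedTuple


class IdTerm(NamedTuple):
    """A derived identity: a labeled tuple of the base identities."""
    constructor: str
    args: tuple


def f(constructor, *args):
    """Object constructor function: equal arguments give equal identities."""
    return IdTerm(constructor, args)


# Hypothetical base objects, keyed by their object identities.
employees = {
    "e1": {"name": "Fred", "dept": "d1"},
    "e2": {"name": "Wilma", "dept": "d2"},
}
departments = {"d1": {"dname": "Shoes"}}


def join():
    """Join-like view: one derived object per matching employee/department."""
    view = {}
    for eid, emp in employees.items():
        for did, dept in departments.items():
            if emp["dept"] == did:
                view[f("works_in", eid, did)] = {
                    "name": emp["name"],
                    "dname": dept["dname"],
                }
    return view


view = join()
```

Because the derived identity is a function of the base identities, evaluating the view twice yields the same object identities, which is what allows a view or query result to be referenced consistently.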

Finally, F-logic introduces higher-order capabilities in order to describe inheritance and operations on metadata (e.g., database schemas) effectively. This is done, as suggested in the previous section, by reifying concepts such as predicates, functions, and atomic formulas, allowing them to be manipulated as first-class objects. This reification allows the use of higher-order syntax while retaining first-order semantics. Under first-order semantics, predicates and functions have associated objects, called intensions, which can be manipulated directly. Depending on the context in which they appear, these intensions may assume different roles, acting as relations, functions, or propositions. For example, in F-logic, id-terms are handled as individuals when they occur as object identities, viewed as functions when they appear as object labels (attributes), and as sets when representing classes of objects. When functions or predicates are treated as objects, they are manipulated as terms through their intensions; when being applied to arguments, they are evaluated as functions or relations through their extensions.

The use of F-logic concepts in helping define query language concepts for object-oriented databases is described in [KKS92], including query language support for:

object method invocation
derived objects (for constructing views)
querying both the database and its metadata

In addition, the higher-order capabilities of F-logic are those needed to formally define the use of mixtures of data and metadata within the Web. For example, in dealing with an RDF description of a Web resource, in some cases we may want to treat one of the RDF properties as simply a property of the described resource. In other cases, we may want to treat the property as an object in its own right (by following its URL), with properties of its own (e.g., its definition, or the ontology it is a part of). RDF explicitly allows this, using the sort of reification we have already described. Using F-logic (or possibly a variant), we hope to provide a formal basis for describing such operations, and for the development of both our Web object model, and query languages and other services based on it.
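
The dual treatment of a property, as an attribute in one context and as an object in another, can be illustrated with a small Python sketch. It is not RDF or F-logic syntax: the triple layout, the function names, and the `http://example.org/terms/...` URLs and definition text are assumptions made for the example.

```python
AUTHOR = "http://example.org/terms/author"  # hypothetical property URL

# (property, subject, value) triples mixing data and metadata.
TRIPLES = [
    (AUTHOR, "http://www.shoes.com", "Fred"),
    ("definition", AUTHOR, "The person responsible for the resource"),
    ("ontology", AUTHOR, "http://example.org/terms/"),
]


def values(prop, subject):
    """Use `prop` as a property: look up its values for `subject`."""
    return [v for p, s, v in TRIPLES if p == prop and s == subject]


def properties_of(ident):
    """Use the same identifier as an object: list its own properties."""
    return {p: v for p, s, v in TRIPLES if s == ident}


def defined_properties(subject):
    """Mixed query: properties of `subject` that themselves have a definition."""
    return sorted({p for p, s, _ in TRIPLES if s == subject and values("definition", p)})
```

In `values(AUTHOR, ...)` the URL acts as a property of the resource, while in `properties_of(AUTHOR)` the same URL is an object with properties of its own; `defined_properties` combines both roles in a single query.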

4. Conclusions

In this paper, we have:

described key examples of existing work from the Web, database, and OMG communities that contribute both ideas and technology toward providing the components of a Web object model
identified some key underlying principles behind this work
identified a framework which allows this work to be unified and extended to support the requirements of advanced Web applications for object technology

At the moment, we have only identified an approach toward integrating these technologies. Many details of this technology integration must still be worked out (partially because some of the key technologies we have identified are still under development). Nevertheless, we feel that the capabilities inherent in these technologies provide the necessary support for the object model integration we propose.

We feel that a particularly important aspect of this work is the attempt to rely to the greatest possible extent on standards (commonly-accepted or likely-to-be-accepted Web technology) in developing our integration approach, and on working within standards-developing organizations such as W3C and OMG in further refining it and developing additional capabilities. This both takes maximum advantage of existing work, and improves the chances that the technology that is developed will become widely available (albeit possibly in some modified form) in commercial software products.

Further work on this project will include:

tracking the development of the key technologies (XML, RDF, and DOM)
defining one or more specific object models as the basis of further development (an obvious candidate would be one that maps easily to OMG IDL or Java)
developing a detailed integration plan
working with the relevant standards groups to define any necessary enhancements to support object model requirements
prototype development, using tools that are already available for some of these technologies (e.g., XML and DOM)
defining query support facilities for objects created using the object model(s) we define

References

[AQMW+96] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener, "The Lorel Query Language for Semistructured Data", http://www-db.stanford.edu/pub/papers/lorel96.ps. See also the other papers available at the Stanford DB group Publications page <http://www-db.stanford.edu/pub/>.

[BBBC+97] R. Bayardo, Jr., W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap, T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikrishnan, A. Unruh, and D. Woelk, "InfoSleuth: Agent-Based Semantic Integration of Information in Open and Dynamic Environments", Proc. 1997 ACM SIGMOD Conf., SIGMOD Record, 26(2), June 1997.

[BDHS96] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu, "A Query Language and Optimization Technique for Unstructured Data", Proc. SIGMOD'96, 505-516.

[BDFS97] P. Buneman, S. Davidson, M. Fernandez, and D. Suciu, "Adding Structure to Unstructured Data", Proc. ICDT, 1997.

[Ber97] T. Berners-Lee, Metadata Architecture, <http://www.w3.org/DesignIssues/Metadata>.

[Bor95] A. Borgida, "Description Logics in Data Management", IEEE Trans. on Knowledge and Data Engineering, 7(5), October 1995, 671-682.

[Bos97] J. Bosak, XML, Java, and the Future of the Web, <http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm>, 1997.

[CHHM+94] B. Chhabra, D. Hardy, A. Hundhausen, D. Merkel, J. Noble, M. Schwartz, "Integrating Complex Data Access Methods into the Mosaic/WWW Environment", Proc. Second Intl. World Wide Web Conf., Oct. 1994, 909-919.

[CM93] S. Chiba and T. Masuda, "Designing an Extensible Distributed Language with a Meta-Level Architecture", Proc. ECOOP '93, LNCS 707, Springer-Verlag, July 1993, 482-501.

[DeR97] S. DeRose, The SGML FAQ Book, Kluwer, 1997.

[FR97] G. Fahl and T. Risch, "Query Processing over Object Views of Relational Data", VLDB Journal, 6(4), 1997, 261-281.

[GB97] R. Guha and T. Bray, Meta Content Framework Using XML, <http://www.w3.org/TR/NOTE-MCF-XML/>, June 6, 1997.

[GW97] R. Goldman and J. Widom, "DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases", Technical Report, Stanford University, 1997, http://www-db.stanford.edu/pub/papers/dataguide.ps.

[Hop97] A. Hopmann, et al., Web Collections using XML, 1997 <http://www.w3.org/TR/NOTE-XMLsubmit.html>.

[IK96] T. Isakowitz and R. J. Kauffman, "Supporting Search for Reusable Software Objects", IEEE Trans. Software Engrg. 22(6), June 1996, 407-423.

[ILCS95] D. Ingham, M. Little, S. Caughey, S. Shrivastava, "W3Objects: Bringing Object-Oriented Technology to the Web", Proc. Fourth Intl. World Wide Web Conf., World Wide Web Journal, December, 1995, 89-105.

[ISO86] International Standard ISO 8879:1986(E), Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML), International Organization for Standardization, 1986.

[ISO92] International Standard ISO/IEC 10744:1992, Information Technology - Hypermedia/Time-based Structuring Language (HyTime), International Organization for Standardization, 1992.

[ISO96] International Standard ISO/IEC 10179:1996(E), Information Technology - Processing languages - Document Style Semantics and Specification Language (DSSSL), International Organization for Standardization, 1996.

[KKS92] M. Kifer, W. Kim, and Y. Sagiv, "Querying Object-Oriented Databases", Proc. ACM SIGMOD Conf., 1992, 393-402.

[KL89] M. Kifer and G. Lausen, "F-Logic: A Higher-Order Language for Reasoning about Objects, Inheritance, and Scheme", Proc. 1989 ACM-SIGMOD Intl. Conf. on Management of Data, 1989. See also other papers on F-logic and related formalisms <http://www.cs.sunysb.edu/~kifer/dood/>.

[KLW95] M. Kifer, G. Lausen, and J. Wu, "Logical Foundations of Object-Oriented and Frame-Based Languages", Journal of the ACM, July 1995, 741-843.

[KR97] R. Khare and A. Rifkin, "XML: A Door to Automated Web Applications", IEEE Internet Computing, 1(4), July-August 1997, 78-87.

[Man93] F. Manola, "MetaObject Protocol Concepts for a 'RISC' Object Model", TR-0244-12-93-165, GTE Laboratories Incorporated, 1993 <ftp.gte.com, directory pub/dom>.

[Man97] F. Manola (ed.), "NCITS Technical Committee H7 Object Model Features Matrix", X3H7-93-007v12b, May 25, 1997, http://www.objs.com/x3h7/h7home.htm.

[MGHH+97] F. Manola, D. Georgakopoulos, S. Heiler, B. Hurwitz, G. Mitchell, F. Nayeri, "Supporting Cooperation in Enterprise-Scale Distributed Object Systems", in M. Papazoglou and G. Schlageter, eds., Cooperative Information Systems, Academic Press, 1997.

[NUWC97] S. Nestorov, J. Ullman, J. Wiener, and S. Chawathe, "Representative Objects: Concise Representations of Semistructured Hierarchical Data", in Proc. Thirteenth Intl. Conf. on Data Engineering, Birmingham, U.K., April 1997.

[OMG95] Object Management Group, The Common Object Request Broker: Architecture and Specification, Revision 2, July, 1995.

[OMG97] Object Management Group, A Discussion of the Object Management Architecture, June, 1997, http://www.omg.org/library/omaindx.htm.

[PGW95] Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, "Object Exchange Across Heterogeneous Information Sources", IEEE Intl. Conf. on Data Engineering, 251-260, Taipei, March 1995. See also the other papers available at the TSIMMIS Publications page <http://www-db.stanford.edu/tsimmis/publications.html>.

[REMB+95] O. Rees, N. Edwards, M. Madsen, M. Beasley, A. McClenaghan, "A Web of Distributed Objects", Proc. Fourth Intl. World Wide Web Conf., World Wide Web Journal, December, 1995, 75-87.

[SG95] N. Singh and M. Gisi, "Coordinating Distributed Objects with Declarative Interfaces", http://logic.stanford.edu/sharing/papers/oopsla.ps.

[SW96] R. Stroud and Z. Wu, "Using Metaobject Protocols to Satisfy Non-Functional Requirements", in C. Zimmermann (ed.), Advances in Object-Oriented Metalevel Architectures and Reflection, CRC Press, Boca Raton, 1996, 31-52.

This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.

© Copyright 1997, 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this survey.

This page was written by Frank Manola. Send questions and comments about it to fmanola@objs.com.

Last updated: 2/10/98 fam
