This paper was originally presented at WISE 2000. We have converted and revised the original for Web readers.
The paper has been published by IEEE Press. Please refer to the original publication below.
K. Kuramitsu and K. Sakamura. Distributed Object-Oriented Schema for XML-based Electronic Catalog Sharing Semantics among Businesses. In Proceedings of the first International Conference on Web Information Systems Engineering, pages 81-90, June 2000.
Internet commerce is increasing the demand for service integration through shared XML-based catalogs. We propose the PCO data model, which supports semantic inheritance to ensure the synonymy of heterogeneous semantics among distributed schemas that different authors define independently. The PCO model also makes semantic relationships independent of the initial class hierarchy and enables rapid schema evolution across Internet businesses, preserving semantic interoperability without changing predefined classes. We have also encoded the PCO model into two XML-based languages: PCO Specification Language (PSL) and Portable Composite Language (PCL). This paper demonstrates that intermediaries who define service semantics in PSL can automatically integrate PCL catalogs from multiple suppliers for their agent-mediated services.
HTML catalogs have opened up electronic commerce by appealing to web consumers with attractively presented product information. [2] However, the lack of semantics in HTML makes it difficult for businesses to integrate Internet services across websites. More recently, XML and the semantic web have been eagerly anticipated as a new modeling language for application integration. [10,11,12,13,20]
XML can define new semantic specifications by using a DTD (Document Type Definition) or the emerging XML Schema standard. However, an interoperability issue remains with XML semantics. [3] For example, when different merchants independently define their own DTDs/schemas for their catalogs, third parties cannot understand the semantic relationships among them. Internet mediators require schema interoperability, that is, automatic sharing of semantics, to build integrated applications of XML catalogs across websites. The purpose of this paper is to propose a new schema design model for XML that enables distributed schemas to share semantics across the Internet.
We have designed a new commerce modeling language, called PCO (Portable Compound Object). [9] The contribution of the PCO model is that it enables semantic interoperability among distributed schemas (without an ontology) through semantic inheritance based on the object-oriented model. The PCO model has unique modeling facilities, such as controlled overriding and partial inheritance, to ensure the synonymy of shared semantics in a class hierarchy. Moreover, the PCO model is designed to support rapid schema evolution while maintaining schema interoperability across Internet services.
We have also encoded the PCO model into two XML-based languages: PCO Specification Language (PSL), a schema language for the PCO model, and Portable Composite Language (PCL), an exchange format for semantic representations in the PCO model. This paper shows that the polymorphic views of PSL classes enable agent-mediated services to automatically interpret the semantic equivalence of PCL-based multi-vendor catalogs specified by different schemas.
The remainder of the paper is organized as follows. Section 2 describes the scope of this paper from the viewpoint of semantic interoperability in Internet commerce. Section 3 defines the design of the PCO data model. Section 4 describes the implementation, including PSL, PCL, and a Java-based data processor. Section 5 demonstrates Internet service integration using PCL and PSL. Section 6 compares our efforts with related work. Section 7 concludes the paper.
Many companies have stored their business data in traditional RDBMSs for internal use. Electronic commerce today, however, is trying to integrate such heterogeneous data semantically in order to share it with trading partners or consumers on the Internet. This section defines the scope of this paper from the viewpoint of semantic interoperability across Internet businesses.
In Internet commerce, a centralized, monolithic schema design is impractical because of the enormous diversity of goods and services. Distributed schemas and their integration are therefore essential.
Semantic schema integration is an old database problem that has been intensively investigated in the context of heterogeneous multi-database systems. Nevertheless, no research has achieved a definitive breakthrough. [5] This paper does not aim to resolve such fundamental semantic mismatches as homonyms, synonyms, and differing dimensions among arbitrary XML documents.
We focus on the difference in the lifecycles of data structures: XML data exchanged in trading is temporary, whereas data stored in database systems is relatively permanent. The temporary nature of XML data is a key reason for proposing a new schema design model for XML that enables distributed schemas to share semantics across the Internet, rather than persisting with the integration of existing schemas. As a result, merchants can dynamically generate XML catalogs in various formats from heterogeneous data sources.
Let us consider a typical example to make the semantic interoperability issues in Internet commerce concrete. Suppose that two web merchants (a PC shop and a camera store) independently define DTDs/schemas for their XML catalogs, both of which use a <price> tag to represent the price of products. In this case, should a third party's shopping agent interpret the two <price> tags as having the same meaning?
An advantage of the Internet lies in establishing new business collaborations without offline partnerships. Indeed, many kinds of agent systems have integrated Internet services across websites. However, it is difficult for shopping agents to understand the underlying real-world meanings autonomously, even if they can retrieve schema definitions from websites. For example, some shops may define <price> as including value-added tax, while others may exclude all taxes.
The ultimate goal of this study is to extend the schema design model for XML so that third parties can automatically determine whether tags in distributed schemas have the same meaning. In other words, using the proposed model, the PC shop and the camera store can independently define <price> tags and explicitly declare to others that they share the same meaning.
Our basic approach to interoperability is to use the class hierarchy of the object-oriented (OO) model. For example, if both the PC and DigitalCamera classes are designed as subclasses of a common Commodity superclass, the meaning of <price> is shared among PC, DigitalCamera, and Commodity (see Figure 1). Accordingly, a shopping agent can interpret the price in PC and camera catalogs identically on the basis of the class hierarchy.
Schema interoperability based on a class hierarchy is a simple mechanism, because it can only check homonymic mismatches. However, we consider this simplicity well suited to rapidly setting up cross-site applications for Internet business.
OO-based semantic interoperability depends strongly on the design of the initial class hierarchy, yet the OO model makes it inherently difficult to design a class hierarchy that faithfully embodies the underlying relationships. Moreover, the relationships among goods and services change dynamically. For example, suppose a new I/O interface appears that connects PCs and digital cameras. In this case, both the PC shop and the camera store will want to add new properties to their schemas as soon as possible in order to represent their mutual compatibility.
OO-based interoperability thus confronts the schema evolution problem, which traditionally involves hierarchy redesign, change notification, and version control. In widely distributed environments such as the Internet, moreover, modifying a superclass causes serious confusion for predefined subclasses.
We must therefore propose a design model that enables schema evolution while maintaining schema interoperability and without changing predefined classes.
XML (eXtensible Markup Language) is a portable format for semi-structured data on the Web. [18,19] The extensibility of XML allows each application to build various data models (such as documents, tables, and languages) on its syntax. We have designed the PCO (Portable Compound Object) data model as a semantic modeling language for Internet service integration. In this section, we describe the three major extensions of the PCO model relative to the generic XML standard model: separation of semantics from datatypes, object-oriented semantic definition, and service-centric class sharing.
Generally, semantic representation is based on an attribute-value pair. For example, consider the numerical string '9.85'; on its own it is meaningless, because we cannot tell exactly what the value means. However, if an author adds a semantic qualifier such as <price> to the number, we can regard it as a price (and at least avoid mistaking it for another dimension such as length or weight). In XML, accordingly, the semantic representation of a price value is marked up as follows:
<price>9.85</price>
In addition, an XML DTD defines the structure of the internal elements of <price>. Accordingly, we can determine the currency by referring to the DTD definition. However, if the type of <price> is statically defined as U.S. dollars, can an author also describe a price in another currency, such as Japanese yen?
We consider that semantics should be independent of datatypes. We therefore designed the PCO model so that the type of the value can be identified dynamically in each semantic representation. This means that the author can freely select datatypes at authoring time (see also type overloading). For example, typed price semantics are described as follows.
<price type="number/currency-usd">9.85</price>
<price type="number/currency-yen">1000</price>
Note that the PCO (Portable Compound Object) model, as the name implies, is designed to represent composite semantic structures by means of the "class" type.
In the PCO model, we call the fundamental structure of a semantic representation a chunk; it comprises three fields: name, type, and value.
A PCO schema is a collection of predefined chunks (names and types) that specifies the semantics of a PCO instance. In short, only a <price> declared in a schema can carry the specific meaning intended by the schema designer.
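As an illustration only, the chunk structure and the typed price example above might be modeled as follows. The `Chunk` class and its `toXml` method are hypothetical names introduced for this sketch; they are not part of PSL, PCL, or the authors' data processor.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {
    /** Hypothetical chunk: a semantic name, a datatype, and a value. */
    static final class Chunk {
        final String name;
        final String type;
        final String value;

        Chunk(String name, String type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }

        /** Serializes the chunk in the typed-tag style shown in the text. */
        String toXml() {
            return "<" + name + " type=\"" + type + "\">" + value + "</" + name + ">";
        }
    }

    /** Builds the two price chunks from the typed price example. */
    static List<Chunk> samplePrices() {
        List<Chunk> prices = new ArrayList<>();
        prices.add(new Chunk("price", "number/currency-usd", "9.85"));
        prices.add(new Chunk("price", "number/currency-yen", "1000"));
        return prices;
    }

    public static void main(String[] args) {
        for (Chunk c : samplePrices()) {
            System.out.println(c.toXml());
        }
    }
}
```

The point of the sketch is that the name carries the meaning while the type travels with each value, so the same `price` name can hold either currency.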
In the PCO model, we can define a schema as a union of object-oriented classes, or as a subclass derived from classes. The class is the smallest unit of semantic specification; it allows the designer to define a set of name-type pairs in an object-oriented manner.
The PCO model supports multiple inheritance, which permits a subclass to inherit properties from multiple parent classes. However, multiple inheritance can cause name conflicts, which make the meaning of the conflicting names ambiguous.
The originality of PCO lies in its semantic management of name conflicts: class inheritance fails whenever a name conflict occurs, unless all the conflicting names are derived from the same parent class. This management ensures the semantic equivalence of identical names among a superclass and its subclasses. The PCO model also defines the behavior of semantic inheritance as follows.
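The name-conflict rule stated above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the authors' implementation: `ClassDef`, `resolve`, and `inherits` are hypothetical names, as are the chunk names `cpu` and `lens` and the classes `CameraPC`, `Auction`, and `AuctionedPC`. A subclass that redeclares an inherited name is simply treated as a conflict here, which leaves controlled overriding out of scope.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InheritanceSketch {
    /** Hypothetical class definition: declared chunk names plus parent classes. */
    static final class ClassDef {
        final String name;
        final List<ClassDef> parents;
        final List<String> declared;

        ClassDef(String name, List<ClassDef> parents, List<String> declared) {
            this.name = name;
            this.parents = parents;
            this.declared = declared;
        }
    }

    /**
     * Maps every chunk name visible in cls to the class that first declared it.
     * Throws IllegalStateException when one name arrives from two different origins.
     */
    static Map<String, String> resolve(ClassDef cls) {
        Map<String, String> origin = new LinkedHashMap<>();
        for (ClassDef parent : cls.parents) {
            for (Map.Entry<String, String> e : resolve(parent).entrySet()) {
                merge(origin, e.getKey(), e.getValue(), cls.name);
            }
        }
        for (String chunk : cls.declared) {
            merge(origin, chunk, cls.name, cls.name);
        }
        return origin;
    }

    private static void merge(Map<String, String> origin, String chunk, String from, String cls) {
        String previous = origin.putIfAbsent(chunk, from);
        if (previous != null && !previous.equals(from)) {
            throw new IllegalStateException("Name conflict in " + cls + ": " + chunk
                    + " comes from both " + previous + " and " + from);
        }
    }

    /** True when inheritance succeeds, false when it is aborted by a conflict. */
    static boolean inherits(ClassDef cls) {
        try {
            resolve(cls);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    static ClassDef commodity() {
        return new ClassDef("Commodity", Collections.<ClassDef>emptyList(), Arrays.asList("price"));
    }

    /** price reaches CameraPC twice, but both paths start at Commodity. */
    static ClassDef diamond() {
        ClassDef commodity = commodity();
        ClassDef pc = new ClassDef("PC", Arrays.asList(commodity), Arrays.asList("cpu"));
        ClassDef camera = new ClassDef("DigitalCamera", Arrays.asList(commodity), Arrays.asList("lens"));
        return new ClassDef("CameraPC", Arrays.asList(pc, camera), Collections.<String>emptyList());
    }

    /** price reaches AuctionedPC from two unrelated classes: a homonym. */
    static ClassDef homonym() {
        ClassDef pc = new ClassDef("PC", Arrays.asList(commodity()), Arrays.asList("cpu"));
        ClassDef auction = new ClassDef("Auction", Collections.<ClassDef>emptyList(), Arrays.asList("price"));
        return new ClassDef("AuctionedPC", Arrays.asList(pc, auction), Collections.<String>emptyList());
    }
}
```

In this sketch `inherits(diamond())` succeeds because both occurrences of `price` originate in `Commodity`, whereas `inherits(homonym())` fails because the two `price` names have unrelated origins.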
Another distinctive feature of the PCO model is its support for partial inheritance, which improves the reusability of names. Partial inheritance is a variation of class inheritance that allows the designer to inherit some properties and suppress others [14]. In the PCO model, however, a partially inheriting class is not defined as a child of its parent classes, in order to avoid cluttering the class hierarchy.
To exchange the semantics of PCO instances over networks, the sender and receivers must share classes, because only chunks under class constraints can specify their meanings exactly; we can regard a class as a semantic context.
The generic XML standard, however, defines no rules for sharing DTDs/schemas across the Internet, except for the universal identification of DTDs by URI. It is therefore unclear who may design DTDs/schemas, and for whom. Service mediators, on the other hand, must know the semantic definitions in advance in order to program the processing logic for suppliers' content. The PCO model therefore defines rules of class sharing between the following two parties in the context of electronic commerce.
The PCO model adopts a service-centric approach [1] to class sharing, in which only service mediators define classes, and content suppliers create content conforming to the predistributed classes (see Figure 2). This approach allows mediators to interpret only the properties that they define or understand in advance and to ignore the others. Because suppliers never distribute classes, mediators can interpret distributed content automatically.
The basic service-centric model limits the flexibility of content for individual suppliers. The originality of the PCO model is that it extends the class-content relationship from a single pair to a dynamic n-tuple; suppliers can accordingly compose content schemas by selecting and combining classes that multiple mediators define for their own services. This means that each PCO instance contains a schema definition as a union of predefined classes.
We have encoded the PCO data model as two XML-based languages: PSL (PCO Specification Language) and PCL (Portable Composite Language). This section describes the design of PSL and PCL, including the Java-based implementation of their data processor.
The PCO language comprises two languages. PSL defines classes and datatypes that specify PCO instances, while PCL represents the structure of PCO instances in an XML[2]-based format. However, in keeping with traditional semi-structured data models [16], PSL classes do not strongly constrain the full structure of PCL instances; instead, each PSL class works as a view (i.e., an external schema) of a polymorphic PCO schema. Through these views, the PCO data processor can access PCO instances semantically. Figure 3 shows the three-layer architecture of the PCO language.
PSL is a specification language that allows the designer to define classes and datatypes. This subsection mainly describes class definition in PSL.
In PSL, the designer defines a class with the <ClassDef> module, which includes semantic definitions such as superclasses, a set of predefined chunks, and other documentation (e.g., designers' names, class descriptions, and dates). The following source is an example of PSL class definitions.
<ClassDef name="PersonalProfile">
  <ChunkDef name="Name"/>
  <ChunkDef name="Affiliation"/>
  <ChunkDef name="PostalCode"/>
  <ChunkDef name="Address"/>
  <ChunkDef name="Phone" type="string/phone"/>
</ClassDef>

<ClassDef name="NameCard">
  <super class="PersonalProfile"/>
  <ChunkDef name="Photo" type="image/jpeg"/>
</ClassDef>
Name conflicts in multiple inheritance result in ambiguous meanings; the PCO model aborts class inheritance when a conflict occurs, because the conflicting names are interpreted as homonyms. However, such a strict interpretation reduces the flexibility of class combinations. To improve flexibility, we introduce an <equiv> scope that explicitly declares the synonymy of chunks with the same names in other classes.
In the following example, the chunks (PostalCode, Address, and Phone) within the <equiv> scope in Contact are defined as having the same meanings as the corresponding chunks in PersonalProfile. In other words, Contact is not a child of PersonalProfile, but some chunk semantics in Contact are partially inherited from PersonalProfile.
<ClassDef name="Contact">
  <equiv type="PersonalProfile">
    <ChunkDef name="PostalCode"/>
    <ChunkDef name="Address"/>
    <ChunkDef name="Phone"/>
  </equiv>
  <ChunkDef name="Email" type="string/email"/>
</ClassDef>
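The effect of the <equiv> scope on class combination can be illustrated with a small Java sketch. It is only an approximation under assumptions: each class is reduced to a map from chunk name to the class where the meaning originates, and `EquivSketch`, `contact`, and `union` are hypothetical names, not part of PSL or the PCO data processor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EquivSketch {
    /** PersonalProfile: every chunk name originates in the class itself. */
    static Map<String, String> personalProfile() {
        Map<String, String> origin = new LinkedHashMap<>();
        for (String name : new String[] {"Name", "Affiliation", "PostalCode", "Address", "Phone"}) {
            origin.put(name, "PersonalProfile");
        }
        return origin;
    }

    /**
     * Contact: with the equiv scope, three names keep PersonalProfile as their origin;
     * without it (useEquiv = false) they would be independent declarations.
     */
    static Map<String, String> contact(boolean useEquiv) {
        Map<String, String> origin = new LinkedHashMap<>();
        String source = useEquiv ? "PersonalProfile" : "Contact";
        origin.put("PostalCode", source);
        origin.put("Address", source);
        origin.put("Phone", source);
        origin.put("Email", "Contact");
        return origin;
    }

    /** Combines two classes; the same name with different origins is a homonym. */
    static Map<String, String> union(Map<String, String> a, Map<String, String> b) {
        Map<String, String> result = new LinkedHashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String previous = result.putIfAbsent(e.getKey(), e.getValue());
            if (previous != null && !previous.equals(e.getValue())) {
                throw new IllegalStateException("Homonym: " + e.getKey());
            }
        }
        return result;
    }
}
```

With `contact(true)` the union with `personalProfile()` succeeds and yields six distinct chunk names; with `contact(false)` the shared names are rejected as homonyms.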
Portable Composite Language (PCL) defines an XML-based serialization format of a PCO instance.
Since XML is based on the SGML/DTD document model, XML tags are used to represent not only semantics but also data structures and processing logic (e.g., <table> and <font> in XHTML). We have designed a PCL instance as a combination of three tag sets (i.e., namespaces) with different purposes. The following source is an example of a complete PCL instance.
<?xml version="1.0" ?>
<d:PCO xmlns:d="urn:x-pco-structure"
       xmlns:m="urn:x-pco-metadata"
       xmlns:s="urn:x-pco-semantics">
  <m:Schema>
    <d:class name="PersonalProfile" href="http://www/classes/PersonalProfile.psl"/>
    <d:class name="NameCard" href="http://www/classes/NameCard.psl"/>
    <d:class name="Contact" href="http://www/classes/Contact.psl"/>
  </m:Schema>
  <m:Language>en</m:Language>
  <s:Name lang="en">Kimio Kuramitsu</s:Name>
  <s:Affiliation lang="en">University of Tokyo</s:Affiliation>
  <s:Photo type="image/jpeg">
    <d:file encoding="base64">
      iVBORw0KGgoAAAANSUhEUgAAAlgAAAGQCAIAAAD9V4nPAAAA
      AElEQVR4nO2da7aDKLNAHW3G0dPIbHoAzsn7w9v5PFVFUTx8
      ln///XcDAACYj3/++ed/IrxXyAAAANeDCAEAYGoMES7LPyQS
      /1sikUgkEimeCkV4cJ6Zc/v7IZFIJNIDU9F1K4t2Ta583Vb2
      tUpEKELA49//Fen/2Tn/+oV1vrMQ
    </d:file>
  </s:Photo>
  <s:PostalCode>113-0033</s:PostalCode>
  <s:Address>7-3-1 Hongo Bunkyo-ku Tokyo, Japan</s:Address>
  <s:Phone type="string/phone">+81-3-5841-2483</s:Phone>
  <s:Email type="string/email">kuramitsu@um.u-tokyo.ac.jp</s:Email>
</d:PCO>
Note: this paper uses the "d:", "m:", and "s:" prefixes for PCO structure, meta semantics, and chunk semantics, respectively.
In PCL, an author can define a schema that specifies the full semantics and structure of the instance itself, although the definition is limited to a combination of predefined classes. PCL provides the <m:Schema> module to coordinate a virtual PCO schema. Naturally, this coordination is subject to the semantic inheritance scheme, with controlled overriding and multi-typed overloading applied to name conflicts.
We have implemented a prototype of the PCO data processor in the Java 2 (JDK 1.2) programming language. The processor reads a PCL file, parses the XML syntax and the PCL-extended structure, and then canonicalizes the chunks into a tree-based structure. The distinctive feature of the processor is that it allows software modules to access chunks only through the view of class constraints.
The following example is an excerpt from the sample source, in which a Java application accesses chunk values semantically.
try {
    PCO pco = PCLParser.load("file.pcl");
    if (pco.openClass("NameCard")) {
        String name = pco.getString("Name");    // name --> "Kimio Kuramitsu"
        Image photo = pco.getImage("Photo");
        String email = pco.getString("Email");  // --> throws OutOfClassException
        pco.close();
    }
} catch (OutOfClassException e) {
    // "Email" is not visible through the NameCard view
} catch (IOException e) {
    // the PCL file could not be read
}
We have already developed a GUI-based PCL editor embedded in the processor (Figure 4).
We have already described various kinds of product catalogs (e.g., books, personal computers, tickets, stationery, and clothes) using PSL and PCL. This section evaluates the design of the PCO model and its implementations (PSL and PCL) using sample descriptions from an experimental project sponsored by JIPDEC[3].
In the original JIPDEC experimental environment, there were two content suppliers on the Internet: a ticket vendor and a hotelkeeper. The ticket vendor defines the ConcertTicket class as a subclass of ProductCatalog. Similarly, the hotelkeeper derives the HotelVoucher class from ProductCatalog. (Figure 5 shows an example of a PCL-based catalog embedded in an HTML catalog.) Moreover, we have developed a shopping agent service through which a customer can place orders for any PCL instance of ProductCatalog across merchant sites.
<html>
<title>Hotel Reservation</title>
<body>
<d:PCO xmlns:s="urn:x-pco-semantics"
       xmlns:m="urn:x-pco-metadata"
       xmlns:d="urn:x-pco-structure"
       id="00000000-04CC8001:13AC2BA8-888076EB-79E9AD62-F36499AB">
  <h2><s:Name type="string">Teikoku Hotel</s:Name></h2>
  <h4>Winter Free Plan</h4>
  <blockquote><font color="#b2002c">Let's enjoy at Teikoku hotel about beautiful Japanese winter.</font></blockquote>
  <table width="100%">
    <tr><td>Location</td><td><s:Area type="string">Ginza, Tokyo</s:Area></td></tr>
    <tr><td>Regular rate</td><td><s:Price type="number/currency-yen">30000</s:Price> yen</td></tr>
  </table>
  <p></p>
  <s:Contact>
    <dl>
      <dt>Contact</dt>
      <dd><b>Address: </b><s:Address type="string/address">1-1-1 Uchisaiwai-cho, Chiyoda-ku, Tokyo, Japan</s:Address></dd>
      <dd><b>Telephone: </b><s:Phone type="string/phone">03-3504-xxxx</s:Phone></dd>
    </dl>
  </s:Contact>
  <comment>
    <m:Schema>
      <d:class name="ProductCatalog" href="http://shop/classes/ProductCatalog.psl"/>
      <d:class name="HotelVoucher" href="http://hotel/classes/HotelVoucher.psl"/>
    </m:Schema>
  </comment>
  <hr>LastUpdate: 1999.12.02.04.00.24
</d:PCO>
</body>
</html>
In addition to this environment, let us consider the following scenario, which requires schema evolution while maintaining interoperability among businesses. (Figure 6 gives an overview of the scenario.)
Suppose that a World-Cup promoter opens an intermediary service on the Internet through which he wants to mediate tie-in products and services with special privileges for World-Cup participants. In short, he wants to add common properties to the ticket and hotel catalogs.
Ultimately, the promoter must define description constraints (i.e., a class) for unspecified kinds of products as well as for tickets and hotels. Of course, he cannot redesign ProductCatalog directly, because the additional World-Cup properties are temporary and specialized.
The promoter acts as a service mediator in the PCO model. He can therefore create a new class that defines the chunk semantics his service requires (such as properties of privileges), as follows:
<ClassDef name="WorldCup2002">
  <equiv type="ProductCatalog">
    <ChunkDef name="Name"/>
    <ChunkDef name="Description"/>
  </equiv>
  <ChunkDef name="AppliedTo"/>
</ClassDef>
The hotelkeeper prepares a special dinner during the World-Cup season and wants to publicize it to Internet customers at the promoter's site. He can create a new hotel catalog as follows:
<?xml version="1.0" ?>
<d:PCO id=".. ">
  <m:Schema>
    <d:class name="ProductCatalog" href="http://shop/classes/ProductCatalog.psl"/>
    <d:class name="HotelVoucher" href="http://hotel/classes/HotelVoucher.psl"/>
    <d:class name="WorldCup2002" href="http://wc2002/class/WorldCup2002.psl"/>
  </m:Schema>
  <m:Language>en</m:Language>
  <s:Name lang="en">Tokyo Hotel (for WorldCup 2002)</s:Name>
  <s:Description lang="en"> We prepare a special dinner for c </s:Description>
  <s:Photo type="image/jpeg">
    <d:file href="http://c"/>
  </s:Photo>
  <!-- (other chunks in HotelCatalog) -->
  <s:Price type="number/currency-yen">15000</s:Price>
  <s:AppliedTo lang="en">Travelers</s:AppliedTo>
</d:PCO>
The promoter can automatically interpret all PCL instances of the WorldCup2002 class without understanding the meanings of other chunks (such as Price and Photo). Meanwhile, existing service mediators can handle PCL instances whether or not they contain WorldCup2002 constraints. For example, the shopping agent can still interpret any PCL instance of ProductCatalog.
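The way a mediator reads only the chunks of its own class can be sketched as a simple filter. This is an illustrative sketch, not the PCO data processor: `ViewSketch`, `hotelCatalog`, and `view` are hypothetical names, the instance is flattened to a name-to-value map, and only a few chunks of the hotel catalog are reproduced.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ViewSketch {
    /** Chunk names defined by the WorldCup2002 class definition. */
    static final Set<String> WORLD_CUP_2002 =
            new HashSet<>(Arrays.asList("Name", "Description", "AppliedTo"));

    /** A few chunks of the hotel catalog instance, flattened to name -> value. */
    static Map<String, String> hotelCatalog() {
        Map<String, String> chunks = new LinkedHashMap<>();
        chunks.put("Name", "Tokyo Hotel (for WorldCup 2002)");
        chunks.put("Price", "15000");
        chunks.put("AppliedTo", "Travelers");
        return chunks;
    }

    /** Returns only the chunks that are visible through the given class view. */
    static Map<String, String> view(Map<String, String> instance, Set<String> classNames) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : instance.entrySet()) {
            if (classNames.contains(e.getKey())) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // The promoter sees Name and AppliedTo; Price is silently ignored.
        System.out.println(view(hotelCatalog(), WORLD_CUP_2002).keySet());
    }
}
```

Because chunks outside the view are skipped rather than rejected, a supplier can add classes from other mediators to the same instance without breaking existing readers.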
For details of our prototype multi-purpose shopping agent, we refer readers to our paper [8].
The scenario demonstrates the following results about the PCO model, PSL and PCL.
We discuss related work from the following three viewpoints: domain-specific commerce languages, ontology-based interoperability, and object-oriented schemas for XML.
Today, many XML-based commerce languages are emerging, such as XML/EDI, ICE, OBI, and RosettaNet. [10,11,12,13,20] These languages work well in specific business domains, because central organizations define the semantic specifications for those domains. For example, XML/EDI is an XML version of ANSI X12 EDI business data messages; RosettaNet, on the other hand, defines how to exchange PC product catalogs among manufacturers, distributors, and resellers.
PCO is unique in that it requires no central coordinator to share schemas across Internet businesses. This decentralized sharing broadens the range of applications of the PCO language.
An ontology, or semantic dictionary, is a meta-level description that defines relationships between independent schemas or datatypes. A number of languages have been proposed that apply ontologies to heterogeneous schema integration. [7] For example, MMF is a typical commerce language (although its syntax is not XML-based) that supports an ontology layer to ensure interoperability among distributed schemas. [17] More recently, the Common Business Library (CBL) has been proposed as a pragmatic ontology that can be automatically mapped to domain-specific languages such as XML/EDI, OBI, and RosettaNet [12].
Theoretically, an ontology can reconcile any mismatch of heterogeneous semantics or structures across domains. However, it is difficult to maintain an ontology effectively in widely distributed environments, because the relationships among distributed schemas become more complicated as the number of schemas grows.
We consider that incompatibility between schemas is sometimes needed to represent a deliberate lack of partnership between competing businesses. We therefore conclude that the PCO model supports sufficient interoperability, because it allows partner companies at least to share the common semantic constraints that they desire.
The object-oriented (OO) model and its class reusability have been widely popularized in programming languages and database systems. In XML, many object-oriented schema languages have been proposed as alternatives to XML DTDs: RDF Schema, SOX, and XML Schema. [1, 4, 6] These schemas, like PSL classes, make it easier to create a new schema by reusing predefined schemas.
The originality of our work is that PCO schemas and PSL classes are designed to support not only reusability but also several facilities for semantic inheritance, including negative overriding and partial inheritance. To our knowledge, no attempt at semantic inheritance across the Internet has previously been reported in XML-related work.
Internet commerce is increasing the demand for cross-site service integration through shared XML-based catalogs. In addition, the diversity of products and services inherently requires the semantic integration of distributed schemas.
The main contribution of the PCO model is an approach to semantic interoperability without an ontology. PCO supports various object-oriented features, such as controlled overriding, multi-typed overloading, and partial inheritance, to ensure the synonymy of inherited semantics among schemas/classes that different authors define independently. Moreover, schema coordination in the instance layer makes semantic relationships independent of the initial class hierarchy and enables rapid schema evolution across the Internet, maintaining semantic interoperability without changing predefined classes.
We have also encoded the PCO model into two XML-based languages: PCO Specification Language (PSL) and Portable Composite Language (PCL). This paper demonstrates that agent mediators who define service semantics in PSL can automatically integrate PCL catalogs from multiple suppliers for their cross-site services.
This paper has focused mainly on the PCO model and its semantic integration. The PCO language also supports other commerce modeling facilities, including a content supply chain of compound PCL objects and authorized modification of PCO schemas and instances. We will continue our discussions toward establishing a PCO-based electronic marketplace on the Internet.
The authors would like to thank Mr. Tadashi Murakami, Mr. Hajime Matsuda and Mr. Vikram Kant Upadhyay (the University of Tokyo) for useful comments and suggestions that helped in improving an earlier draft of this paper.