[Mirrored from: http://www.cwi.nl/~lloyd/Papers/Model/]
Hypermedia/Time-based Structuring Language (HyTime) defines constructs for representing general hypermedia document concepts. Building documents with HyTime can be difficult because it uses many constructs and has an intricate relationship with its parent language Standard Generalized Markup Language (SGML). Further, HyTime inherits from SGML the establishment of document models as well as the document instances that follow them. In this paper we introduce some techniques for modeling how HyTime and SGML constructs contribute to the structure of documents and document models. We also introduce a defined set of "meta-HyTime constructs", which correspond to the semantic concepts HyTime constructs represent. Diagramming notations are provided in conjunction with these techniques as a tool for aiding document developers in understanding and communicating their use of HyTime.
Hypermedia/Time-based Structuring Language (HyTime) [3,7] is an international standard for defining hypermedia document structure. This structure is described in terms of presentation-independent concepts considered universal to hypermedia processing. These concepts are specified using the constructs HyTime provides. The HyTime constructs are in turn specified with Standard Generalized Markup Language (SGML) [5,7], which defines the structure of documents in general. This separation between SGML and HyTime constructs within documents implies a layering in the authoring of HyTime documents.
SGML provides for the defining of both document models and the document instances that conform to them. HyTime inherits this capability. As such, SGML and HyTime document authors are concerned not only with the creation of documents but also with the designing of the general structures for the class of documents in which each fits. The specification of document models is part of the two standards. This provides an independent means of verifying a document's conformance to a particular model, thus facilitating the interchangeability of documents between applications using that model. The separation of document models from document instances implies another layering in the authoring of HyTime documents.
In this paper we introduce meta-HyTime constructs. Each meta-HyTime construct represents a general hypermedia concept that HyTime constructs describe. The HyTime standard document provides a strict definition of its constructs but does not strictly define the concepts those constructs represent. Since different HyTime constructs can be used to represent the same concepts, it is often convenient to model documents in terms of these more general concepts instead of the specific HyTime constructs representing them. This separation between HyTime and meta-HyTime constructs introduces another layering in the authoring of documents.
In earlier work we have used a diagram notation for describing SGML and HyTime documents and models [2]. Such diagrams are useful tools in the development of document models and instances. In this paper we describe some diagramming formalisms for representing HyTime documents at the various layers described above, including that of meta-HyTime constructs.
Standard Generalized Markup Language (SGML) [5,7] is an international standard for defining the textual encoding of document structure and content. SGML defines a set of constructs that build the markup that is placed with the content of a textual document file. This markup delimits the text content into containers called elements. In addition to text, elements can contain other elements or a combination of other elements and text. An element also has a generic identifier (GI) that states the element's type name. SGML also defines attributes that are associated with and describe elements. Each attribute has a name and a value. Two particularly useful types of attributes are the unique identifier (ID) and the unique identifier reference (IDREF). An ID attribute gives its element a unique name within the document. An IDREF attribute has as its value the ID of some other element in the document, thus representing a reference to that element. Together these constructs provide the hierarchical structure of a document, descriptive information about portions of the document, and the inclusion in the document of external resources and information.
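The element, attribute, ID, and IDREF constructs can be pictured as a small data structure. The following Python sketch is illustrative only: the `Element` class, the `index_ids` helper, the attribute names `id` and `ref`, and the sample document are all hypothetical and belong to neither SGML nor any SGML toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """One SGML element: a generic identifier, attributes, and content."""
    gi: str
    attrs: Dict[str, str] = field(default_factory=dict)
    content: List[Union["Element", str]] = field(default_factory=list)


def index_ids(root: Element, id_attr: str = "id") -> Dict[str, Element]:
    """Map every ID value in the tree to the element that carries it."""
    index: Dict[str, Element] = {}
    stack = [root]
    while stack:
        elem = stack.pop()
        if id_attr in elem.attrs:
            key = elem.attrs[id_attr]
            if key in index:
                # An ID must be unique within the document.
                raise ValueError(f"duplicate ID: {key}")
            index[key] = elem
        # Only child elements are visited; text content carries no attributes.
        stack.extend(c for c in elem.content if isinstance(c, Element))
    return index


# A hypothetical document: a note whose "ref" attribute names a paragraph.
para = Element("para", {"id": "p1"}, ["Some text."])
note = Element("note", {"ref": "p1"}, ["See the paragraph."])
doc = Element("doc", content=[para, note])

ids = index_ids(doc)
target = ids[note.attrs["ref"]]  # resolve the IDREF to the element it names
```

Resolving an IDREF is then a dictionary lookup, which is the relationship the dashed arrows in the diagrams of this paper depict.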
Particular classes of documents are defined by SGML document type definitions (DTDs). Each SGML document must be associated with a particular DTD. A DTD defines a set of element types that can be used in conforming SGML documents. Regular expressions describing the valid contents of each element type are provided in the DTD. It also defines the set of attributes that can be used with each element type. Processing a document instance with its DTD can check the validity of the document format.
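Because content models are regular expressions over element types, a rough validity check can be sketched with an ordinary regular expression engine. The sketch below is a simplification under stated assumptions: the `CONTENT_MODELS` table, its two models, and the `children_valid` function are hypothetical, and SGML features such as the `&` connector, inclusions, and exclusions are ignored.

```python
import re
from typing import Dict, List

# Hypothetical content models, written as regular expressions over
# space-terminated child element type names.
CONTENT_MODELS: Dict[str, str] = {
    "book": r"((citation|text) )*",  # any number of either, in any order
    "section": r"title (para )+",    # one title, then one or more paras
}


def children_valid(parent_gi: str, child_gis: List[str]) -> bool:
    """Check a sequence of child element types against the parent's model."""
    # Terminating each name with a space keeps adjacent names from merging.
    encoded = "".join(gi + " " for gi in child_gis)
    return re.fullmatch(CONTENT_MODELS[parent_gi], encoded) is not None
```

For instance, `children_valid("book", ["citation", "text", "text"])` holds under the first model, while `children_valid("section", ["para"])` does not, since the required title is missing.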
The driving philosophy behind SGML is that documents should be represented in a way that is independent of their means of presentation. The result is that an SGML document can be used by a variety of applications for a variety of purposes without having to be re-edited. This characteristic of SGML use also facilitates the processing of documents by SGML applications not designed for that document's set.
Hypermedia/Time-based Structuring Language (HyTime) [3,7] is an international standard for defining the SGML encoding of hypermedia document structure. HyTime defines a set of primitives, called architectural forms, that represent the hypermedia aspects of a document. These aspects include multi-directional and multiply anchored hyperlinking; descriptive, flexible, and powerful document object locating; and the scheduled placement of document objects along measured axes.
HyTime extends SGML by defining how instances of these architectural forms are built from SGML constructs. An SGML element is recognized as conforming to a particular HyTime architectural form through its HyTime architectural form attribute. When assigned to an element, it establishes that element as a HyTime element. The name of this attribute is typically "HyTime". Its value is the name of the architectural form to which the element conforms. A HyTime element of a particular form also has other attribute assignments particular to that form. This collection of HyTime attributes defines the hypermedia semantics that a particular element conveys.
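The recognition step can be sketched as a lookup of the architectural form attribute on each parsed element. This is a minimal illustration, not a HyTime engine: the function names, the generic identifiers `report`, `xref`, and `para`, and the attribute values are hypothetical, and the attribute is assumed to carry its typical name "HyTime".

```python
from typing import Dict, List, Optional, Tuple


def architectural_form(attrs: Dict[str, str],
                       form_attr: str = "HyTime") -> Optional[str]:
    """Return the architectural form an element names, or None if it names none."""
    return attrs.get(form_attr)


def hytime_elements(elements: List[Tuple[str, Dict[str, str]]]) -> Dict[str, str]:
    """Map each generic identifier to its form, skipping non-HyTime elements."""
    forms: Dict[str, str] = {}
    for gi, attrs in elements:
        form = architectural_form(attrs)
        if form is not None:
            forms[gi] = form
    return forms


# Hypothetical parse result: (generic identifier, attribute assignments).
parsed = [
    ("report", {"HyTime": "HyDoc"}),
    ("xref", {"HyTime": "ilink", "linkends": "a b"}),
    ("para", {}),  # no form attribute, so not a HyTime element
]
forms = hytime_elements(parsed)
```

In practice the attribute is usually fixed in the DTD, so the parse supplies it for every instance of an element type without the author repeating it.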
There are two types of HyTime architectural forms: the element type form (ETF) and the attribute list form (ALF). Each HyTime element is recognized, through the HyTime architectural form attribute as described above, as conforming to one ETF. A particular set of HyTime attributes is defined for each ETF. Each ETF also uses the HyTime attributes defined for a group of ALFs. The attributes of one ALF can be shared by multiple ETFs.
HyTime is divided into six modules, each of which represents an area of hypermedia structuring. These modules are named base, measurement, location addressing, hyperlinking, scheduling, and rendering. In this paper we focus on modeling constructs from the hyperlinking and location addressing modules. Some constructs from the base module are also used. Hyperlinking module constructs are used to describe the hypertext relationships that exist between different document portions. Location addressing module constructs establish document portions as accessible for use with other constructs such as those that define hyperlinks.
HyTime's role in document processing is similar to SGML's. Both enable presentation-independent document formatting, both enhance document portability, and applications of both typically expect and apply their own semantics to composites of the constructs of the two languages.
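In this paper we introduce document diagramming notations for six aspects of HyTime documentation. These six aspects are SGML parsed documents, SGML DTDs, the HyTime meta-DTD, HyTime document constructs, HyTime-defined document models, and meta-HyTime constructs.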
The first five of these diagram notations are illustrated in the subsections ahead using the HyTime document in Example 1. The diagramming of meta-HyTime construct usage is described in Section 4, also using Example 1.
The HyTime code in Example 1 represents a basic hypertext document. This document contains the string "this cites this". Its HyTime constructs encode a hyperlink connecting the first "this" and the second "this". The hyperlink is of type "citation". The first "this" substring is encoded as the "start" of the link, and the second "this" as the link's "end". Hypertext traversal is allowed in any direction between the two words.
The ETFs used in this document are HyTime document (HyDoc), suppress-HyTime (sHyTime), independent link (ilink), data location (dataloc), and dimension list (dimlist). The root element of any HyTime document conforms to the HyDoc ETF. It establishes the document as using HyTime constructs. The sHyTime ETF establishes an element as having no HyTime semantics and not requiring HyTime processing. Such elements do, however, still use the attributes of certain ALFs. The ilink ETF is used to establish a hypertext relationship between different parts of a document. A dataloc element defines a portion of an element's content as accessible to HyTime constructs. Without location ETFs such as dataloc, a part of a document could be referenced only if it were an element with an ID attribute assigned to it. Finally, a dimlist can be contained in a location element such as a dataloc to specify the measurements defining a portion of an element's data.
In the Example 1 document, dataloc elements are used to make the "this" substrings accessible as distinct document objects. An ilink element uses these datalocs to establish a hypertext relationship between the two substrings. Since the element containing the string has no special HyTime semantics, it is defined as an sHyTime element. Finally, all of these elements are contained in a HyDoc to establish the document as a HyTime document.
The following five subsections show how modeling and diagramming techniques can be used to convey and illustrate this usage of SGML and HyTime. Section 4 shows how to use these techniques with meta-HyTime constructs.
The text in a DTD and document instance is parsed to generate the representation of a document's structure. This representation of SGML constructs accounts for elements, their generic identifiers, their attribute assignments, their contents, and the referencing of other elements.
The key for SGML parsed document diagrams is in Figure 1. A box is drawn for each element instance. The generic identifier for each element is typed in its box. Below an element's box are typed the attribute assignments that result from the parse. Some of these assignments may have been fixed in or defaulted from the DTD. Others may have been explicitly defined in the document instance. Solid lines connect elements to their contents. Sometimes this content is text; other times it is other elements. Dashed arrows connect IDREF attributes to the elements they reference.
The SGML parsed document diagram for Example 1 is shown in Figure 2. It shows that the root element of the document tree is a book element. Further, the book element is depicted as containing a citation followed by two location elements and a text element. The location elements in turn each contain a dims element. Attribute assignments are included under these elements. Also demonstrated is that three of these elements contain text strings. Finally, the ID reference from the citation element to a text element is shown.
An SGML DTD describes the possibilities and restrictions for documents that conform to it. It identifies the types of elements that can exist. It also specifies, using a regular expression-based syntax, what elements of each type are allowed to contain. Further, a DTD provides attribute declarations for each element type that describe what attributes elements can have, what their values can be, and what their default values are.
The key for SGML DTD diagrams is in Figure 3. This notation is similar to that for SGML parsed documents. Element generic identifiers are put in boxes, attributes are described underneath those boxes, and solid lines connect boxes to content. With DTD diagrams, however, each box corresponds to an element type rather than an element instance. The text directly underneath these boxes depicts DTD attribute declarations rather than assignments. Finally, content is described using content models rather than as explicit sequences.
The graphical notation for content models provides all the information given in the content models of SGML code. Element types can be grouped together as sequences or or-groups. Multiple occurrences of elements and groups can be shown. A depiction for SGML inclusion sets is also provided.
The SGML DTD diagram for Example 1 is shown in Figure 4. It shows how book elements can contain any number of citation and text elements in any order. It also displays that the latter element types contain text data.
The syntax of HyTime documents is defined in the standard by the HyTime meta-DTD. It shows what patterns of SGML constructs in documents comprise HyTime constructs. This relationship of SGML constructs to patterns of HyTime constructs is complex. We have found it helpful in our work to represent the meta-DTD diagrammatically. Such diagrams are useful tools for illustrating how HyTime constructs arise from SGML and how HyTime constructs relate to one another.
The structure of the meta-DTD is characterized by the existence of two types of objects: element type forms (ETFs) and attribute list forms (ALFs). Both of these types of forms have SGML-defined declarations for HyTime attributes. There are a number of relationships that exist between forms of these types. Each ETF can contain a certain pattern of elements conforming to other ETFs. The textual representation of this pattern is based on SGML content model notation. Each ETF uses the attributes of particular ALFs. Another type of relationship is that a HyTime IDREF attribute can be restricted to referencing only elements of certain ETFs.
The key for HyTime meta-DTD diagrams is in Figure 5. As with the previous two diagram notations, each box corresponds to the concept of an element. This notation introduces a new icon for representing ALFs: the smoothed box. Both ETF and ALF icons have attribute declarations under them in the same fashion used for SGML DTD diagrams. The content model notation used for relating ETFs to each other is also taken from the DTD diagram notation. Dashed arrows associate IDREF attributes to the ETFs they are restricted to referencing. IDREF attributes with no arrows have no such restrictions.
The complete meta-DTD diagram is rather large and complex. The portion of the HyTime meta-DTD diagram that is relevant for Example 1 is shown in Figure 6. The root element of any HyTime document must conform to the HyTime Document (HyDoc) element type form, as it does in the example document. A HyDoc element can contain any combination of many ETFs, including independent link (ilink), data location (dataloc), dimension list (dimlist), and suppress-HyTime (sHyTime) elements. This diagram also shows a dataloc element as being able to contain a dimlist element. Also, the ALFs all-id and all-ref can be used by any HyTime element.
Here we show the ilink attributes that are used in our sample document. The link ends (linkends) attribute contains ID references to the anchors of this link. The anchor roles (anchrole) attribute assigns a role name to each anchor. A dataloc element is required to have an ID attribute, as shown in the diagram. It also must have a location source (locsrc) attribute, which specifies an element within the document to which the dataloc's address is to be applied. This address is the dataloc's text contents, which specify numbers indicating which tokens of the location source's text contents are to be located. The quantum attribute specifies how these tokens are to be established and counted. There are no restrictions on what sHyTime elements contain.
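The way a dataloc address selects tokens from its location source can be sketched as follows. This is a simplified illustration under stated assumptions: the address is reduced to a one-based first position and a token count, only two quanta are handled (named `str` for characters and `word` for whitespace-separated words here), and the function names are hypothetical. The marker rules in the HyTime standard are richer than this.

```python
from typing import List


def tokens(text: str, quantum: str) -> List[str]:
    """Split the location source's text into countable tokens."""
    if quantum == "str":
        return list(text)    # every character is one token
    if quantum == "word":
        return text.split()  # whitespace-separated words
    raise ValueError(f"unsupported quantum: {quantum}")


def locate(text: str, first: int, count: int, quantum: str = "str") -> List[str]:
    """Return `count` tokens starting at one-based position `first`."""
    toks = tokens(text, quantum)
    if first < 1 or count < 1 or first + count - 1 > len(toks):
        raise IndexError("address falls outside the location source")
    return toks[first - 1:first - 1 + count]


source = "this cites this"
start_anchor = locate(source, 1, 1, "word")  # the first "this"
end_anchor = locate(source, 3, 1, "word")    # the second "this"
```

The two calls select different occurrences of the same word, which is why the located objects must be identified by position rather than by content.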
All the ETFs use the ALFs all-id and all-ref. The all-id ALF establishes some attributes that any HyTime element can use. Here we show the definition of the ID attribute, which is the same as the SGML ID attribute. The HyNames attribute lets an element declare that an SGML attribute with one name is to be treated as a HyTime attribute with a different name. The reference type (reftype) attribute of the all-ref ALF places restrictions on the types of elements that an IDREF attribute can reference. The use of these attributes will be illustrated in the subsections ahead.
The HyTime standard specifies a collection of constructs that can be recognized from processing SGML documents. SGML elements can be recognized as instances of HyTime element type forms. Attributes of these elements are sometimes recognized as HyTime attributes. Because ID attributes are considered HyTime attributes and some HyTime attributes are IDREFs, many unique identifier references will be recognized as HyTime constructs.
The key for HyTime document construct diagrams is in Figure 7. This diagram notation is very similar to that for SGML parsed documents. One difference is that the text within each element box is not the element's generic identifier but the name of the ETF it conforms to. The GI of an element is put between angled lines above the element's attribute specifications. Another difference is that the HyTime attributes of each element are given in italics, while the SGML-only attributes remain in normal text.
The HyTime document construct diagram for Example 1 is shown in Figure 8. It demonstrates that the seven elements are recognized as instances of the HyDoc, ilink, dataloc, dimlist, and sHyTime ETFs. The first HyTime attribute under the ilink element is named "linkends" because that is its HyTime name. The SGML name for that attribute, "anchors", is shown in an SGML-only attribute assignment at the bottom of the list. This reassignment was specified by the value of the HyNames attribute for the ilink. Since the other attributes have the same SGML and HyTime names, they only need to be listed once.
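The renaming performed by HyNames can be sketched as a mapping applied to an element's attribute assignments. The sketch assumes, for illustration, that the HyNames value is a list of name pairs, each giving the HyTime name followed by the SGML name it replaces. The `hytime_view` function and the ID values `loc1` and `loc2` are hypothetical.

```python
from typing import Dict


def hytime_view(attrs: Dict[str, str],
                hynames_attr: str = "HyNames") -> Dict[str, str]:
    """Return the attribute assignments keyed by their HyTime names."""
    names = attrs.get(hynames_attr, "").split()
    if len(names) % 2 != 0:
        raise ValueError("HyNames must hold pairs of names")
    # Assumed pair order: HyTime name first, SGML (user) name second.
    sgml_to_hytime = {names[i + 1]: names[i] for i in range(0, len(names), 2)}
    return {
        sgml_to_hytime.get(name, name): value
        for name, value in attrs.items()
        if name != hynames_attr
    }


# Assignments as they might result from parsing the ilink element.
citation_attrs = {
    "HyTime": "ilink",
    "HyNames": "linkends anchors",
    "anchors": "loc1 loc2",
    "anchrole": "start end",
}
view = hytime_view(citation_attrs)
```

After the mapping, `view` exposes the anchors under the HyTime name `linkends`, while attributes whose SGML and HyTime names agree pass through unchanged.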
HyTime only defines the recognition of constructs within SGML parsed documents. The use of SGML DTD constructs to define the meta-DTD can be confusing because HyTime specifies neither a single DTD nor the conformance of DTDs. Further, DTD constructs alone cannot specify all the constraints that would make all of a DTD's documents HyTime-conforming. However, the designing of DTDs for HyTime documents is still important. A DTD can often specify some constraints that enforce HyTime conformance. Wary DTD developers can avoid unnecessarily permissive DTDs. DTD writers must also avoid making their DTDs HyTime-preventing. DTDs for classes of HyTime-conforming documents are often called HyTime DTDs (HDTDs).
The designing of HyTime DTDs is also important because some HyTime constructs can be used to contribute to the specification of a document model. Typically, such constructs define restrictions on allowable instances of certain other constructs. The ID reference element type (reftype) HyTime attribute is an example of such a construct, and is used in Example 1. When assigned for an element, reftype specifies that a particular IDREF attribute can only reference elements of certain types. If this attribute is declared in the DTD as fixed, then the same restriction applies to all instances of that element type. Such a global restriction could be considered part of the model for that document. HyTime has other constructs that can contribute to document model specifications, such as lexical typing and property defining. When constructs like these are fixed in a DTD, they can be considered as contributing to the document model defined by the HDTD.
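A reftype restriction can be checked mechanically once the IDs in a document have been indexed. The sketch below assumes, for illustration, that the reftype value is a list of pairs, each an IDREF attribute name followed by the single generic identifier it may reference. The function name and the ID values are hypothetical, and the fuller forms of reftype in the standard are not modeled.

```python
from typing import Dict, List


def reftype_violations(attrs: Dict[str, str],
                       gi_of_id: Dict[str, str]) -> List[str]:
    """List the references that break the element's reftype restrictions."""
    pairs = attrs.get("reftype", "").split()
    if len(pairs) % 2 != 0:
        raise ValueError("reftype must hold attribute/type pairs")
    problems: List[str] = []
    for i in range(0, len(pairs), 2):
        attr_name, allowed_gi = pairs[i], pairs[i + 1]
        # An IDREFS-style value may hold several space-separated references.
        for ref in attrs.get(attr_name, "").split():
            actual = gi_of_id.get(ref)
            if actual != allowed_gi:
                problems.append(
                    f"{attr_name}={ref!r} references {actual!r}, "
                    f"expected {allowed_gi!r}"
                )
    return problems


# Hypothetical index from ID values to the generic identifiers that carry them.
gi_of_id = {"t1": "text", "c1": "citation"}
ok = reftype_violations({"reftype": "locsrc text", "locsrc": "t1"}, gi_of_id)
bad = reftype_violations({"reftype": "locsrc text", "locsrc": "c1"}, gi_of_id)
```

When the reftype value is fixed in the DTD, the same check applies to every instance of the element type, which is what makes the restriction part of the document model rather than of a single document.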
We have developed a diagram notation for representing the document models defined by HDTDs. The key for HyTime-defined document model diagrams is in Figure 9. It borrows icons from the diagram notations for both SGML DTDs and the HyTime meta-DTD. It contains the data within a DTD diagram with some modifications and the addition of HyTime-defined document model information. One modification is that the box icons contain ETF names instead of generic identifiers. However, each box still corresponds to an element type, and the generic identifier is placed under the box above the attribute declarations. One addition is the use of dotted arrows to associate IDREF attribute declarations to the element types they are restricted by reftypes to referencing.
The HyTime-defined document model diagram for Example 1 is shown in Figure 10. It shows the information within the DTD diagram from Figure 4. The ETF names are added, as is the reftype-defined restriction that the locsrc attribute can only reference text elements.
The modeling and diagramming of five aspects of HyTime documentation have been described in Section 3. In this section we introduce a sixth aspect, that of meta-HyTime constructs. HyTime defines constructs for representing some general hypermedia concepts. However, it does not formalize the concepts themselves. Often multiple HyTime constructs represent the same generic concepts. Such groups of constructs are differentiated by the syntactic context in which they are applied rather than by the general concept they represent. It is helpful to identify these concepts when considering a document in terms of its general hypermedia structure but independently of its HyTime syntax.
In this section we introduce meta-HyTime constructs, which correspond to these syntax-independent concepts. For each meta-HyTime construct identified, a diagram notation for it and a specification of the HyTime construct patterns that comprise it are given. These constructs enable document developers to model and structure their documents in terms of these general hypermedia concepts instead of just in terms of specific HyTime constructs. Sample meta-HyTime constructs for hyperlinking and location addressing are introduced as a demonstration of how meta-HyTime constructs in general can be created and applied. The code from Example 1 in the previous section is used here as the basis for sample diagrams.
The most important HyTime-defined concept in Example 1 is the hyperlink. It is the concept that associates the first "this" with the second. The ilink ETF was used to represent this hyperlink, but other HyTime constructs could have been used to represent equivalent and similar hyperlinks. One meta-HyTime construct is the hyperlink. It is derived from the usage of the clink form. It can also be derived from the usage of the HyTime constructs independent link (ilink), aggregate link (agglink), and span link (spanlink). The components of the hyperlink meta-construct are derived from the ilink ETF. The ilink ETF defines all the hyperlinking semantics definable by any of the other HyTime link constructs.
The ilink ETF and its attributes define certain hyperlinking semantics. These include the link type, the anchors, the significance of each anchor in the hyperlink, and the link's traversability. The link type is a single word that describes the class of the link. In SGML/HyTime syntax this word is the generic identifier of the hyperlink element. The anchors themselves are determined by the linkends attribute as a list of ID references to the anchors. The anchor roles (anchrole) attribute assigns to each anchor a name describing its role in the hyperlink. The significance of each anchor can be further specified by the link end terms (endterms) attribute, which references for each anchor a document subtree describing that anchor. Finally, the allowable directions of traversal through the anchors are determined by the external access traversal rule (extra) and internal access traversal rule (intra) attributes.
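The derivation of a hyperlink meta-construct from an ilink element can be sketched as a small conversion. The `Anchor` and `Hyperlink` classes and the `hyperlink_from_ilink` function are hypothetical names chosen for this illustration, as are the ID values `loc1` and `loc2`. The endterms, extra, and intra attributes are left out of the sketch, so traversal rules are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Anchor:
    role: str    # name from the anchrole attribute
    target: str  # ID reference from the linkends attribute


@dataclass
class Hyperlink:
    link_type: str  # generic identifier of the link element
    anchors: List[Anchor]


def hyperlink_from_ilink(gi: str, attrs: Dict[str, str]) -> Hyperlink:
    """Build the syntax-independent hyperlink that an ilink element encodes."""
    ends = attrs["linkends"].split()
    roles = attrs["anchrole"].split()
    if len(ends) != len(roles):
        raise ValueError("each link end needs exactly one anchor role")
    return Hyperlink(gi, [Anchor(role, end) for role, end in zip(roles, ends)])


link = hyperlink_from_ilink(
    "citation",
    {"linkends": "loc1 loc2", "anchrole": "start end"},
)
```

A similar conversion written for another link form could produce the same `Hyperlink` value, which is the sense in which the meta-construct is independent of the HyTime syntax that encodes it.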
The key for meta-HyTime hyperlink construct diagrams is in Figure 11. This key provides for the specification of the hyperlink itself, the type of the link, its anchors, their roles, and the directions in which they can be traversed to and from along the hyperlink. The meta-HyTime hyperlink construct diagram for Example 1 is shown in Figure 12. The ilink element with the GI of "citation" is displayed as a hyperlink of link type "citation". Its anchors are defined as the locations specified by the dataloc elements. The modeling of these locations' resolution is described in the next subsection. The anchor names for the link are "start" and "end". Bidirectional traversal to and from both anchors is allowed.
There are many element type forms in the location addressing module of HyTime. The primary differences between them are in how they use SGML and HyTime syntax and semantics to specify the location of objects. They are similar in that they all specify the location of some document object. We have found two meta-HyTime constructs useful for representing the general hypermedia significance of location elements within a document structure. The two constructs are the resolved location and the extra-SGML object.
A resolved location meta-HyTime construct associates a location element with the document object it locates. Here, the box containing the ETF name for a location element is colored gray to make it visually distinct. A thick gray line with a gray arrowhead at its end connects this box with the graphical representation of the object it locates. This makes it easy for the viewer of a document structure diagram to follow a HyTime-defined reference through a sequence of location specifications to its destination.
When a location resolves to an object defined using SGML, its graphical representation is a thick gray arrow pointing to the icon representing the located SGML construct. However, HyTime location addressing can also locate objects not defined using SGML constructs. One example is the object located by an element of the data location address (dataloc) ETF, which resolves to a portion of data content rather than to an SGML construct. Such located objects are represented by the extra-SGML object meta-HyTime construct. An extra-SGML object is depicted as a gray box surrounding the diagram portion representing the located object.
Figure 12 has a diagram with hyperlink and location address meta-HyTime constructs. The boxes for the two dataloc elements are colored gray to indicate they are location addresses. A large gray arrow connects each of these boxes to the word it locates.
HyTime database research has been performed at GMD-IPSI in Darmstadt, Germany. Researchers there have described how the layering of HyTime over SGML affects the model for HyTime processing [1]. A general model of hypertext has been proposed by the Dexter group [6]. The Dexter model defines constructs that can be used to describe any hypertext document or system, enabling comparison between different systems. Another model, the Hypertext Design Model (HDM), has been proposed for representing hypertext applications [4].
This paper described some techniques for modeling the use of SGML and HyTime constructs in defining document structure. Diagram notations were provided for depicting the models created. These modeling techniques and their diagram notations reflect the various layers of construct definition from which SGML and HyTime encoding can be viewed. Modeling and diagramming schemes were presented for the SGML encoding of document instances, the SGML encoding of document models, the HyTime meta-DTD, the HyTime encoding of document instances, and the HyTime encoding of document models.
This paper also introduced meta-HyTime constructs for representing the general hypermedia concepts HyTime constructs encode. These meta-HyTime constructs enable another layer of modeling and diagramming document structure. This meta-HyTime layer models documents in terms of their general hypermedia structure rather than in terms of the specific HyTime constructs that encode this structure. Modeling and diagramming with this layer assists document developers in understanding and communicating how HyTime constructs contribute to a document's general hypermedia composition.