[This local archive copy mirrored from the canonical site: http://www.techno.com/need2know/index.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

What You Need to Know About the New HyTime

Steven R. Newcomb

If none of the following statements applies to you, it's just barely possible that you don't need to know anything about the new HyTime, and you should stop reading this article immediately. (Don't bet on it, though.) This list is by no means complete; it's meant to pique your interest.

You need to modify a DTD for purely local reasons, but you can't because to do so would compromise your ability to interchange information.
You'd like to combine more than one DTD to get the features of all of them in a single document type. You need to do this without name collisions, without changing the DTDs to be combined, all while smoothly taking into account any overlapping or duplicated semantics.
You think it would be very nice to express in a standard way that, in order for the document to be regarded as valid and well-formed, certain strings (in content or in attribute values) must conform to certain lexical models.
You are concerned about the growing number of formally undifferentiated classes of system identifiers, such as filesystem addresses, URLs, addresses of compressed, encrypted, and/or sealed files, and addresses of files within multifile container files (such as zip files, tar files, and MIME messages).
You want to understand fully the "property set" and "grove" paradigm on which Standard Document Query Language (SDQL), the fundamental query language shared by both HyTime and DSSSL, is based.
In any case where element A contains element B, and element B has content, you need to verify that the content of element B would be valid in its context as the content of element A (i.e., even after deleting the effective start and end tags of element B).
You want your element type names to be self-describing, in an internationally standardized way, without having to make them longer.
You need to provide values for the attributes of notations used in the content of specific elements.
You'd like to declare the notation of data used as the values of attributes.
You'd like to require that the IDs given in the value of particular IDREF attributes be the IDs of particular element types.
You'd like to be able to change the default values of attributes in the middle of a document instance, instead of always being bound to whatever default value was given in the DTD.

If any of the above statements do apply to you, then you should read the appropriate section of Annex A ("SGML Extended Facilities") of the HyTime Second Edition:

A.2 for lexical modeling of element content and attribute values.
A.3 for information about inheriting the semantic and syntactic characteristics of other DTDs in your own DTD. Personally, I think A.3 is the single most revolutionary and far-reaching aspect of the new HyTime standard, and I would urge most SGML veterans to start there. The HyTime standard is now primarily two things: the HyTime architecture itself (which is essentially a very abstract DTD for hyperdocument structuring described in clauses 1-11), and the SGML Extended Facilities (Annex A). In the original edition of HyTime, there was only the HyTime DTD, whose element types were called "architectural forms". Each of the element types was designed to be inherited by actual element types in actual DTDs (that's why the HyTime DTD was called a "meta-DTD"). Now, though, the HyTime DTD is regarded as one aspect of the definition of the "HyTime Architecture", and A.3 describes how to inherit not only the HyTime architecture, but any combination of architectures (i.e., any combination of DTDs) you like. A.3 is called the "Architectural Form Definition Requirements" or, simply, "AFDR".
A.4 for information about the "property set" and "grove" paradigms. Many would say that this is the single most revolutionary and far-reaching aspect of the 2nd edition of HyTime, and it would be hard to argue with them. A "property set" is the formal description of all the kinds of things that a parser ("notation processor") might find in an information resource expressed in some particular notation, and the relationships between those things. An SGML parser is expected to recognize certain kinds of things in an SGML document; those things are described in the "SGML Property Set" that can be found in Annex A.7. (It is not light reading.) Another property set, the property set for the HyTime architecture, is found in Annex B. (It is not light reading, either. Fortunately, only system implementers and advanced SDQL query authors need to read property sets.)
A.5 for information about the "General Architecture." This brief inheritable DTD provides special "common" attributes and other features that are useful in many SGML contexts. The common attributes provide for such things as element self-description, and expressing the requirement that element content be checked for validity in the surrounding context.
A.6 for information about "Formal System Identifiers". These allow system identifiers to contain metadata that allows them to be recognized as pertaining to specific filesystems, networks, containerizations (such as MIME), etc.
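
To make the idea of inheriting from more than one architecture a little more concrete, here is a minimal sketch in Python. It is not the AFDR mechanism itself, which covers far more (attribute renaming, suppression, and so on). It only assumes that each element may name, in one attribute per architecture, the architectural form it derives from. The Element class, the architectural_view function, the "biblio" architecture, and the element names "manual", "para", and "xref" are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One element of a document tree (illustrative, not an SGML parser)."""
    name: str                                   # element type name
    attrs: dict = field(default_factory=dict)   # attribute name -> value
    children: list = field(default_factory=list)


def architectural_view(elem, arch_attr):
    """Return the list of elements seen by one architecture.

    Assumption: an element derives from an architectural form when it
    carries an attribute named `arch_attr` whose value is the form name.
    Elements without that attribute are invisible to the architecture,
    but their descendants are still examined.
    """
    kids = []
    for child in elem.children:
        kids.extend(architectural_view(child, arch_attr))
    form = elem.attrs.get(arch_attr)
    if form is None:
        return kids                      # skip this element, keep descendants
    return [Element(form, {}, kids)]     # rename the element to its form


# A locally named document that derives from two architectures at once.
doc = Element("manual", {"HyTime": "HyDoc"}, [
    Element("para", {}, [
        Element("xref", {"HyTime": "clink", "biblio": "cite"}),
    ]),
])

hytime_view = architectural_view(doc, "HyTime")
biblio_view = architectural_view(doc, "biblio")
```

The local names ("manual", "xref") never change, and neither architecture needs to know about the other; each one simply sees its own view of the same document.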

All of the above "SGML Extended Facilities" are due to be incorporated, one way or another, in the next edition of SGML itself, so one might as well learn about them now, if one wishes to keep abreast of forthcoming developments in the SGML arena.

Because of the overwhelming importance of the SGML Extended Facilities, one might ask why the HyTime standard is still called, simply, "HyTime". I don't know; it's just inertia, I guess. I suppose it might be less confusing to call it "ISO/IEC 10744:1997", but that's not a very inspiring name, and it doesn't roll off the tongue. (Come to think of it, "SGML Extended Facilities" is not exactly a football chant, either.) Anyway, strictly speaking, the name "HyTime" only applies to the HyTime architecture described in Clauses 1-11, although we often use the word "HyTime" to mean the whole ISO 10744 standard, including the all-important Annex A.

The good old HyTime meta-DTD itself, now known as "the HyTime architecture", is much the same in its broad outlines. It's still primarily about addressing, and about doing new things with old information by addressing it from hyperlinks and schedules. HyTime still makes the outrageous (but still entirely true) claim of providing a way to address anything, anywhere, anytime. Because of the new grove paradigm, and because of the new HyTime property set, however, it is now extremely clear exactly how these marvelous addressing facilities should be implemented. Everything, expressed in every notation, or contained in any database conforming to any schema, becomes addressable by first turning it into a grove conforming to a model expressed by a property set written to reflect the semantic structure of that notation or schema. Therefore, everything becomes addressable as nodes in groves, and everything is addressable in terms of the semantics of its notation of origin, by means of arbitrarily complex queries, using the names of properties defined in the property set. The bottom line is that the early period of hand-waving about how HyTime really works is over, everyone who wishes to do so is invited to implement HyTime, and queries expressed in SDQL are portable from conforming implementation to conforming implementation.
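
For readers who think in code, the grove idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: it is not SDQL, and the node classes and property names used here ("element", "data", "gi", "content", "text") are chosen for illustration and should not be read as the actual definitions in the SGML property set. The Node class and the walk and select functions are hypothetical.

```python
class Node:
    """A grove node: a node class plus named properties.

    A property value may be a plain value, another Node, or a list of Nodes.
    """

    def __init__(self, node_class, **props):
        self.node_class = node_class
        self.props = props

    def subnodes(self):
        """Yield every Node held directly in this node's properties."""
        for value in self.props.values():
            if isinstance(value, Node):
                yield value
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, Node):
                        yield item


def walk(node):
    """Yield the node and all nodes reachable from it, depth first."""
    yield node
    for child in node.subnodes():
        yield from walk(child)


def select(root, node_class, **wanted):
    """Address nodes by node class and by the values of named properties."""
    return [
        n for n in walk(root)
        if n.node_class == node_class
        and all(n.props.get(name) == value for name, value in wanted.items())
    ]


# A tiny grove, as a parser for some notation might build it.
grove = Node("element", gi="doc", content=[
    Node("element", gi="title", content=[Node("data", text="Groves")]),
    Node("element", gi="para", content=[Node("data", text="Everything is a node.")]),
])

paras = select(grove, "element", gi="para")
all_data = select(grove, "data")
```

The point of the sketch is that the query is written entirely in terms of property names. Given a different property set for a different notation, the same walk and select functions would address that notation's nodes just as well.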

Many changes in the HyTime architecture were made in order to make it even more powerful and general. At the same time, however, great care was taken to avoid obsoleting any existing HyTime documents or HyTime-derived DTDs. The hyperlinking facilities are expanded; good old ilink is still there, but the preferred core notation for hyperlinks is now the "hylink" architectural form (element type). Hylink uses one attribute per linkend, instead of using a single attribute for a list of IDREFS. Similarly, the scheduling and rendition modules have been expanded and generalized.

In general, the Second Edition of HyTime is much easier to understand than the first one was. The normative text was greatly expanded and extensively checked by long-suffering volunteers for comprehensibility. In addition, there are 538 explanatory notes, many of which are quite lengthy. Even so, like most big technical documents, the HyTime standard must be read more than once in order to be thoroughly understood. It's safe to say that there are inevitably some parts that can't be understood properly unless many other parts are already understood. We did the best we could, and we ourselves are pleased with the result. We all hope you will find it extremely useful.

The four editors of the second edition of HyTime included the original two, Charles F. Goldfarb and Steven R. Newcomb, and two from the next generation of SGML designers, gurus, and implementers, W. Eliot Kimber and Peter J. Newcomb. Many other people contributed valuable insights, design suggestions, and very considerable editorial assistance. At the risk of inadvertently offending some by failing to mention their highly valued contributions, I'd like to recognize especially the contributions of Martin Bryan, James Clark, Sam Hunting, and Victoria T. Newcomb to the years of effort that the Second Edition represents. The fact that the sheer length of the document increased from just over 100 pages to nearly 500 would lead one to believe that the Second Edition was a major effort by many people, but in fact the size differential, large as it is, hardly begins to account for the overwhelming amount of work that went into it.

A new (but memberless) SIG of the International SGML Users' Group, the HyTime Users' Group, already offers a variety of materials about both the first and second editions of HyTime, together with pointers to other useful information, at its new web site, http://www.hytime.org. This site, still far from complete, is expected to grow rapidly.

The GCA's Fourth International HyTime Conference was held in Montreal in August, where we learned that industrial adoption of the Second Edition's new concepts and formalisms is already underway in several major initiatives:

The HL7-based "Kona" architecture for patient health records is gaining enthusiastic support from the healthcare industry because it allows interchange by means of inherited architectures, without constraining locally-controlled inheriting DTDs.
There is an effort to develop layers of inherited/inheriting architectures for enterprise integration at Nortel.
HyTime independent linking is used for modeling drug interaction data, and no special HyTime technology was needed or used in the project.
The need to make theses and dissertations available for grove-based querying by remote servers has led to the development of a HyTime Engine Peer-to-peer Protocol (HEP) at Virginia Tech.
A comprehensive grove/database toolset, GroveMinder, has been demonstrated and is under development at TechnoTeacher, Inc.
A HyTime browser with DSSSL-controlled rendering will likely come from Fujitsu in the near future.
HyTime's scheduling model is being incorporated into a high-function hypermedia authoring/delivery system at CWI (Amsterdam).
There is considerable interest in using HyTime for STEP interchange.
Legal research and activism are underway in Seattle and in Washington, D.C., to apply HyTime's activity policy association model to facilitate the protection and exploitation of intellectual property.

Many of the presentation materials used at the conference can be found at http://www.hytime.org/ihc97/hytime97.htm.

With the publication of the HyTime Second Edition on August 1, 1997, the ISO put HyTime once again at the leading edge of information management. It's powerful, and it will likely come on slow and strong, just as the leading edge of SGML always has.

(This article appeared in the October 1997 issue (Volume 3, Issue 4) of the International SGML Users' Group Newsletter.)