[This local archive copy mirrored from the canonical site: http://www.techno.com/need2know/index.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
What You Need to Know About the New HyTime
If none of the following statements applies to you, it's just barely
possible that you don't need to know anything about the new HyTime,
and you should stop reading this article immediately. (Don't bet on
it, though.) This list is by no means complete; it's meant to
pique your interest.
- You need to modify a DTD for purely local reasons, but you can't
because to do so would compromise your ability to interchange
information.
- You'd like to combine more than one DTD to get the features of all
of them in a single document type. You need to do this without name
collisions, without changing the DTDs to be combined, all while
smoothly taking into account any overlapping or duplicated
semantics.
- You think it would be very nice to express in a standard way that,
in order for the document to be regarded as valid and well-formed,
certain strings (in content or in attribute values) must conform to
certain lexical models.
- You are concerned about the growing number of formally
undifferentiated classes of system identifiers, such as filesystem
addresses, URLs, addresses of compressed, encrypted, and/or sealed
files, and addresses of files within multifile container files (such
as zip files, tar files, and MIME messages).
- You want to understand fully the "property set" and "grove" paradigm
on which Standard Document Query Language (SDQL), the fundamental
query language shared by both HyTime and DSSSL, is based.
- In any case where element A contains element B, and element B has
content, you need to verify that the content of element B would be
valid in its context as the content of element A (i.e., even after
deleting the effective start and end tags of element B).
- You want your element type names to be self-describing, in an
internationally standardized way, without having to make them
longer.
- You need to provide values for the attributes of notations used in
the content of specific elements.
- You'd like to declare the notation of data used as the values of
attributes.
- You'd like to require that the IDs given in the value of particular
IDREF attributes be the IDs of particular element types.
- You'd like to be able to change the default values of attributes
in the middle of a document instance, instead of being stuck with
whatever default values were given in the DTD.
If any of the above statements do apply to you, then you should read
the appropriate section of Annex A ("SGML Extended Facilities") of the
HyTime Second Edition:
- A.2 for lexical modeling of element content and attribute values.
- A.3 for information about inheriting the semantic and syntactic
characteristics of other DTDs in your own DTD. Personally, I think
A.3 is the single most revolutionary and far-reaching aspect of the
new HyTime standard, and I would urge most SGML veterans to start
there. The HyTime standard is now primarily two things: the HyTime
architecture itself (which is essentially a very abstract DTD for
hyperdocument structuring described in Clauses 1-11), and the SGML
Extended Facilities (Annex A). In the original edition of HyTime,
there was only the HyTime DTD, whose element types were called
"architectural forms". Each of these element types was designed to
be inherited by actual element types in actual DTDs (that's why the
HyTime DTD was called a "meta-DTD"). Now, though, the HyTime DTD is
regarded as one aspect of the definition of the "HyTime
Architecture", and A.3 describes how to inherit not only the HyTime
architecture, but any combination of architectures (i.e., any
combination of DTDs) you like. A.3 is called the "Architectural
Form Definition Requirements" or, simply, "AFDR".
- A.4 for information about the "property set" and "grove" paradigms.
Many would say that this is the single most revolutionary and
far-reaching aspect of the 2nd edition of HyTime, and it would be
hard to argue with them. A "property set" is the formal description
of all the kinds of things that a parser ("notation processor")
might find in an information resource expressed in some particular
notation, and the relationships between those things. An SGML
parser is expected to recognize certain kinds of things in an SGML
document; those things are described in the "SGML Property Set" that
can be found in A.7. (It is not light reading.) Another
property set, the property set for the HyTime architecture, is found
in Annex B. (It is not light reading, either. Fortunately, only
system implementers and advanced SDQL query authors need to read
property sets.)
- A.5 for information about the "General Architecture." This brief
inheritable DTD provides special "common" attributes and other
features that are useful in many SGML contexts. The common
attributes provide for such things as element self-description, and
expressing the requirement that element content be checked for
validity in the surrounding context.
- A.6 for information about "Formal System Identifiers". These let
system identifiers carry metadata that identifies them as pertaining
to specific filesystems, networks, containerizations (such as MIME),
etc.
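To make the idea of a formal system identifier concrete, here is a minimal Python sketch that splits an FSI into its storage object specifications. It assumes a simplified syntax in which each specification is a tag naming a storage manager, such as `<OSFILE>` or `<URL>` (optionally carrying `name=value` attributes), followed by the storage object identifier, and that several specifications may be concatenated. The manager names and the `records` attribute are used for illustration only; A.6 itself is the authority on the real syntax, quoting rules, and defaults.

```python
from __future__ import annotations

import re

# One storage object specification: "<NAME attr=value ...>identifier".
# Simplified (an assumption of this sketch): no quoted attribute values
# and no "<" characters inside identifiers.
SPEC = re.compile(r"<(?P<tag>[^>]+)>(?P<ident>[^<]*)")


def parse_fsi(fsi: str) -> list[tuple[str, dict[str, str], str]]:
    """Split an FSI into (storage manager, attributes, identifier) triples."""
    specs = []
    for m in SPEC.finditer(fsi):
        # First token of the tag is the storage manager; the rest are attributes.
        name, *attr_tokens = m.group("tag").split()
        attrs = dict(tok.split("=", 1) for tok in attr_tokens)
        specs.append((name.upper(), attrs, m.group("ident")))
    return specs


print(parse_fsi("<OSFILE>/usr/local/sgml/memo.sgm"))
# [('OSFILE', {}, '/usr/local/sgml/memo.sgm')]
print(parse_fsi("<URL>http://www.hytime.org/<OSFILE records=asis>notes.txt"))
# [('URL', {}, 'http://www.hytime.org/'), ('OSFILE', {'records': 'asis'}, 'notes.txt')]
```

Keeping each specification as a separate triple lets an entity manager hand each piece to the storage manager it names, instead of guessing from a bare string what kind of address it is.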
All of the above "SGML Extended Facilities" are due to be incorporated,
one way or another, in the next edition of SGML itself, so one might as
well learn about them now, if one wishes to keep abreast of forthcoming
developments in the SGML arena.
Because of the overwhelming importance of the SGML Extended
Facilities, one might ask why the HyTime standard is still called,
simply, "HyTime". I don't know; it's just inertia, I guess. I
suppose it might be less confusing to call it "ISO/IEC 10744:1997",
but that's not a very inspiring name, and it doesn't roll off the
tongue. (Come to think of it, "SGML Extended Facilities" is not
exactly a football chant, either.) Anyway, strictly speaking, the
name "HyTime" only applies to the HyTime architecture described in
Clauses 1-11, although we often use the word "HyTime" to mean the
whole ISO 10744 standard, including the all-important Annex A.
The good old HyTime meta-DTD itself, now known as "the HyTime
architecture", is much the same in its broad outlines. It's still
primarily about addressing, and about doing new things with old
information by addressing it from hyperlinks and schedules. HyTime
still makes the outrageous (but entirely true) claim of
providing a way to address anything, anywhere, anytime. Because of
the new grove paradigm, and because of the new HyTime property set,
however, it is now extremely clear exactly how these marvelous
addressing facilities should be implemented. Everything, expressed in
every notation, or contained in any database conforming to any schema,
becomes addressable by first turning it into a grove conforming to a
model expressed by a property set written to reflect the semantic
structure of that notation or schema. Therefore, everything becomes
an addressable node in a grove, and everything is addressable in terms of
the semantics of its notation of origin, by means of arbitrarily
complex queries, using the names of properties defined in the property
set. The bottom line is that the early period of hand-waving about
how HyTime really works is over, everyone who wishes to do so is
invited to implement HyTime, and queries expressed in SDQL are
portable from conforming implementation to conforming implementation.
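The grove idea can be pictured with a toy model: every node has a class and a set of named properties, and each property's value is either data or further nodes. The Python sketch below assumes that shape. The node class and property names (`sgmldoc`, `element`, `docelem`, `content`, `gi`, `text`) are hypothetical stand-ins loosely modeled on an SGML property set, and the two helpers illustrate property-based addressing; they are not SDQL.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_class: str                            # class name from a (hypothetical) property set
    props: dict = field(default_factory=dict)  # property name -> data value, Node, or list of Nodes


def address(node, *steps):
    """Follow named properties (str steps) and list positions (int steps)."""
    current = node
    for step in steps:
        if isinstance(step, int):
            current = current[step]          # pick the n-th node of a node list
        else:
            current = current.props[step]    # follow a named property
    return current


def find(node, prop, value):
    """Yield every node in the subtree whose property `prop` equals `value`."""
    if node.props.get(prop) == value:
        yield node
    for v in node.props.values():
        if isinstance(v, Node):
            children = [v]
        elif isinstance(v, list):
            children = v
        else:
            children = []                    # plain data: nothing to descend into
        for child in children:
            yield from find(child, prop, value)


# A tiny grove for: <memo><to>Eliot</to><body></body></memo>
doc = Node("sgmldoc", {"docelem": Node("element", {
    "gi": "memo",
    "content": [
        Node("element", {"gi": "to", "content": [Node("data", {"text": "Eliot"})]}),
        Node("element", {"gi": "body", "content": []}),
    ],
})})

print(address(doc, "docelem", "content", 0, "gi"))                  # to
print(address(doc, "docelem", "content", 0, "content", 0, "text"))  # Eliot
print([n.props["gi"] for n in find(doc, "gi", "body")])             # ['body']
```

Because the addresses are expressed only in terms of property names, the same helpers would work unchanged on a grove built from some other notation, provided its property set defines the names being used.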
Many changes in the HyTime architecture were made in order to make it
even more powerful and general. At the same time, however, great care
was taken to avoid obsoleting any existing HyTime documents or
HyTime-derived DTDs. The hyperlinking facilities are expanded; good
old ilink is still there, but the preferred core notation for
hyperlinks is now the "hylink" architectural form (element type).
Hylink uses one attribute per linkend, instead of using a single
attribute for a list of IDREFS. Similarly, the scheduling and
rendition modules have been expanded and generalized.
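The difference between the two link forms can be shown with a small Python sketch that rewrites an ilink's attributes into hylink style. It assumes that an ilink's `anchrole` attribute lists the anchor role names and its `linkends` attribute lists one IDREF per role, and that a hylink instead uses each role name as the name of its own attribute. Tokens beginning with `#` are assumed here to be keywords qualifying a role and are skipped when pairing roles with IDREFs. Treat the whole mapping as a simplified illustration, not a conforming converter.

```python
def ilink_to_hylink(ilink_attrs: dict) -> dict:
    """Map ilink-style attributes to hylink-style ones (simplified sketch)."""
    # Role names, skipping any "#..." keywords assumed to qualify a role.
    roles = [tok for tok in ilink_attrs["anchrole"].split()
             if not tok.startswith("#")]
    # One IDREF per anchor, in the same order as the roles.
    linkends = ilink_attrs["linkends"].split()
    if len(roles) != len(linkends):
        raise ValueError("each anchor role needs exactly one linkend")
    hylink_attrs = {"anchrole": ilink_attrs["anchrole"]}
    # One attribute per anchor: the role name becomes the attribute name.
    for role, idref in zip(roles, linkends):
        hylink_attrs[role] = idref
    return hylink_attrs


print(ilink_to_hylink({"anchrole": "source target", "linkends": "s1 t1"}))
# {'anchrole': 'source target', 'source': 's1', 'target': 't1'}
```

With one attribute per anchor, each anchor's address gets its own attribute declaration in the DTD, rather than being one position in a shared IDREFS list.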
In general, the Second Edition of HyTime is much easier to understand
than the first one was. The normative text was greatly expanded and
extensively checked by long-suffering volunteers for
comprehensibility. In addition, there are 538 explanatory notes, many
of which are quite lengthy. Even so, like most big technical
documents, the HyTime standard must be read more than once in order to
be thoroughly understood. It's safe to say that there are inevitably
some parts that can't be understood properly unless many other parts
are already understood. We did the best we could, and we ourselves
are pleased with the result. We all hope you will find it extremely
useful.
The four editors of the Second Edition of HyTime were the original
two, Charles F. Goldfarb and Steven R. Newcomb, and two from the next
generation of SGML designers, gurus, and implementers: W. Eliot Kimber
and Peter J. Newcomb. Many other people contributed valuable
insights, design suggestions, and very considerable editorial
assistance. At the risk of inadvertently offending some by failing to
mention their highly valued contributions, I'd like to recognize
especially the contributions of Martin Bryan, James Clark, Sam
Hunting, and Victoria T. Newcomb to the years of effort that the
Second Edition represents. The fact that the sheer length of the
document increased from just over 100 pages to nearly 500 would lead
one to believe that the Second Edition was a major effort by many
people, but in fact the size differential, large as it is, hardly
begins to account for the overwhelming amount of work that went into
it.
A new (but memberless) SIG of the International SGML Users' Group, the
HyTime Users' Group, already offers a variety of materials about both
the first and second editions of HyTime, together with pointers to
other useful information, at its new web site, http://www.hytime.org.
This site, still far from complete, is expected to grow rapidly.
The GCA's Fourth International HyTime Conference was held in Montreal
in August, where we learned that industrial adoption of the Second
Edition's new concepts and formalisms is already underway in several
major initiatives:
- The HL7-based "Kona" architecture for patient health records is
gaining enthusiastic support from the healthcare industry because it
allows interchange by means of inherited architectures, without
constraining locally-controlled inheriting DTDs.
- There is an effort to develop layers of inherited/inheriting
architectures for enterprise integration at Nortel.
- HyTime independent linking is used for modeling drug interaction
data, and no special HyTime technology was needed or used in the
project.
- The need to make theses and dissertations available for grove-based
querying by remote servers has led to the development of a HyTime
Engine Peer-to-peer Protocol (HEP) at Virginia Tech.
- A comprehensive grove/database toolset, GroveMinder, has been
demonstrated and is under development at TechnoTeacher, Inc.
- A HyTime browser with DSSSL-controlled rendering will likely come
from Fujitsu in the near future.
- HyTime's scheduling model is being incorporated into a high-function
hypermedia authoring/delivery system at CWI (Amsterdam).
- There is considerable interest in using HyTime for STEP interchange.
- Legal research and activism are underway in Seattle and in
Washington, D.C., to apply HyTime's activity policy association model
to facilitate the protection and exploitation of intellectual property.
Many of the presentation materials used at the conference can
be found at http://www.hytime.org/ihc97/hytime97.htm.
With the publication of the HyTime Second Edition on August 1,
1997, the ISO put HyTime once again at the leading edge of information
management. It's powerful, and it will likely come on slow and
strong, just as the leading edge of SGML always has.
(This article appeared in the October 1997 issue (Volume 3, Issue 4)
of the International SGML Users' Group Newsletter.)