Publication

Electronic copies of this text will be found on the UIUC (TEI) Listserver, in the SGML Repository, and in the SGML Project archives. Entity references replace SGML markup characters in this HTML version.



                Back to the Frontiers and Edges:
       Closing Remarks at SGML '92:  the quiet revolution
            Graphic Communications Association (GCA)
                     C. M. Sperberg-McQueen
                        29th October 1992

Note:  This is a lightly revised version of the notes from which
the closing address of the SGML '92 conference was given.  Some
paragraphs omitted in the oral presentation are included here;
some extemporaneous additions may be missing.  For the sake of
non-attendees who may see this, I have added some minimal biblio-
graphic information about SGML '92 talks referred to.  I have not
added bibliographic references for HyTime, DSSSL, etc.  If you
are reading this, I assume you already know about them, or know
where to find out.  (MSM)

                            Section I
                          INTRODUCTION

What a great conference this has been!  We began with a vision of
the future from Charles Goldfarb,(1) and since then have had a
detailed tour of a lot that is going on in the present.  I want
to turn your attention forward again, and outward, back to the
fringes and the edges of our current knowledge.  We've been hearing
about a lot of projects in which the gains of SGML are being
consolidated, implemented, put into practice.  I want to talk
about some areas in which I think there may still be gains to be
made.

Not surprisingly, some of those gains are at the periphery of our
current concerns with SGML, in fringe applications, pushing the
edge of the envelope.  Not surprisingly, Yuri asked an academic
to talk about them, because academics are by nature fringe
people, and our business is to meddle with things that are
already pretty good, and try to make them better.

In identifying some areas as promising new results, and inviting
more work, there is always the danger of shifting from "inviting
more work" to "needing more work" and giving the impression of
dissatisfaction with the work that has been accomplished.  I want
to avoid giving that impression, because it is not true, so I
want to make very clear:  the questions I am posing are not
criticisms of SGML.  On the contrary, they are its children:
without ISO 8879, these questions would be very much harder to
pose:  harder to conceive of, and almost impossible to formulate
intelligibly.  SGML, that is, has created the environment
within which these problems can be posed for the first time, and
I think part of its accomplishment is that by solving one set of
problems, it has exposed a whole new set of problems.  Notation
is a tool of thought, and one of my main concerns is to find ways
in which markup languages can improve our thought by making it
easier to find formulations for thoughts we could not otherwise
easily have.

I start with the simple question:  what will happen to SGML and
to electronic markup in the future?  Charles Goldfarb told us
Monday:  the future of SGML is HyTime.  And this is true.  HyTime
is certain to touch most of us and affect our use of SGML in the
coming years.  But HyTime is already an international standard:
it's part of the present.  What will happen next?  What should
happen next?

What I will offer is just my personal view; it has no official
standing and should be taken for what it's worth.  It's an
attempt to provide a slightly fractured view, a view slightly
distorted in order to provoke disagreement and, I hope, some
useful thought.

If you want to know what is going to happen with SGML and markup
languages in the next few years, all you have to do is think
about what happened in programming languages after the introduction
of Cobol or Algol, and what happened in database management
systems after the development of the Codasyl data model.

                           Section II
                      THE MODEL OF THE PAST

The introduction of Cobol made possible vast improvements in
programmer productivity and made thousands of people familiar
with the notion of abstraction from the concrete machine.  It is
no accident that SGML is often compared to Cobol:  it is having a
similarly revolutionary effect.

More suggestive to me, however, is the parallel between SGML and
Algol. Apart from the skill with which its designers chose their
basic concepts, one of Algol's most important contributions was
its clean, simply designed syntax.  By its definition of its
syntax, Algol made possible the formal validation of the program
text, and thus rendered whole classes of programmer error (mostly
typos) mechanically detectable, thus effectively eliminating them
from debugging problems.  Similarly, SGML renders whole classes of
markup and data-entry error mechanically detectable and thus
eliminates them as serious problems.  The notion of formal
validity is tremendously important.

What happened after the introduction of Algol?  Over a period of
time, the intense interest in parsing and parser construction
gave way to interest in the meaning of programs, and work on
proofs of their correctness -- which I interpret as essentially
an attempt to extend formal validation beyond the syntax of the
language and allow it to detect logic or semantic errors as
well, and thus eliminate further classes of programmer error by
making them mechanically visible.  Formal reasoning about objects
requires a clean formal specification of those objects and their
characteristics, so time brought serious work on the formal
specification of programming language semantics.

In particular, work on type systems occupied a great deal of
attention, because Algol had demonstrated that type errors can
be mechanically detected for simple types.  So a lot of people
worked on extending those simple types and creating stronger,
subtler, more flexible, more useful type schemes; from this work
our current trend of object-oriented programming takes some of
its strength.

All of these same issues arose in connection with database
management systems after Codasyl.  (No matter what Tim Bray said
yesterday, this did happen well before 1978.)  The work of
Codasyl in defining formally the characteristics of databases led
to a generation of improved database systems, and eventually
the increasing complexity of those systems led to the introduc-
tion of the relational model, whose simple concepts had a clean
formal model firmly grounded in mathematics, which simplified
reasoning about databases and their correctness and which led to
substantial progress in database work, as Tim Bray described
yesterday.(2)

The database work confirmed, fueled, and strengthened the convic-
tion that formal validity and a rational set of data types are a
useful investment.  Equally important for our purposes, database
work showed the importance of placing as much as possible of the
burden of validation in the database schema definition and not
in the application software that works with the data.  If you
have a logical constraint in your data, for example that the sum
of columns DIRECT COST and INDIRECT COST not exceed the column
PRICE TO CUSTOMER, or that the only colors you offer are GREEN
and BLUE, it is better to define that constraint into the
database schema, so it will be consistently enforced by the
database server.  You may be tempted to leave it out of the schema
on the grounds that your application programs can enforce this
constraint just as well as the server.  And you are right -- in
theory.  In practice, as surely as the day is long, before the
end of the year you and the two other people who were there will
be transferred to new duties, your replacements will overlook the
note in the documentation, and the first thing they will do is
write a new application which does not enforce this rule.  Before
another year is gone your database will be full of dirty data.

In other words, to paraphrase an old Chicago election adage, con-
strain your data early and often.
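
The point can be sketched in a few lines of Python, using the
sqlite3 module from the standard library.  This is only an
illustration under stated assumptions:  the table name product,
the lower-case column names, and the helper try_insert are all
invented here, and SQLite merely stands in for whatever database
server is actually in use.  Both constraints from the example
above live in the schema, so every application that writes to the
table is held to them.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema carries the constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE product (
            name              TEXT PRIMARY KEY,
            color             TEXT NOT NULL CHECK (color IN ('GREEN', 'BLUE')),
            direct_cost       REAL NOT NULL,
            indirect_cost     REAL NOT NULL,
            price_to_customer REAL NOT NULL,
            CHECK (direct_cost + indirect_cost <= price_to_customer)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO product VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when a CHECK (or other) constraint fails.
        return False


conn = make_db()
accepted = try_insert(conn, ("widget", "GREEN", 2.0, 1.0, 5.0))
bad_color = try_insert(conn, ("gadget", "RED", 2.0, 1.0, 5.0))   # RED is not offered
bad_cost = try_insert(conn, ("gizmo", "BLUE", 4.0, 3.0, 5.0))    # costs exceed price
```

The second and third rows are turned away by the database itself,
whatever the calling application remembered to check.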

As hardware costs declined and programmer costs increased,
portability became an increasingly prominent issue, and the
semantic specification of languages, abstracting away from the
specifics of individual machines, proved to be an invaluable tool
in helping achieve it where possible and limit the costs where
device-specific code was necessary.

Since the progress on formal semantics, though promising, did not
yield mechanistic logic checkers as reliable as mechanistic
syntax checkers, the years after Algol and Codasyl also saw the
development of the notion of programming style, and attempts to
define what constitutes a good program.  At least some of these
discussions appealed as much to aesthetic judgments as to empiri-
cal measures, which is as it should be, since aesthetics is a
fairly reliable measure of formal simplicity and power.

                           Section III
                  SGML PROBLEMS AND CHALLENGES

All of these problems are also visible in SGML and electronic
text markup today, and my prediction, for what it is worth, is
that they will occupy us a fair amount in the coming years.  What
is more, as you will have noticed in the course of the con-
ference, they are already occupying us now.  That is, the future
is closer than you might think.

What problems will occupy us in this uncomfortably near future?
The same ones that we saw in programming languages and database
management:

        * style
        * portability
        * a large complex problem I'll call "semantics", which
          includes problems of validation and type checking

III.1   Style

We saw the other day in Tommie Usdin's session One DTD Five
Ways(3) how close we already are to developing consensus on DTD
style.  As for external, presentational details, Tommie remarked
that there is already an implicit consensus.  For details of con-
struction and approach, she remarked, rightly I think, that there
is no one answer, no context-free notion of "a good DTD".  Our
work in coming years is to work on clarifying a context-
sensitive notion of "a good DTD".

When is it better to tag a profile of Miles Davis as a
<NewYorkerProfile> and when is it better to tag it <article> or
even <div>?  The answer is not, I suggest to you, as some were
proposing the other day: namely that it's always better to tag it
<NewYorkerProfile>, but you may not always be able to afford it
and so you may have to settle for <article> or <section>.  For
production of the New Yorker, or for a retrieval system built
specifically around the New Yorker, I personally would cer-
tainly use the more specific tag.  For a production system to be
used by all Conde Nast or Newhouse magazines, however, I think
the elements <goings>, <TalkOfTheTown>, and so on would be prob-
lematic.  Let's face it, Psychology Today and Field and Stream
just do not have those as regular departments.  In building a
100-million word corpus of modern American English, it would
similarly be a needless complication of the needed retrieval to
provide specialized tags for each magazine and newspaper
included in the corpus.  One of the points of this whole exercise
(i.e.  SGML) is to reduce irrelevant variation in our data -- and
relevance is context-dependent.

Judging by the talks we have heard, those in this community will
be building and customizing an awful lot of distinct DTDs in the
coming years.  One of our major challenges is to learn, and then
to teach each other, what constitutes good style in them:  what
makes a DTD maintainable, clear, useful.

III.2   Portability

Our second major challenge is, I think, portability.  I can hear
you asking "What?!  SGML is portable.  That's why we are all
here." And you are right.  Certainly, if SGML did not offer bet-
ter portability of data than any of the alternatives, I for one
would not be here.

But if data portability is good, application portability is bet-
ter. If we are to make good on the promises we have made on
behalf of SGML to our superiors, our users, and our colleagues,
about how helpful SGML can be to them, we need application
portability.  And for application portability, alas, so far
SGML and the world around it provide very little help.

Application portability is achieved if you can move an applica-
tion from one platform to another and have it process the data in
"the same way".  A crucial first step in this process is to
define what that way is, so that the claim that Platform X and
Platform Y do the same thing can be discussed and tested.  But
SGML provides no mechanism for defining processing semantics,
so we have no vocabulary for doing so.

DSSSL (ISO 10179, the Document Semantics and Style Specification
Language) does provide at least the beginnings of that
vocabulary.  So DSSSL will definitely be a major concern in our
future.  We have seen another bit of the future, and it is DSSSL.

                           Section IV
                         THE BIG PROBLEM

But the biggest problem we face, I think, is that we need a clear
formulation of a formal model for SGML.  If we get such a formal
model, we will be able to improve the strength of SGML in several
ways.

IV.1   SGML's Strengths

SGML does provide a good, clean informal model of document struc-
ture. Like all good qualitative laws, it provides a framework
within which to address and solve a whole host of otherwise
insoluble problems.

For the record, my personal list of the crucial SGML ideas is:

        * explicitly marked or explicitly determinable boundaries
          of all text elements
        * hierarchical arrangement/nesting of text elements
        * type definitions constraining the legal contents of
          elements
        * provision, through CONCUR and the ID/IDREF mechanism, for
          asynchronous spanning text features which do not nest
          properly -- and here I want to issue a plea to the
          software vendors:  Make my life easier.  Support CONCUR!
        * use of entity references to ensure device independence of
          character sets

Obviously there are a number of other features important to
making SGML a practical system, which I haven't listed here.
What I've listed are what seem to me the crucial elements in the
logical model provided by SGML.

It seems to me that a properly defined subset of SGML focusing on
these ideas and ruthlessly eliminating everything else, could go
far in helping spread the use of SGML in the technical community,
which is frequently a bit put off by the complexity of the
syntax specification.  I don't think a subset would pose any
serious threat to the standard itself:  use of a subset in prac-
tice leads to realizations of why features were added to the
standard in the first place, and with a subset, the growth path
to full use of the standard language is clearly given. Spreading
the use of SGML among the technical community would in turn help
ensure that we get the help we will need in addressing some of
the challenges we face.

IV.2   Semantics

We commonly think of SGML documents as data objects, to be
processed by our programs.  I ask you now to participate for a
moment in a thought experiment:  what would the world be like, if
our SGML documents were not documents, but programs?  Our current
programs for processing SGML documents would be compilers or
interpreters for executing SGML programs.

What else?

Well, first of all, we discover a tremendous gap:  we have lost
everything we used to know about programming language semantics,
and we have no serious way of talking about the meanings of these
SGML programs.  And for that matter, we have no serious way of
talking about what happens when we compile or execute them.  In
other words, we have made our programs reusable (we can run the
same program / document with different compilers) and so we can
use just one programming language instead of many, and this is
good, but it would be nice to have a clue about the semantics of
the interpretations our compilers make of the language we are
using.

The clearest analogy I can think of to our situation is that in
SGML we are using a language like Prolog, in which each program
(document) has both a declarative interpretation and an impera-
tive or procedural interpretation.  If you ignore the procedural
aspects of Prolog programs, you can reason about them as
declarative structures; if you attend to the procedural aspects,
you can see what is going to happen when you run the program.

The difference between Prolog and SGML is that Prolog has very
straightforward semantics for both the declarative and the proce-
dural interpretations, for which formal specifications are pos-
sible.  In SGML, we have a very clear informal idea of the
declarative meaning of the document, but not a very formal one.
And we have no vocabulary except natural languages for talking
about processing them.

Ironically, it is not easy to say exactly what ought to be meant
by the term semantics.  Different people use it in different
ways, and if it does have a specific, consistently used meaning
in formal language studies, then the practitioners have kept it a
pretty well guarded secret.  So I can't tell you what semantics
means; I can only tell you what I mean by it today.

Imagine I am about to send you an SGML document.  Included in
this document are two elements I suspect you may not have
encountered before: <blort> and <vuggy>.  When I say I'd like to
have a good specification of their semantics, I mean I would like
to be able to tell you, in a useful way, what <blort> and <vuggy>
mean, and what formal constraints are implied by that meaning.

But we don't seem to know how to do that.

The prose documentation, if there is any and if I remember to
send it, may say what a <blort> is, or it may not.  It may tell
you what <vuggy> means, but if it does it may say only "full of
vugs; the attribute TRUE takes the values YES, NO, or
UNDETERMINED".  Unless you are a geologist you probably don't
know what a vug is, and if you are a geologist you may harbor
some justifiable skepticism as to whether I know and am using the
term correctly.

Even if my prose documentation does explain that a vug is an air-
hole in volcanic rock, and you know how to decide how many vugs
make a rock vuggy, I have probably not succeeded in specifying
what follows logically from that meaning in any useful way --
probably not, that is, in a way that a human reader will
understand and almost certainly not in a way that a validating
application can understand and act upon.  For example, how many
people here realize, given our definition of <vuggy>, that the
tag <vuggy true=yes> is incompatible with the tag <rock
type=metamorphic> -- since the definition of a vug is that it's
an airhole in volcanic, i.e. igneous, rock?  If you noticed that,
congratulations.  Are you right?  I don't know:  if some vuggy
igneous rock is metamorphosed and the airholes are still there,
is it still vuggy?  I don't know:  I'm not a geologist, I'm just
a programmer.  Is there a geologist in the house?(4)

It would be nice to be able to infer, from the formal definition
of <vuggy>, whether or not <vuggy true=yes> is incompatible with
<rock type=metamorphic>, just as we can infer, from the DTD, that
<vuggy true='76.93%'> is not valid, since the attribute true can
only take the values YES, NO, and UNDETERMINED.  Prose is not a
guaranteed way of being able to do that.
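
The contrast between the two kinds of check can be sketched in
Python.  Everything named here is an assumption made for
illustration:  the table DECLARED_VALUES imitates what a DTD can
declare (the list of rock types is invented), and the table
INCOMPATIBLE is a hypothetical, hand-written rule of the kind a
formal definition of <vuggy> might one day let us derive.  Whether
that rule is geologically correct is exactly the open question
raised above.

```python
# Declared attribute values, imitating what a DTD can already state.
# The rock types are an invented list, used only for illustration.
DECLARED_VALUES = {
    ("vuggy", "true"): {"YES", "NO", "UNDETERMINED"},
    ("rock", "type"): {"IGNEOUS", "SEDIMENTARY", "METAMORPHIC"},
}

# Hypothetical semantic rule: pairs of tags that may not occur together.
# Nothing in SGML lets us state this; it is written by hand here.
INCOMPATIBLE = [
    (("vuggy", "true", "YES"), ("rock", "type", "METAMORPHIC")),
]


def attribute_is_valid(element: str, attribute: str, value: str) -> bool:
    """Syntactic check: is the value among those declared for the attribute?"""
    allowed = DECLARED_VALUES.get((element, attribute))
    return allowed is not None and value.upper() in allowed


def conflicts(tags):
    """Semantic check: return every incompatible pair present in the tags.

    Each tag is an (element, attribute, value) triple.
    """
    present = {(e, a, v.upper()) for e, a, v in tags}
    return [(x, y) for x, y in INCOMPATIBLE if x in present and y in present]


sample = [("vuggy", "true", "yes"), ("rock", "type", "metamorphic")]
found = conflicts(sample)
```

The first function needs only the declarations; the second needs
knowledge that, today, lives nowhere but in prose.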

So what can we manage to do by way of specifying the "semantics"
of <blort> and <vuggy>?  We don't seem to know how to specify
meaning in any completely satisfactory way.  What do we know how
to do?  Well, we can fake it.  Or to put it in a more positive
light, we can attempt to get closer to a satisfactory specifica-
tion of meaning in several ways:

IV.2.1   Prose Specification

First, we can attempt to specify the meaning in prose.

Specifications in prose are of course what most of our manuals
provide in practice.  It is handy to formalize this as far as
possible, to ensure consistent documentation of all the charac-
teristics of the markup that we are documenting.  We've heard
obliquely about a number of systems people use to generate
structured documentation of SGML tag sets: Yuri Rubinsky men-
tioned one used internally by SoftQuad; Debby Lapeyre mentioned
one; the Text Encoding Initiative (TEI) uses one; I am sure
others exist too.  This is already a live issue.  And it will
continue to occupy our attention in the coming years.

Natural-language prose is, at present, the only method I know of
for specifying "what something means" in a way that is intuitive
to human readers.  Until our colleagues in artificial
intelligence make more progress, however, prose specifications
cannot be processed automatically in useful ways.

IV.2.2   Synonymy

Second, we can define synonymic relationships, which specify that
if one synonym is substituted for another, the meaning of the
element, whatever that meaning is, remains unchanged.  If we
didn't know in advance what <blort> and <farble type=green>
meant, we probably still don't know after being told they are
synonyms.  But knowing we can substitute one for the other
while retaining the meaning unchanged is nevertheless comfort-
ing.

IV.2.3   Class Relationships

Third, we can define class relationships, with inheritance of
class properties.  It doesn't tell us everything we might need to
know, but if we know that a <blort> is a kind of glossary list,
or a kind of marginal note, we would have some useful informa-
tion, which among other things would allow us to specify fall-
back processing rules for applications which haven't heard of
<blort>s but do know how to process marginal notes.

The fact that HyTime found it useful to invent the notion of
architectural form, the fact that the TEI has found it useful
to invent a simple class system for inheritance of attribute
definitions and content-model properties, both suggest that a
class-based inheritance mechanism is an important topic of fur-
ther work.
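
A minimal sketch of such fall-back processing, in Python, might
look like the following.  The class table, the handler table, and
the element names "marginal-note" and "note" are all hypothetical;
neither HyTime's architectural forms nor the TEI class system is
being reproduced here.  The only idea shown is that an application
which has never heard of <blort> can walk up the class chain until
it finds something it does know how to process.

```python
# Hypothetical class table: each element type names its superclass.
SUPERCLASS = {
    "blort": "marginal-note",
    "marginal-note": "note",
    "goings": "section",
    "TalkOfTheTown": "section",
}

# Handlers the application actually knows about.
HANDLERS = {
    "marginal-note": lambda text: f"[margin] {text}",
    "section": lambda text: f"== {text} ==",
}


def ancestry(element_type: str) -> list:
    """Return the element type followed by its superclasses, nearest first."""
    chain = [element_type]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def process(element_type: str, text: str) -> str:
    """Apply the handler of the nearest class for which one is defined."""
    for candidate in ancestry(element_type):
        handler = HANDLERS.get(candidate)
        if handler is not None:
            return handler(text)
    raise LookupError(f"no handler for {element_type!r} or its superclasses")
```

Here process("blort", ...) falls back to the marginal-note handler,
while an element type with no known superclass raises an error
instead of being silently mishandled.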

IV.2.4   License or Forbid Operations

Fourth, we can define rules that license or forbid particular
relations upon particular objects or types of objects.  We may
not know what a <blort> is, but we can know that it stands in
relation X to the element <granfalloon>, and we can know that
no <blort> can ever stand in relation Y to any element of type
<vuggy>.

In addition to relations, we can specify what operations can be
applied to something:  knowing that INTEGER objects can be added
while DATE objects cannot, especially if one of the DATE objects
is "in the reign of Nero", is part of what we mean when we say we
understand integers and dates.

An ability to define legal operations for SGML objects is a key
requirement for using SGML in data modeling.

The definition of a data type involves both the specification of
the domain of values it can take on and the specification of the
operations which can apply to it.  Because SGML has no procedural
vocabulary, it is very difficult to imagine how to specify, in
SGML, the operations applicable to a data type.  It would be
useful to
explore some methods of formal specification for legal opera-
tions upon SGML objects.

But note that "what it can do" and "what can be done to it" are
not, really, specifications of "what it means".

Moreover, object-oriented specifications cannot be exhaustive.
In an application program, if an operation P is not defined for
objects of type Q, it counts as a claim that operation P is
illegal for such objects.  Even if it's not illegal, you aren't
going to get anywhere by trying to call it, so it might as well
be illegal.  In SGML, with our commitment to application inde-
pendence, that isn't the case.  If no definition of addition
for DATE objects is provided, that could mean that it is semanti-
cally invalid:  dates can never be added.  Or it could mean that
we just haven't got around to it yet, or haven't thought about it
yet.  So the absence of a method for performing an operation
doesn't tell us whether the operation is or should be legal upon
a particular type of object.  Obviously, instead of leaving
operations undefined, we could specify explicitly that certain
operations are illegal for objects of a certain class.  But it is
not feasible to make a list of all the things that cannot be
done to DATES, or BLORTS, or GRANFALLOONS, because the list is
likely to be infinite.
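
One way to keep the distinction visible is to record three states
and not two.  The following Python sketch is an assumption-laden
illustration, not a proposal for SGML syntax:  the names Legality,
DECLARATIONS, and legality are invented here.  An operation is
legal or illegal only when someone has said so; everything else is
reported as unspecified, and never silently treated as forbidden.

```python
from enum import Enum


class Legality(Enum):
    LEGAL = "legal"
    ILLEGAL = "illegal"
    UNSPECIFIED = "unspecified"   # nobody has said either way


# Explicit declarations only; the table makes no claim about anything else.
DECLARATIONS = {
    ("INTEGER", "add"): Legality.LEGAL,
    ("DATE", "add"): Legality.ILLEGAL,   # stated outright, not inferred from absence
}


def legality(type_name: str, operation: str) -> Legality:
    """Look up a declaration; absence means UNSPECIFIED, not ILLEGAL."""
    return DECLARATIONS.get((type_name, operation), Legality.UNSPECIFIED)
```

The list of explicit prohibitions stays finite because it never
has to be complete:  the third value covers whatever is left out.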

Nevertheless, as a way of approaching the formal description of
applications, object oriented work is very promising.  It's
fairly obvious that in the future we need to work together with
those people developing the object-oriented programming
paradigm.

IV.2.5   Axiomatic Semantics

Fifth, we can specify in some logical notation what claims about
the universe of our document we can make, given that it is marked
up in a certain way, and we can define what inferences can be
made from those claims.  The synonymic relations I was talking
about a moment ago are just a special case of this.

Formal logic (i.e. first-order predicate calculus) certainly
makes possible the kinds of inference I've been talking about,
but even predicate calculus makes some concessions to the dif-
ficulty of the problem. I can infer that this value for this
attribute and that value for the other one are consistent,
inconsistent, etc.  But since Frege and Russell and Whitehead,
logic has treated itself as a purely formal game divorced from
meaning; the only relation to the real world is by way of models
which involve assigning meanings to the entities of the logical
system and seeing which sentences of the logical system are true
under these interpretations.  The problem is that "assign a mean-
ing in the real world to an entity or operation of the logical
system" is taken as a primitive operation and thus effectively
undefined.  We all know how to do this, right?  We can't define
semantics, but we know it when we see it.

In work on declarative semantics, we can learn a lot from recent
experience with logic constraint programming and declarative
programming.  The declarative approach to SGML semantics has a
certain appeal, both because it fits so well with the perceived
declarative nature of SGML as it is, and because declarative
information is useful.  As John McCarthy said in his Turing Award
lecture, "The advantage of declarative information is one of
generality.  The fact that when two objects collide they make a
noise can be used in a particular situation to make a noise, to
avoid making a noise, to explain a noise, or to explain the
absence of noise.  (I guess those cars didn't collide, because
while I heard the squeal of brakes, I didn't hear a crash.)"

One worry about declarative semantics is that it might prove dif-
ficult to define processing procedures in a declarative way.  But
in fact it is possible to specify procedures declaratively, as
Prolog, logic constraint languages, and the specification
language Z show us.

So I think a formal, axiomatic approach of some kind is very
promising.

But let's be real:  it is very unlikely that you or I, let alone
the authors we are working with, will understand from a
description of the tag set in first-order predicate calculus what
a <blort> is, or even what a <vug> is.

IV.2.6   Reduction Semantics

Finally, I should mention one further method of formal semantic
specification:  reduction semantics.  Reduction works the way
high-school algebra works.  One expression (e.g. "(1 + 2) + 5") is
semantically equivalent to that expression ("3 + 5"), that one to
this other one ("8"), and so on.  If you work consistently toward
simpler expressions, you can solve for the value of X. There
has been substantial work done on reduction semantics in program-
ming languages, including LISP and more purely functional lan-
guages like ML.

Moreover, reduction semantics doesn't have to be defined in terms
of string expressions:  it is entirely possible to define reduc-
tion semantics in terms of trees and operations upon trees.

Take a simple example:  if we have an element <A> whose content
model is "B+", does the order of <B>s matter?  In SGML there is
no way of saying yes or no.  Reduction semantics allows you to
say that this tree (gesture)

     <a><b>Apples ... </b><b>Oranges ... </b></a>

is the same as that tree (gesture)

     <a><b>Oranges ... </b><b>Apples ... </b></a>

so sequence is not important.  Or that they are not the same, so
sequence is significant.
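
A reduction rule of this kind can be sketched in Python.  The
Node class and the functions reduce_tree and equivalent are
hypothetical names, and the set passed as "unordered" is the
assumed declaration that SGML itself gives us no way to make.
Each tree is reduced to a canonical form in which the children of
an order-insensitive element are sorted; two trees are equivalent
if their canonical forms are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    text: str = ""
    children: tuple = ()


def reduce_tree(node: Node, unordered) -> Node:
    """Rewrite a tree into canonical form.

    `unordered` is the set of element names whose child order is declared
    insignificant; their (already reduced) children are put in sorted order.
    """
    children = tuple(reduce_tree(child, unordered) for child in node.children)
    if node.name in unordered:
        # repr of a frozen dataclass is deterministic, so it serves as a sort key.
        children = tuple(sorted(children, key=repr))
    return Node(node.name, node.text, children)


def equivalent(x: Node, y: Node, unordered=frozenset()) -> bool:
    """Two trees mean the same thing if they reduce to the same canonical tree."""
    return reduce_tree(x, unordered) == reduce_tree(y, unordered)


apples_first = Node("a", children=(Node("b", "Apples"), Node("b", "Oranges")))
oranges_first = Node("a", children=(Node("b", "Oranges"), Node("b", "Apples")))
```

With "a" declared unordered the two trees are equivalent; with no
such declaration they are not, which is precisely the yes-or-no
answer the content model "B+" cannot give.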

We have a good example of this type of work in the paper "Mind
Your Grammar" and the grammar-based DB work at the University of
Waterloo by Frank Tompa and Gaston Gonnet.(5)  I think this is a
very important field for further work.

In summary:  we have at least six areas to explore in trying to
work on better semantic specification for SGML:  structured docu-
mentation (the kind of thing SGML itself is good at), synonymy,
classes, operation definitions, axiomatic semantics, and reduc-
tion semantics.

I don't know whether these activities would constitute the
specification of a semantics for SGML and for our applications,
or only a substitute for such a specification, in the face of
the fact that we don't really know how to say what things mean.
Certainly no lexicographer, no historical linguist, would feel
they constituted an adequate account of the meaning of anything.
And yet I suspect that these activities all represent promising
fields of activity.

IV.3   Validation and Integrity Checking

A formal model would make it possible to formulate cleanly many
of the kinds of constraints not presently expressible in SGML.
This is by no means an exhaustive or even a systematic list, but
at least all the problems are real:

        * If an attribute SCREEN-TYPE has the value
          BLACK-AND-WHITE, the attribute COLOR-METHOD almost
          certainly should have the value DOES-NOT-APPLY.

          But this kind of constraint on sets of attribute values
          is impossible to specify for SGML attributes.  It would
          certainly be useful sometimes to be able to define
          co-occurrence constraints between attribute values.
        
        * Similarly, there are cases where one would like to
          constrain element content in a way I don't know how to do
          with content models.  We have heard repeatedly in this
          conference about revision and version control systems
          which allow multiple versions of a document to be encoded
          in a single SGML document.  For example, one might have a
          <revision> element which contains a series of <version>
          elements.  The TEI defines just such a tag pair.  At the
          moment our <version> element can contain only character
          and phrase elements.  It would be nice to allow it to
          operate as well upon the kind of SGML-element-based
          deltas that Diane Kennedy described the other day for
          revision info, in which the unit of a revision was always
          an SGML element.  If a change is made within a paragraph,
          the entire paragraph is treated as having been changed,
          and versioning consists in choosing the right copy of the
          paragraph.(6)  But one would like to be able to specify
          that if the first <version> element contains a <p>
          element, the second had better contain one as well, and
          not a whole new subsection or just a phrase.  Otherwise,
          the SGML document produced as output from a
          version-selection processor would not be parsable.

        * It would be nice to be able to require that an element be
          a valid Gregorian date, or a valid ISO date, or a valid part 
          number, etc., etc.
        * It would be nice to be able to require character data to
          appear within a required element:  i.e. to have a variant on 
          PCDATA whose meaning is "character+" and not
          "character*" -- or even to require a minimum length, as
          for social security numbers, phone numbers, or zip codes.
        * The SGML IDREF is frequently used as a generic pointer.
          Many people wish they could do in SGML what we can do in 
          programming languages, and require a given
          pointer to point at a particular type of object.  (The
          pointer in a <figref> had better be pointing at a figure, 
          or the formatter is going to be very unhappy.)
        * Similarly, it would be nice to have a type system that
          understood classes and subclasses.  The only reason we face 
          this nasty choice between using the tag
          <goings> and using the tag <section> for the New Yorker's
          "Goings on Around Town" section is that we have no way to 
          make a processor understand that <goings>
          and <TalkOfTheTown> and so on are just specialized
          versions of <section> or <article>.  If we use the specialized 
          tags, and want to specify an operation upon all
          sections of all magazines, we must make an exhaustive
          list of all the element types which are specializations of 
          <section>.  To be sure, our application systems can
          handle this.  But we want to constrain early and often.
          And never constrain in your application what you could 
          constrain in the DTD.
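
To make a few of the wishes in this list concrete, here is a
minimal sketch, in Python, of the kind of checking that at
present has to live in the application rather than in the DTD.
It assumes the document has already been normalized into a
well-formed form that the standard library parser can read.
Every name in it is hypothetical: the SUBCLASS_OF,
POINTER_TARGET, and CONTENT_CHECK tables, the "target" and "id"
attributes, and the <isodate> and <zip> element types are
invented for illustration and come from no existing DTD or tool.
The <version> check is a simplification that merely compares the
sequence of child element types.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# All names below are hypothetical, chosen only for this illustration.

# Specialized element types mapped to the more general type they refine.
SUBCLASS_OF = {"goings": "section", "TalkOfTheTown": "section"}

# Element types whose "target" attribute must point at a given type.
POINTER_TARGET = {"figref": "figure"}


def is_a(tag, general):
    """True if tag is `general` or a declared specialization of it."""
    while tag is not None:
        if tag == general:
            return True
        tag = SUBCLASS_OF.get(tag)
    return False


def is_iso_date(text):
    """True if text is YYYY-MM-DD and names a real Gregorian date."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


# Element types whose character data must satisfy a predicate.
CONTENT_CHECK = {
    "isodate": is_iso_date,
    "zip": lambda text: re.fullmatch(r"[0-9]{5}(-[0-9]{4})?", text) is not None,
}


def validate(root):
    """Return a list of messages, one per violated constraint."""
    errors = []
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in root.iter():
        # Typed pointers: a <figref> must point at a figure.
        wanted = POINTER_TARGET.get(el.tag)
        if wanted is not None:
            target = by_id.get(el.get("target"))
            if target is None or not is_a(target.tag, wanted):
                errors.append(f"<{el.tag}> must point at a <{wanted}>")
        # Constrained character data: non-empty and of the right shape.
        check = CONTENT_CHECK.get(el.tag)
        if check is not None and not check((el.text or "").strip()):
            errors.append(f"<{el.tag}> has invalid content {el.text!r}")
        # Parallel versions: every <version> in a <revision> must hold
        # the same sequence of child element types.
        if el.tag == "revision":
            shapes = {tuple(c.tag for c in v) for v in el.findall("version")}
            if len(shapes) > 1:
                errors.append("<version> elements differ in structure")
    return errors


def all_of_type(root, general):
    """Every element that is `general` or a specialization of it."""
    return [el for el in root.iter() if is_a(el.tag, general)]
```

With tables like these, a call such as all_of_type(root,
"section") picks up <goings> and <TalkOfTheTown> without an
exhaustive list at the point of use, and validate(root) reports a
<figref> that points at something other than a figure.  The
tables themselves are exactly the information one would prefer to
declare once, in the DTD.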

                            Section V
                    CONCLUSION:  WHY BOTHER?

I suppose you can sum up my entire talk today this way.  We want
to constrain our data early and often.  To do this, we need bet-
ter validation methods.  To express the validation we need, we
need a clean formal model and a vocabulary for expressing it.
The query languages described yesterday are not the final word,
but they are a crucial first step.

Why do we want to do all these things?  Why bother with formal
specification?  Because formal specification and formal vali-
dation are SGML's great strengths.

Why is it, as Charles Goldfarb said on Monday, that SGML allows
us to define better solutions than the ad hoc solutions built
around a specific technology?  It is because SGML provides a
logical view of problems, not an ad hoc view based on a specific
technology.  Naturally, it seems to suit the technology less well
than the ad hoc approach.  But when the underlying technology
changes, ad hoc solutions begin to fit less well, and look less
like ad hoc solutions, and more like odd hack solutions.(7) But
we can improve SGML's ability to specify the logical level of our
data and our applications.  And so we should.

A logical view is better than a technology-specific view.  And so
we should welcome every effort to improve the tools available to
use in defining our logical view.  In this connection I could
mention again the work by Gonnet and Tompa on large textual
databases, and the work of Anne Brüggemann-Klein which is
occasionally reported on the Netnews forum comp.text.sgml.

Success in improving our logical view of the data is what will
enable the quiet revolution called SGML to succeed.

And now I hope you'll join me in thanking Yuri Rubinsky, for
organizing this conference and for allowing all of us co-
conspirators in the revolution to get together and plot.

      -----------------------------------------------------

(1) Charles Goldfarb, "I Have Seen the Future of SGML, and It Is
..." (Keynote, SGML '92), 26 October 1992.
(2) Tim Bray, "SGML as Foundation for a Post-Relational Database
Model," talk at SGML '92, 28 October 1992.
(3) B. Tommie Usdin (et al.), "One Doc -- Five Ways:  A Compara-
tive DTD Session," (panel discussion of five sample DTDs for
the New Yorker magazine), SGML '92, 27 October 1992.
(4) There was; metamorphic rock can be vuggy too, so the initial
definition was too narrow.  - MSM
(5) Gaston H. Gonnet and Frank Wm. Tompa, "Mind Your Grammar:  a
New Approach to Modelling Text," in Proceedings of the 13th
Very Large Data Base Conference, Brighton, 1987.
(6) Diane Kennedy, "The Air Transport Association / Aerospace
Industries Association, Rev 100," talk at SGML '92, 28
October 1992.
(7) I owe this pun to John Schulien of UIC.