Bibliography Markup (SGML)

Bibliography Markup (SGML): Brief Justification

The bibliography sections in this SGML database use an experimental form of SGML markup designed with two goals in mind: (1) it should be compatible with the current [1994/1995] generation of HTML 1.0 and HTML 2.0 browsers, and (2) it should provide an encoding richer than HTML so that SGML applications which actually understand document structure can make use of the encoded information in processing. Using Dynatext, Explorer, Panorama (beta), or other SGML applications which support query languages, it is possible to search on various bibliographic text objects (dates, personal names, functions) based upon their names and syntactic relationships. The added value will become more apparent when the bibliographic database is larger and tagged for indexing.
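As a rough illustration of searching by element name and structural relationship, here is a minimal sketch. It is not the query language of any of the applications named above. It assumes a well-formed fragment that Python's standard `xml.etree.ElementTree` can parse as a stand-in for a real SGML parser, and the element and attribute names (`biblio`, `bibitem`, `name`, `function`, `cite`, `date`) are hypothetical placeholders rather than the names declared in the actual DTD. The entries are invented sample data.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are hypothetical.
SAMPLE = """
<biblio>
  <bibitem>
    <name function="author">Doe, Jane</name>
    <cite>An Example Title</cite>
    <date>1994</date>
  </bibitem>
  <bibitem>
    <name function="editor">Roe, Richard</name>
    <cite>Another Example Title</cite>
    <date>1995</date>
  </bibitem>
</biblio>
"""


def names_by_function(root, function):
    """Return personal names whose 'function' attribute matches."""
    path = f".//bibitem/name[@function='{function}']"
    return [name.text for name in root.findall(path)]


def titles_dated(root, year):
    """Return titles of entries whose sibling <date> equals 'year'."""
    return [
        item.findtext("cite")
        for item in root.iter("bibitem")
        if item.findtext("date") == year
    ]


root = ET.fromstring(SAMPLE)
print(names_by_function(root, "editor"))  # ['Roe, Richard']
print(titles_dated(root, "1994"))         # ['An Example Title']
```

The second query is the kind that flat HTML cannot answer: it selects a title because of the date that sits beside it in the same entry, not because of any text in the title itself.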

I have tested the encoded data using several HTML browsers: Mosaic, Netscape, Lynx, and an early version of Arena. The markup is designed to be compatible with these browsers insofar as they exhibit the common (and commonsense) behavior of simply ignoring SGML markup notations (i.e., tags and attributes) that are not declared meaningful to them in the HTML specifications.

This commonly attested HTML agent behavior accords precisely with sentiments expressed in the HTML 2.0 specification:

"2.2 Undefined Tag and Attribute Names. An accepted networking principle is to be conservative in that which one produces, and liberal in that which one accepts. HTML user agents should be liberal except when verifying code. HTML generators should generate strictly conforming HTML. The behavior of HTML user agents reading HTML documents and discovering tag or attribute names which they do not understand should be to behave as though, in the case of a tag, the whole tag had not been there but its content had, or in the case of an attribute, that the attribute had not been present." -- See: HyperText Markup Language Specification - 2.0 (INTERNET DRAFT: file draft-ietf-html-spec-01.txt), February 8, 1995.

Please bear in mind that the markup is not meant to imply a proposed extension of HTML, or to subvert HTML. Nor is it meant to imply that following a DTD is irrelevant: I have used my own. It is simply meant to exploit a very reasonable (and authorized) HTML browser behavior in such a way as to allow HTML browsers to display an SGML document.
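The browser behavior relied on here can be sketched in a few lines. The following is only an approximation built on Python's standard `html.parser`, not the code of any actual browser. `KNOWN_TAGS` and `KNOWN_ATTRS` are small illustrative subsets, not the full HTML 2.0 vocabulary, and the element names `bibitem`, `name`, and `date` in the sample are hypothetical stand-ins for the richer bibliographic markup.

```python
from html.parser import HTMLParser

# Illustrative subsets only; a real agent would know the whole HTML DTD.
KNOWN_TAGS = {"html", "head", "title", "body", "h1", "h2", "p",
              "ul", "li", "a", "cite", "em", "strong"}
KNOWN_ATTRS = {"a": {"href", "name"}}


class LiberalAgent(HTMLParser):
    """Drops unknown tags and attributes but keeps all character content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            return  # behave as though the tag had not been there
        allowed = KNOWN_ATTRS.get(tag, set())
        kept = "".join(
            f' {key}="{value}"'
            for key, value in attrs
            if key in allowed and value is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # content survives even inside unknown tags


def render(markup):
    """Return the markup as an agent that knows only KNOWN_TAGS sees it."""
    agent = LiberalAgent()
    agent.feed(markup)
    agent.close()
    return "".join(agent.out)


SAMPLE = ('<li><bibitem><name function="author">Doe, Jane</name>, '
          '<cite>An Example Title</cite> (<date>1994</date>)</bibitem></li>')
print(render(SAMPLE))
# <li>Doe, Jane, <cite>An Example Title</cite> (1994)</li>
```

The unknown elements vanish while their content stays in place, which is why any punctuation needed for display has to be present in the data itself.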

I will publish the DTD that was used in the preparation of the data (somewhere near here) as soon as I finish fiddling with a few minor cosmetic points. For now, anyone interested in the DTD can easily infer it from the data. You'll notice that there was an attempt to keep punctuation out of the "main" text objects so that the data can be moved into a better SGML database sometime in the future. Many concessions to HTML (and its lack of stylesheets) had to be made in the markup, but I will not document the details here.