Last modified: June 18, 2003

Library

SGML/XML Special Topics

SGML Special Topics: Contents

Grammar
SGML Declaration
Productions: the grammar productions from ISO 8879
Definitions (from clause 4 of ISO 8879)
Michael Sperberg-McQueen's Grammar Tools
SGML/XML DTD/Grammar Transduction and Generation
Content Model Simplification, Rewrite Rules, Reducing Grammars
SGML/XML Notion of Ambiguity (non-deterministic content models)
RS/RE Processing
Use (and Non-use) of Exceptions in DTDs
CDATA [and RCDATA] as Declared Content
Elements versus attributes - How do I decide?
Duplicate Tokens (forbidden) in an Attribute Definition List
"Are entity references recognized and replaced in attribute values?"
SGML and Context Free Grammars
SGML/XML and Forest/Hedge Automata Theory
Markup Languages and (Non-) Hierarchies
SGML SUBDOC Feature
Other Grammar/Parsing Issues and SGML FEATURES
Architectural Forms and SGML/XML Architectures
Groves, Grove Plans, and Property Sets in SGML/XML/DSSSL/HyTime
ESIS - ISO 8879 Element Structure Information Set
ISO 8879: Character Sets and Multilingual Text, including Extended Reference Concrete Syntaxes (ERCS)
ISO 8879: Conformance
ISO 8879: DTDS (Repositories of document type declarations/definitions conforming to ISO 8879)
SGML/XML Entity Sets
SGML Entity Types, and Entity Management
Catalogs, Formal Public Identifiers, Formal System Identifiers
ISO 8879: Revisions/Amendments
Technical Resolutions and Proposals
SGML (and HTML) Stylesheets
SGML and HTML
SGML/XML and TeX
SGML/XML and Math

SGML: Special Topics

Grammar

Several prominent grammar issues are referenced here, but the best access to expert commentary on arbitrary topics or clauses in the Standard (on the Internet) is through searching the archives of the Usenet newsgroup comp.text.sgml. The technical commentary of Erik Naggum on ISO 8879:1986 in the archives of comp.text.sgml is especially valuable, in terms of both quality and sheer mass.

SGML Declaration

[CR: 19990622]

The SGML declaration establishes the "lexical" basis for an SGML document, including the character sets, markup delimiters, features, and other options. It governs both the DTD (document type definition) and the document instance. A few pointers to resources follow.

A copy of the full SGML declaration is available from the SGML Repository, contributed by Erik Naggum. It supplies the default values assumed by an SGML parser, thus clarifying what features and parameters are governed by the SGML declaration. [local archive copy]
"A Template of the SGML Declaration", from OmniMark Technologies; see immediately below; [mirror copy, text only]
An extremely useful (and free!) booklet explaining the SGML declaration is available from OmniMark [formerly Exoterica] Corporation. Title: Understanding the SGML Declaration. The document is available online in HTML format from the OmniMark WWW server: Understanding The SGML Declaration. It has several useful annexes. See also the main bibliography for details. To obtain a copy, send an email request with postal address to info@omnimark.com.info.
Wayne L. Wohler contributed a three-part serialized article on the SGML declaration in <TAG>'s occasional tutorial series. The tutorial explains the purpose and use of each part of the SGML declaration. The full text of this tutorial is available online. See also the bibliography entries for Part 1, Part 2, and Part 3.
"Document Character Sets by Example," by Tony Graham, Consultant, Mulberry Technologies, Inc. From the SGML '96 presentation. Available online in HTML format. The bibliographic entry for the presentation supplies abstract and other details.
"Default SGML declaration, and handling of the SGML declaration in [James Clark's] SP" version 1.1.2 (February 1997), including Extended Naming Rules as specified in Annex J of ISO 8879:1986 (added by the 1996 technical corrigendum).
[*See also preceding item] NSGMLS 1.1.1 SGML Declaration (implemented by James Clark); see the SP database entry
SGML decl for Japanese, from SP 1.1.2; viz., an SGML declaration with "a character set declaration is suitable for use with the ujis or sjis coding systems"
The document "Comparison of SGML and XML" by James Clark provides an SGML Declaration for XML. This SGML Declaration for XML is in two variants: (a) one which "takes advantage of the Extended Naming Rules Technical Corrigendum to ISO 8879, but does not make use of the Web SGML Adaptations Annex"; (b) one which "takes advantage of the Web SGML Adaptations Annex to ISO 8879".
[Now obsolete; see the preceding item.] SGML Declaration for XML [valid as of about April 2, 1997], from James Clark's SP 1.1.3
SGML Declaration for Basic SGML, posted by Paul Grosso (i.e., from Clause 15)
TEI SGML declaration, full ASCII character set
AECMA 1000D Change 7 Document Type Definition - SGML Declaration [local archive copy]
TEI SGML declaration for local processing with SGMLS, EBCDIC CHARSET Code Page 1047
Example SGML Declaration, for EBCDIC (Wayne Wohler, HTML), or: text version
SGML Declaration using ERCS, from Rick Jelliffe (ricko@allette.com.au); see now http://www.allette.com.au/sgml/ercs/simple.html
Creating a DTD and an SGML Declaration [IBM Publishing Systems. SGML Translator: Creating a DTD and an SGML Declaration. Document Number SC34-5075-01, Release 3] A tutorial resource for creating DTDs and SGML declarations.

Productions: the grammar productions from ISO 8879

[CR: 19970924]

A list of productions is available in a text file from the SGML Repository, created by Erik Naggum. [mirror copy]
SGML Syntax Summary Index Version 1.5.1 (96/11/15), by Harvey Bingham. See the description in the announcement for the revision of May 18, 1996. This enhanced SGML Syntax Summary is an immensely useful tool providing indexed and linked access to SGML grammar productions. Separate listings are given for: (1) SGML Syntactic Variables; (2) SGML Keyword Syntactic Literals; (3) SGML Terminal Variables; (4) SGML Terminal Constants; (5) SGML Reference Delimiter Roles. The document will assist in the "study [of] the syntax of ISO 8879-1986 Standard Generalized Markup Language, aided by hypertext links for the syntax productions, their names, objects in their definitions, where used and where defined, and cross-references to containing clause and page:line pairs in 'The SGML Handbook', by Charles Goldfarb." [See also DSSSL Syntax Summary, May 1996 or later]
Mirror copy of the SGML Syntax Summary on the SGML/XML Web Page, Version 1.5, November 1996 [authorized]
Lennart Staflin provided a database of SGML syntax productions [Or: INDEX: http://gopher.lysator.liu.se:70/.html/sgmlsyntax/index.html]. "The SGML syntax description provided by TEI contains syntax productions that have been mechanically translated to HTML." The entry points are SGML Document; Search the syntax productions; Element; Start-tag; Document type declaration; Element declaration; Index. Author email address: lenst@lysator.liu.se.
"Productions from ISO 8879:1986 - SGML", from Christopher R. Maden (O'Reilly). "An all-out hyperlinked version of the formal productions from ISO 8879 (SGML)." Based upon the text of the Standard and the (1996) TC for Extended Naming Rules for SGML. (See also Maden's Formal Syntax from HyTime, hyperlinked version.)
Hypertext (linked) version of the productions, from W3; [mirror copy]
ISO/IEC 8879:1988 productions (clauses 6-15), sorted by production term. "For each term, a reference to the ISO production itself, the production number, the corresponding unit name, and the unit description is added. The productions, in turn, give access to the definitions of terms as listed in clause 4 of the standard."
Grammar Productions EBNF from Bob Agnew (November 1995)
See also the grammar materials of Michael Sperberg-McQueen below.

Definitions from clause 4 of ISO 8879

[CR: 19981016]

For the benefit of those who temporarily have been separated from their personal annotated copy of the Standard, and need precise definitions: list ISO 8879 Definitions, from Clause 4 - hypertext overview. "All definitions listed as found in clause 4. . . all defined terms used in the definition are listed below the definition text." [See the parent node for related links.] Based upon the work of Arjan Loeffen.

A similar online SGML Glossary is also available from Pindar, based upon the work of Neil Bradley in his book The Concise SGML Companion. This resource is also available on an alternate Web site; see SGML Dictionary.

Michael Sperberg-McQueen's Grammar Tools

[CR: 19970529]

See the ftp directory ftp://ftp-tei.uic.edu/pub/tei/sgml/grammar/ and subdirectories: bison, carthage, dpp, iso8879.

Bison tools: "The subdirectory pub/tei/sgml/grammar/bison contains files with Bison grammars and Flex scanners for SGML document type definitions (p2bnf.y and p2bnflex.l), SGML document instances (p2doc.y and p2doclex.l), and SGML declarations (p2decl.y and p2declex.l). The grammars have been parsed with Bison to ensure that the grammars are clean, but the semantic actions do no useful work." [from the README file]
ISO 8879 grammar: The subdirectory pub/tei/sgml/grammar/iso8879 contains transcriptions of the full formal grammar of ISO 8879 (the SGML standard), either in numeric sequence of the productions in the standard (sgmlfull.syntax) or arranged in groups, so closely related productions are more readily found together, with dependency trees showing which productions depend on which others. [from the README file]
Carthage: "Carthage is a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it." Some options include: (1) dropping or keeping marked sections; (2) warning if entities are declared twice; (3) dropping or keeping parameter entity declarations; (4) deleting named GIs from content models; (5) listing of specified classes of elements in the DTD [used, unused, default undeclared, declared]; (6) dropping or keeping comments in the output file, etc. [extracts from the README file, dated June 17, 1996]
dpp (DTD pre-processor parser): "DPP is a parser for SGML document type declarations, intended for use as a front end for filters which modify DTDs (e.g. filters to expand all or some parameter entity references, or to rename elements, etc.). Since DPP uses the same output format as sgmls. . .many existing tools for writing filters for SGML document instance . . . can be used with DPP to make filters for DTDs" [from the doc]. See: FTP directory: ftp://ftp-tei.uic.edu/pub/tei/sgml/grammar/dpp, or: the tar-gzipped package.
FTP to the relevant directory at UIC: ftp://ftp-tei.uic.edu/pub/tei/sgml/grammar/
See the README file (mirror copy, December 1995)
The SGML grammar descriptions are also available via mail from the UICVM LISTSERVer and on other FTP sites (filenames: memo.syntax, etc.).

SGML/XML DTD/Grammar Transduction and Generation

Some papers on grammar generation and supporting software tools are referenced in a separate document, "SGML/XML DTD Transduction and Generation."

Content Model Simplification, Rewrite Rules, Reducing Grammars

SGML/XML Notion of Ambiguity (non-deterministic content models)

[CR: 20010619]

Several technical papers on ambiguity by Anne Brüggemann-Klein [and Derick Wood] are listed in the SGML bibliography. See for example: the dissertation of Anne Brüggemann-Klein, Formal Models in Document Processing (1993) and "One-Unambiguous Regular Languages" (1998).
A (somewhat) dated short list of published references
See Arjan Loeffen's CTS contribution above
Discussion on XML-DEV from June, 2001. Compares SGML/XML notion of deterministic model vs. that of RELAX NG.
Discussion from August 1997. Joe English: ". . . The real show-stopper though is probably start-tag omission. Without the [ISO 8879's particular] ambiguity restriction, the formal definition of 'contextually required element' and related terms are utterly meaningless. Specifically, the parts about 'satisfied tokens': in terms of general regular expressions it doesn't make any sense to talk about which subexpression an input token matches, only whether the input sequence as a whole matches the regular expression as a whole. . ."
[January 14, 2001] XML and non-deterministic ['ambiguous'] content models. Jim Shain asked a question about (non-)determinism in XML content models; informative responses from Richard Tobin, James Clark, Deborah Aleyne Lapeyre, and Marcus Carr were provided.
[January 14, 2001] XML parser response to non-deterministic content models. TAKAHASHI Hideo said "I understand that the XML 1.0 spec prohibits non-deterministic (or, ambiguous) content models (for compatibility, to be precise)" and asked whether this is so. Joe English and James Clark answer "no," and explain.
Note by Anne Brüggemann-Klein
Note by Erik Naggum
"SGML & XML Content Models." By Pekka Kilpeläinen. Report C-1998-12, Department of Computer Science, University of Helsinki, May 1998. 16 pages. URL: . "The SGML and XML standards use a variation of regular expressions called content models for modeling the markup structures of document elements. SGML content models may include so called and groups, which are excluded from XML. An and group, which is a sequence of subexpressions separated by an &-operator, denotes the sequential catenation of its subexpressions in any possible order. If one wants to shift from SGML to XML in document production, one has to translate SGML content models to corresponding XML content models. The allowed content models in both SGML and XML are restricted by a requirement of determinism, which means that a parser recognizing document element contents has to be able to decide without lookahead, which content model token to match with the current input token, while processing the document from left to right. It is known that not all SGML content models can be expressed as an equivalent XML content model. It is also known that transforming an SGML content model into an equivalent XML content model may cause an exponential growth in the length of the content model. We discuss methods of eliminating and groups and analyze the circumstances where they can be applied. We derive a tight bound of e n! on the number of symbols in the result of eliminating an and group of n symbols, where e = 2.71828... is the base of natural logarithms. We present the analysis in a pedagogical manner, emphasizing mathematical methods which are typical to the analysis of algorithms. We also show that minimal deterministic automata for recognizing an and group of n distinct element names contain 2^n states and n·2^(n-1) transitions, excluding the failure state and transitions leading to it..." [cache]
[January 14, 2001] "How to validate XML." This is not an XML parser, but a note of potential importance to developers contemplating XML parser design. From Joe English. "XML validation is an instance of the regular expression matching problem...The most commonly-used technique to solve this problem is based on finite automata. There is another algorithm, based on derivatives of regular expressions, which deserves to be more widely known..."
Content Model Algebra. By Sam Wilmott (OmniMark Technologies). "Anyone desiring to have a full understanding of SGML content models and the theory behind the construction of text markup languages should have some familiarity with the basic concepts of Automata Theory and Set Theory on which content models are based. Automata Theory is a branch of Computer Science to which some, but not all, Computer Science students are exposed. Set Theory is a branch of Mathematics to which grade-school students were exposed, at least in the days of "New Math". This report provides an outline of some of the concepts of Automata Theory and Set Theory relevant to creating text markup languages. A background in formal Computer Science theory and Mathematics is required for an easy comprehension of the material in the paper. Other readers may find it interesting that there is a strong theoretical basis for what they are doing when they write content models..." URLs: try perhaps http://home.chello.no/~mgrsby/sgmlintr/sgmlcont.htm, or local crippled version.
Use indexed archives for comp.text.sgml wherever you can find them: for example, Arjan Loeffen's indexed shadow archive. Many CTS discussions have been held on parsing in light of SGML's definition of "ambiguity"
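
The figures quoted in the Kilpeläinen abstract above can be checked for small n with a few lines of Python. This is only a sketch under stated assumptions: the expansion scheme (rewrite a & b & c as a choice of n branches, each starting with one name followed by the expansion of the rest) is one plausible reading of "eliminating an and group", and the function names are made up for illustration; neither is taken from the report.

```python
from itertools import permutations
from math import e, factorial
from typing import Tuple


def expanded_symbol_count(n: int) -> int:
    """Symbols in a deterministic expansion of an and group of n names.

    Assumed scheme: (a & b & c) becomes
    (a, <b & c expanded>) | (b, <a & c expanded>) | (c, <a & b expanded>),
    so each of the n branches costs one symbol plus a smaller expansion.
    """
    if n == 0:
        return 0
    return n * (1 + expanded_symbol_count(n - 1))


def and_group_dfa_size(n: int) -> Tuple[int, int]:
    """Count states and transitions of the subset automaton by brute force.

    A state is the set of names already seen; reading an unseen name moves
    to the larger set. The failure state is left out, as in the abstract.
    """
    states = {frozenset()}
    transitions = set()
    for order in permutations(range(n)):
        seen = frozenset()
        for name in order:
            nxt = seen | {name}
            transitions.add((seen, name, nxt))
            states.add(nxt)
            seen = nxt
    return len(states), len(transitions)


if __name__ == "__main__":
    for n in range(1, 7):
        symbols = expanded_symbol_count(n)
        states, transitions = and_group_dfa_size(n)
        # The symbol count stays below e * n! and approaches it as n grows.
        assert symbols < e * factorial(n)
        assert (states, transitions) == (2 ** n, n * 2 ** (n - 1))
        print(n, symbols, round(e * factorial(n), 1), states, transitions)
```

Under this scheme the symbol count equals n! times the partial sum 1/0! + 1/1! + ... + 1/(n-1)!, which is why e n! appears as the bound.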

RS/RE Processing

[CR: 19961118]

Summary from Joe English (and Michael Sperberg-McQueen)
Post from Dave Peterson on why/when an "ignored RE [is] to be counted against a PCDATA token in the content model"
WG8 N1875 - U.S. Contribution on SGML Review (11 November 1996), including RE processing [mirror copy]

Use (and Non-use) of Exceptions in DTDs

[CR: 20020309]

Joe English (August 18, 1997) - on why "it's not possible to create an exclusion-free DTD exactly equivalent" [to one having inclusion exceptions]
[March 09, 2002] "Complexity of Context-Free Grammars with Exceptions and the Inadequacy of Grammars as Models for XML and SGML." By Romeo Rizzi (Facoltà di Scienze, Dipartimento di Informatica e Telecomunicazioni, Università degli Studi di Trento). In Markup Languages: Theory & Practice 3/1 (Winter 2001), pages 107-116 (with 19 references). "The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow authors to better transmit the semantics in their documents by explicitly specifying the relevant structures in a document or class of documents by means of document type definitions (DTDs). Several authors have proposed to regard DTDs as extended context-free grammars expressed in a notation similar to extended Backus-Naur form. In addition, the SGML standard allows the semantics of content models (the right-hand side of productions) to be modified by exceptions. Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. Since XML does not allow exceptions, the problem of exception removal has received much interest recently. Motivated by this, Kilpeläinen and Wood have proved that exceptions do not increase the expressive power of extended context-free grammars and that for each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. Since their argument was based on an exponential simulation, they also conjectured that an exponential blow-up in the size of the grammar is a necessary devil when purging exceptions away. We prove their conjecture under the most realistic assumption that NP-complete problems do not admit non-uniform polynomial-time algorithms. Kilpeläinen and Wood also asked whether the parsing problem for extended context-free grammars with exceptions admits efficient algorithmic solution. 
We show the NP-completeness of the very basic problem: given a string w and a context-free grammar G (not even extended) with exclusion exceptions (no inclusion exceptions needed), decide whether w belongs to the language generated by G. Our results and arguments point up the limitations of using extended context-free grammars as a model of SGML, especially when one is interested in understanding issues related to exceptions." A related paper was published as IRST Technical Report 0101-05, Istituto Trentino di Cultura, January 2001 (December 2000: Centro per La Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura). See the original Postscript and the online abstract. [cache]
[October 31, 2001] "SGML and XML Document Grammars and Exceptions." By Pekka Kilpeläinen and Derick Wood. In Information and Computation Volume 169, Number 2 (September 2001), pages 230-251 (with 19 references). "The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow users to define document-type definitions (DTDs), which are essentially extended context-free grammars expressed in a notation that is similar to extended Backus-Naur form. The right-hand side of a production, called a content model, is both an extended and a restricted regular expression. The semantics of content models for SGML DTDs can be modified by exceptions (XML does not allow exceptions). Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. We give precise definitions of the semantics of exceptions, and prove that they do not increase the expressive power of SGML DTDs when we restrict DTDs according to accepted SGML practice. We prove the following results: (1) Exceptions do not increase the expressive power of extended context-free grammars. (2) For each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. (3) For each DTD with exceptions, we can construct a structurally equivalent DTD when we restrict the DTD to adhere to accepted SGML practice. (4) Exceptions are a powerful shorthand notation - eliminating them may cause exponential growth in the size of an extended context-free grammar or of a DTD." See the Technical Report for a related version.
[January 23, 1998] "SGML Exceptions and XML," by Eve Maler (ArborText) - "briefly describes SGML exceptions (inclusions and exclusions) and discusses how 'exception users' can handle their DTDs and data in XML, which does not allow exceptions"; [local archive copy]
"A New Generation of Tools for SGML." By Richard W Matzen. In Markup Languages: Theory & Practice 1/1 (Winter 1999) 47-74 (with 21 references). "Exceptions are used in many standard DTDs, including HTML, because they add expressive power for DTD authors. However, there is a tradeoff: although they are useful, exceptions add significantly to the complexity of DTDs. Authoring DTDs is a difficult task, and existing tools are of limited use because of the lack of a suitable formal model for exceptions. This paper describes methods for constructing a static model that completely and precisely describes DTDs with exceptions. . ."
"A New Tool for SGML with Applications for the World Wide Web." By Matzen and Hedrick. Paper presented at SAC '98 - 1998 ACM Symposium on Applied Computing. February 27 - March 1, 1998, Marriott Marquis, Atlanta, Georgia, U. S. A. See the associated SGML exceptions analysis (results): The results shown in [this results set] are from running the prototype software tool described in the above paper. And see the authors' paper, "Unraveling Exceptions," Conference Proceedings: SGML/XML 97, Washington D.C., December, 1997.
Eliot Kimber (April 20, 1996)
Michael Sperberg-McQueen (April 15, 1996)
Collection of postings from CTS, December 1994
Erik Naggum (August 1994)
Erik Naggum (1994)
Erik Naggum (August 1994)
Marcy Thompson (August 1994)
Len Bullard (August 1994)
Other links from Arjan Loeffen's indexed CTS database (or: see the complete subject listing)
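
To make the effect of exceptions described in the abstracts above concrete, here is a minimal Python sketch of how a validator might thread inclusions and exclusions down an element tree. Everything in it is an assumption made for illustration: the `Decl` record, the encoding of content models as regular expressions over space-terminated element names, the toy DTD, and the simplifications that an included name never also occurs in a content model and that an exclusion always overrides an inclusion. It does not reproduce the precise rules of ISO 8879 or the formal models in the papers.

```python
import re
from typing import Dict, FrozenSet, NamedTuple, Tuple


class Decl(NamedTuple):
    """Hypothetical element declaration: model plus exceptions."""
    model: str                              # regex over "name " tokens
    inclusions: FrozenSet[str] = frozenset()
    exclusions: FrozenSet[str] = frozenset()


Node = Tuple[str, list]                     # (element name, child nodes)


def valid(node: Node, dtd: Dict[str, Decl],
          included: FrozenSet[str] = frozenset(),
          excluded: FrozenSet[str] = frozenset()) -> bool:
    name, children = node
    decl = dtd[name]
    # Exceptions declared on an element apply to it and to all descendants.
    excluded = excluded | decl.exclusions
    included = (included | decl.inclusions) - excluded
    child_names = [child[0] for child in children]
    # An excluded element may not appear at all.
    if any(child in excluded for child in child_names):
        return False
    # Included elements may appear anywhere, so they are set aside before
    # the remaining children are matched against the content model.
    remaining = [child for child in child_names if child not in included]
    if re.fullmatch(decl.model, "".join(c + " " for c in remaining)) is None:
        return False
    return all(valid(child, dtd, included, excluded) for child in children)


# Toy DTD: footnotes may float anywhere in a doc, but not inside a footnote.
DTD = {
    "doc": Decl(r"(para )+", inclusions=frozenset({"footnote"})),
    "para": Decl(r"(em )*"),
    "footnote": Decl(r"(para )+", exclusions=frozenset({"footnote"})),
    "em": Decl(r""),
}

GOOD = ("doc", [("para", [("em", []), ("footnote", [("para", [])])]),
                ("para", [])])
NESTED = ("doc", [("para", [("footnote",
                             [("para", [("footnote", [("para", [])])])])])])

if __name__ == "__main__":
    print(valid(GOOD, DTD), valid(NESTED, DTD))
```

Removing the exceptions from even this toy DTD would mean writing separate "para inside footnote" and "para outside footnote" declarations, which hints at where the size growth discussed in the papers comes from.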

CDATA [and RCDATA] as Declared Content

[CR: 19970905]

Article: "CDATA Confusion", by Joe English (Fri Oct 6 19:13:06 PDT 1995). The author endeavors to clear up confusion that results from the fact that "The keyword CDATA has (by [his] count) at least five different meanings in SGML." [mirror copy]
"CDATA in attributes and content." - Another explanation of CDATA as declared content, from Joe English [September 1997]
Collection of CTS postings on CDATA and RCDATA. Erik Naggum concluded: ". . .CDATA and RCDATA are dangerous and best avoided." See especially Erik's second posting in this collection.
Joe English on CDATA as declared content: "CDATA declared content: Pure Evil." "CDATA declared content is in general a bad idea . . . Of all of SGML's broken features, CDATA declared content is among the worst." A mirror copy.
RCDATA limitations/hazards illustrated in an example

Elements versus attributes - How do I decide?

[CR: 19980408]

I have prepared a short document which bears the title "SGML/XML: Elements versus Attributes. When Should I Use Elements, and When Should I Use Attributes?".

Duplicate tokens in an attribute definition list (FAQ)

[CR: 19970925]

A short document explaining why name tokens cannot be duplicated in an attribute definition list, even within different groups. Updated September 25, 1997. Thus, it explains why the following is not allowed:

<!ATTLIST candidate constantlyChangesPosition (YES | NO) YES
                    liesWithoutFlinching      (YES | NO) YES >

It also explains the rationale for the particular design (limitation) in SGML, offers some work-arounds, and speculates on whether/how the infelicity might be addressed in a revision to SGML. As of August 1997 (WG8 N1929), it appears that the restriction will be eliminated, in part. Your contributions to this compilation are welcome.
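
A DTD author who wants to spot this problem before running a parser could use a rough check like the following Python sketch. It is an illustration only: the function name is hypothetical, the regular expressions cover just the simple case shown above (no comments, parameter entities, or NOTATION groups), and folding tokens to upper case assumes the usual name case substitution.

```python
import re
from collections import defaultdict
from typing import Dict, List

ATTLIST = """<!ATTLIST candidate constantlyChangesPosition (YES | NO) YES
                    liesWithoutFlinching      (YES | NO) YES >"""


def duplicate_tokens(attlist: str) -> Dict[str, List[str]]:
    """Map each name token used in more than one place to its attributes."""
    # Strip the "<!ATTLIST element" prefix and the closing ">".
    body = re.sub(r"^<!ATTLIST\s+\S+\s*|>\s*$", "", attlist.strip())
    seen = defaultdict(list)
    # Each match is an attribute name followed by its name token group.
    for attr, group in re.findall(r"(\S+)\s*\(([^)]*)\)", body):
        for token in re.split(r"[|,&\s]+", group.strip()):
            if token:
                seen[token.upper()].append(attr)
    return {token: attrs for token, attrs in seen.items() if len(attrs) > 1}


if __name__ == "__main__":
    for token, attrs in duplicate_tokens(ATTLIST).items():
        print(token, "appears in:", ", ".join(attrs))
```

For the declaration above, both YES and NO are reported as appearing under two attributes.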

"Are entity references recognized and replaced in attribute values?"

[CR: 19990617]

Entity References in Attribute Values, by C. M. Sperberg-McQueen; [local mirror copy]
The question asked and answered on CTS [December 1996, June 1992]
The question asked and answered again on CTS [December 1996]
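
For XML at least, the behavior is easy to observe with a stock parser. The sketch below uses Python's standard xml.etree.ElementTree; the document, entity name, and attribute name are invented for the example, and it says nothing about SGML-specific cases (such as attributes with declared values other than CDATA), which the items above discuss.

```python
import xml.etree.ElementTree as ET

# A made-up document with one internal general entity, referenced both in
# an attribute value and in element content.
DOC = """<!DOCTYPE memo [
  <!ENTITY co "Example Corp">
]>
<memo sender="&co; legal department">From &co;.</memo>"""

root = ET.fromstring(DOC)

# The parser replaces the reference before the application sees the value.
print(root.get("sender"))
print(root.text)
```

In both places the application receives the replacement text rather than the reference.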

SGML/XML and Context Free Grammar [SGML, CFG, BNF]

[CR: 19970523]

SGML and XML grammars as context-free or not: Joe English explains that/why SGML isn't, quite -- despite the enduring complaint from computer scientists that "it should have been. . . could have been"
Others who have written in detail about SGML grammars: Derick Wood, Anne Brüggemann-Klein
SGML Grammar Bibliography for Derick Wood, Anne Brüggemann-Klein, Frank Tompa, Darrell Raymond, Pekka Kilpeläinen; [mirror copy]

SGML/XML and Forest/Hedge Automata Theory

[CR: 20000224]

A separate document contains a collection of references on Tree-Regular Languages, Forest-Regular Languages, DTD/Document Transformation, Forest/Tree Automata Theory, Schema Languages, and related matters. Most of the publications below are written by Murata Makoto and Paul Prescod; the corresponding bibliographic entries contain document abstracts. As of early 1999, 'forest automata' are being referred to as 'hedge automata' in the context of SGML/XML schemas.

SGML/XML and (Non-) Hierarchy

[CR: 20020805]

See the separate document "Markup Languages and (Non-) Hierarchies."

SGML SUBDOC Feature

[CR: 19981003]

Some SGML experts don't think the SUBDOC feature of SGML is worth much, at least as defined in ISO 8879:1986. See the CTS archives for discussion. Eliot Kimber, on the other hand, is a champion of SUBDOC. See the links below.

SUBDOC and Architectures. Kimber, April 1997. The post answers two common questions, to start with: (1) "When the parser encounters a SUBDOC entity reference, it doesn't parse the entity. Why not?"; (2) "As each subdocument can have its own DTD, won't the authors be able to get around our DTD by using subdocuments?"
"Re-Usable SGML: A Plea for SUBDOC", by Eliot Kimber (poster session first presented at SGML '95); [mirror copy, text and partial links only]
[October 03, 1998] "The Challenge of Implementing SUBDOC: With Some HyTime Support." Presented by Erlend Øverby, University of Oslo (Norway) at Markup Technologies '98. "At the University of Oslo, we have been working with SGML since 1992. Currently, we have over 120 authors maintaining approximately 1000 SGML files. We have found the SUBDOC feature a useful way to manage SGML materials that must be edited as freestanding units, but that are combined, for publication, with other materials. SUBDOC simplifies our use of marked sections for conditional text, and eliminates many ID/IDREF name conflicts. Simple HyTime linking helps us manage cross-references between subdocuments effectively."
[June 18, 1997 update] Note on several recent developments in industry and standards which make the use of SUBDOC more likely. See:
[June 24, 1997] "Cool Tool: Value Reference (Subdoc) Resolution for DSSSL and JADE," by Eliot Kimber. "I wrote a DSSSL spec, using the JADE SGML back end, that produces a new instance from a compound document. This new instance can then be formatted normally. This spec is available from the ISOGEN Web site at "http://www.isogen.com/demos/dovalueref.html" ("value reference" being the new HyTime facility that reflects the semantics of SUBDOC reference, among other things)." See the context for discussion of this 'cool tool'.
Eliot Kimber on SUBDOC

Other Grammar/Parsing Issues, and FEATURES

[CR: 19981216]

Nothing here (or anywhere) should be taken as gospel: consult your local SGML expert for the latest word. [**Need to create other subsections for topics; need a dedicated FAQ**]

On CDATA and RCDATA as declared content: see above
DTDs for DTDs - Including DTD documentation, DTD semantics. Some hints, pointers, prior work.
Towards a Formal Characterisation of SGML Recognition Modes (Rick Jelliffe, Allette Systems) [HTML document that explains SGML recognition modes using state machine diagrams in an attempt to unravel and complement ISO 8879 s9.6 'Recognition Modes']
Valuable collection of three posters delivered by Erik Naggum at SGML '94 at Tyson's Corner, VA: "How to Make Your SGML Documents Survive You" [poster-1.txt], (mirror copy); "Some Problems in Parsing SGML with Standard Software" [poster-2.txt], (mirror copy); "Implementing the Long-Term Information Investment Protector" [poster-3.txt], (mirror copy)
Notes on SGML and HTML (Joe English)
Using LINK profitably (Steve Pepper)
"Why I Want the SGML LINK Feature", by W. Eliot Kimber; [mirror copy, partial links only]
SGML: The Capacity Set (which some think is functionally obsolete); [mirror copy, from let.rug.nl]

Architectural Forms and SGML/XML Architectures

[CR: 20010505] [Table of Contents]

Information on architectural forms and AF processing is provided in a separate document.

Groves, Grove Plans, and Property Sets in SGML/XML/DSSSL/HyTime

[CR: 20000419] [Table of Contents]

GROVE - "Graph Representation of Property Values." Description and references are provided in a separate document.

ESIS - ISO 8879 Element Structure Information Set

[CR: 19970622]

"The set of information that is acted upon by implementations of structure-controlled applications is called the 'element structure information set' (ESIS). ESIS is implicit in ISO 8879, but is not defined there explicitly." [from WG8 N931 Attachment, as documented below]

Links:

See The SGML Handbook, by Charles F. Goldfarb, pages 588-593 ('ESIS' is not in the volume index). The description of ESIS presented there is within Appendix B, "ISO/IEC JTC1/SC18/WG8 N1035: Recommendations for a Possible Revision of ISO 8879." The ESIS description is supplied as an attachment ("Attachment 1: The ISO 8879 Element Structure Information Set (ESIS).")
ESIS description from WG8 N931, attachment
Pointers to ESIS description, from Jacques Deseyne
Other postings on ESIS [how to learn about it], by Ingo Macherius, T. Kurt Bond, and Jacco van Ossenbruggen
ESIS described in an attachment to N1035; [its mirror copy]
ESIS - Standard Generalized Markup Language (SGML) Property Set (relevant modules), from Peter Newcomb: http://www.techno.com/~peter/sgml-esis/
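
In practice, many people first meet ESIS as the line-oriented text stream emitted by parsers in the sgmls family. The Python sketch below reads a small stream of that general shape into events. It is a simplification built on assumptions: only the command characters A (attribute), ( (start), ) (end), - (data), and C (conforming) are handled, only the \n and \\ escapes are decoded, and the sample input and function name are invented for the example. It is not a description of ESIS as defined in the documents listed above.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple


def _unescape(data: str) -> str:
    """Decode the two escapes this sketch knows about: \\n and \\\\."""
    return re.sub(r"\\(\\|n)",
                  lambda m: "\n" if m.group(1) == "n" else "\\", data)


def esis_events(lines: Iterable[str]) -> Iterator[Tuple]:
    """Turn sgmls-style output lines into (kind, ...) event tuples."""
    pending: Dict[str, Optional[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # Attribute lines precede the start of the element they belong to.
            name, _declared_type, *value = rest.split(" ", 2)
            pending[name] = value[0] if value else None
        elif code == "(":
            yield ("start", rest, pending)
            pending = {}
        elif code == ")":
            yield ("end", rest)
        elif code == "-":
            yield ("data", _unescape(rest))
        elif code == "C":
            yield ("conforming",)
        # Any other command character is ignored in this sketch.


# Made-up sample stream for illustration.
SAMPLE = """\
ATITLE CDATA Hello
(DOC
(P
-Some text\\nwith a record end
)P
)DOC
C
"""

if __name__ == "__main__":
    for event in esis_events(SAMPLE.splitlines()):
        print(event)
```

Because attributes arrive before the start of their element, the reader has to buffer them, which is why the sketch keeps a `pending` dictionary.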

ISO 8879: Character Sets and Multilingual Text, including Extended Reference Concrete Syntaxes (ERCS)

[CR: 20000425]

Below is a list of links to resources that may be relevant to multilingual text and character sets used in SGML documents.

Names of Languages - ISO 639
Names of Scripts - ISO 15924
"Unicode and XML."
Work on Extended Reference Concrete Syntaxes (ERCS), and ISO 10646 (Unicode) Public Character Entities. See documents on the SGML Open server, including an ERCS goals document. Description of this effort: "The major technical challenge for SGML at the current time is how to support the SGML documents of languages that require more than just ISO 646 (ASCII): East Asian and CJK (China/Japan/Korea) documents in particular. The proposed Extended Reference Concrete Syntaxes (ERCS) address the issues of native-language tagging and "highest-common-denominator" tagging for interchange between different character sets." [March 8, 1995] Comments/communiques to: ricko@allette.com.au.
[July 19, 1999] "Retrospective on ERCS: the Extended Reference Concrete Syntax [or 'ASCII: My Part in its Downfall']." By Rick Jelliffe (Computing Center, Academia Sinica, Taipei, Taiwan). Date: 1999-05-20. "In mid 1994 I was given a background project at Allette Systems, Sydney, where I worked as Senior SGML Consultant, to identify any technical reasons why the SGML market in East Asia and South East Asia was stagnant. This project soon consumed quite a lot of my time, involving trips to Japan and discussion with various contacts; learning a lot more about SGML and characters and many arcane tidbits about CJK (China/Japan/Korea) publishing. Out of this research evolved some proposals called the Extended Reference Concrete Syntax (ERCS), which received much useful feedback, especially from Japanese. . . This retrospective looks at the issues and solutions raised by ERCS and what has become of them after five years. ERCS is not now promoted as a syntax with its own identity: it has found adequate expression in XML, whose development started at the very time and meetings that the minor corrections to SGML required for ERCS (Annex J, the Extended Naming Rules) were adopted." See "SGML/XML Character Sets and Multilingual Text, including Extended Reference Concrete Syntaxes (ERCS)." [local archive copy]
MathML ISO entity sets (for XML); [cache]
ERCS Home Page alias "SGML Without Boundaries"
Extended Reference Concrete Syntax (ERCS) Proposal, Text Format, [mirror copy]
Internationalization of the Hypertext Markup Language, by F. Yergeau, G. Nicol, G. Adams, M. Dürst. RFC 2070. January, 1997; [mirror copy]. Also: in plain text format; [mirror copy]
WG8 TC for Extended Naming Rules for SGML, WG8 N1896Rev; [mirror copy]. Naming rules for non-Latin scripts. Affects especially production "[189] naming rules".
"What characters are safe in URLs, XML public identifiers, and plain-old-email?" By Steve DeRose.
[September 18, 1997] Superb article on Unicode, for XML/SGML developers: "Unicode and Internationalization Issues in Document Management: A Global Solution to Local Problems," by François Chahuneau, general manager of AIS/Berger-Levrault. The Gilbane Report Volume 5, Number 4 (July/August 1997) 1-25. See the bibliographic entry for other details.
[April 22, 1998] "10646 and All That. Unicode, ISO 10646, and the Quest for a Universal Character Set." By Tony Graham. Slides (in HTML) from a tutorial presentation on Unicode, given at the Washington SGML Users Group, Washington, D.C., April [15], 1998. Discusses (also) the use of Unicode with SGML, XML, DSSSL, and XSL. See other papers online from Mulberry Technologies Inc.
[October 24, 1997] Unicode 2.0 code charts
"ISO 8859-1 and 10646 Characters as SGML Entities." - From Matt Corks, mvcorks@uwaterloo.ca.
UNICODE Plane 14 Characters for Language Tags - Unicode Technical Report # 7. "The Plane 14 technical report addresses the recurrent and persistent call for a lighter-weight mechanism for text tagging than typical text markup mechanisms in Unicode. Language tags are of general interest and should have a high degree of interoperability for protocol usage. To this end, a specific LANGUAGE TAG tag identification character is provided. A Plane 14 tag string prefixed by U-000E0001 LANGUAGE TAG is specified to constitute a language tag. Furthermore, the tag values for the language tag are to be spelled out as specified in RFC 1766, making use only of registered tag values or of user-defined language tags starting with the characters 'x-'." [local archive copy]
RFC 1766 Language code assignments - By Michael Everson. "As language-tag reviewer for RFC 1766, I have made and intend to maintain the following table to help users access the codes and information on them. Clicking on the name of the code itself will open the registration document from the IANA website..."
"Language Tagging in Unicode Plain Text." K. Whistler and G. Adams (Spyglass). Request for Comments: 2482. January 1999. "This document proposed a mechanism for language tagging in UNICODE plain text. A set of special-use tag characters on Plane 14 of ISO10646 (accessible through UTF-8, UTF-16, and UCS-4 encoding forms) are proposed for encoding to enable the spelling out of ASCII-based string tags using characters which can be strictly separated from ordinary text content characters in ISO10646 (or UNICODE)." And see above. [local archive copy]
[October 28, 1998] ISO/IEC 10646-1:1993 charmap with mnemonic.ds symbolnames E.g., '<ampersand> /x00/x26 AMPERSAND'. [local archive copy]
[October 17, 1997] "Options for Presentation of Multilingual Text: Use of the Unicode Standard." By Janet C. Erickson. Prepared for John Price-Wilkin as a Digital Information Associate Research Project. March 14, 1997.
UNICODE Test (web) pages: http://www.reuters.com/unicode/iuc10/x-utf8.html, or http://www.reuters.com/unicode/iuc10/x-ncr.html
The Multilingual WWW [by Gavin Nicol, November 1994]; [mirror copy]
See the conference entry for the November 1996 "Web Internationalization & Multilinguism Symposium," and a symposium report by Martin Bryan; [mirror copy]
W3C Internationalization Page
Erik Naggum's ISO 10646 tables
WInter = Web Internationalization & Multilinguism
All the ISO 10646 (Unicode) characters as SGML SDATA entities, arranged into convenient subsets according to the script of the characters: WWW-viewable at http://www.allette.com.au/allette/sgml/ercs/allent.htm
Similarly: gzipped tar file at http://www.allette.com.au/allette/sgml/ercs/entities/spread.tar.gz, [mirror copy, December 1996].
See the TEI's implementation of a writing system declaration: Writing System Declarations, TEI Server at UIC
Unicode Version 1.0 mappings to SGML entities; [mirror copy]
See the TEI P3 chapter on character set issues: FTP from UIC or any mirror site under file name p3ch.doc (P3CH.DOC)
"Web: Multilinguism & Internationalization", by M. T. Carrasco Benitez; [mirror copy]
See Harry E. Gaylord's draft paper on "Character Representation": draft version of a TEI-related paper [June 24, 1994], to be published in revised format in CHUM. Available via FTP from ftp.let.rug.nl, or as a mirror copy on the SGML/XML Web Page. For further bibliographic detail, see the related bibliographic entry.
"Character Set Considered Harmful." 1995 INTERNET-DRAFT. By Dan Connolly.
"A tutorial on character code issues." By Jukka Korpela, Helsinki University of Technology (HUT), Computing Centre, Finland.
See Unicode FTP site at unicode.org/pub, or the WWW server via 'stonehand.com', or WWW access via 'unicode.org'. [But note: UNICODE as an internationalized character set is not a full solution for multilingual "language" encoding in electronic text.]
Martin Ramsch - iso8859-1 table
DocBook DTD Character Entities, also available in Postscript format; [mirror copy].
HTML/SGML: Special Characters Test
David J. Birnbaum's page on standardization
ISO/IEC PDTR 15285: "Information Technology - An Operational Model for Characters and Glyphs." Version: 09-January-1997. Available in .DOC (Word 6.0/7.0) and Postscript. See: FTP directory for character/glyph model document; [mirror copy, ZIP Postscript version].
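
The Plane 14 language-tag mechanism described in the Unicode Technical Report #7 and RFC 2482 entries above can be illustrated with a small sketch. It assumes only what those entries state (a tag string prefixed by U+E0001 LANGUAGE TAG, spelled out in tag characters) plus the fact that each tag character sits at offset 0xE0000 from its ASCII counterpart. The function names are illustrative and do not come from any of the cited documents.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG, the tag identification character
TAG_BASE = 0xE0000      # tag characters mirror printable ASCII at this offset


def encode_language_tag(tag: str) -> str:
    """Spell out an RFC 1766 style tag (e.g. 'fr-CA') in Plane 14 tag characters."""
    if not tag or any(not (0x20 <= ord(ch) <= 0x7E) for ch in tag):
        raise ValueError("tag must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(ch)) for ch in tag)


def decode_language_tag(text: str) -> str:
    """Read back the ASCII tag from text that starts with a Plane 14 language tag."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        raise ValueError("missing U+E0001 LANGUAGE TAG prefix")
    chars = []
    for ch in text[1:]:
        code_point = ord(ch)
        if not (0xE0020 <= code_point <= 0xE007E):
            break  # first ordinary character ends the tag string
        chars.append(chr(code_point - TAG_BASE))
    return "".join(chars)


if __name__ == "__main__":
    tagged = encode_language_tag("fr-CA") + "Bonjour"
    print([f"U+{ord(ch):05X}" for ch in tagged[:6]])
    print(decode_language_tag(tagged))
```

Because the tag characters occupy their own range, a process that does not understand them can strip or skip them without touching the ordinary text, which is the separation the RFC abstract emphasizes.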

Conformance: SGML/XML/HTML

[CR: 19971107]

See Charles Goldfarb's SGML Handbook for discussion of conformance -- an aspect of SGML that is highly controversial. Exoterica Corporation distributes a CD-ROM with many test suites; see The Compleat SGML in the main bibliography. See also the bibliographic entry for ISO 13673 Conformance Testing for SGML Systems [project editor Lynne Price]. The CTI-SEMA: SGML Test Materials from GCARI are available from several sites:

Charles F. Goldfarb's SGML Purity Test
Society for the Definitive Abolition of Tag Abuse
SGML Open XML Conformance Subcommittee (under SGML Open's Technical Committee) will be working on an XML Conformance Test Methodology, etc. Contact G. Ken Holman for additional information.
Bibliography entry for article by Sheila Lewis, "European Conformance Testing Service for SGML" [1989]
Via FTP from the SGML Repository
Via FTP from Exeter
Also: search the SGML bibliography database for titles/abstracts with "conformance". For example: Laplante, or Gennusa.

ISO 8879: DTDS (Repositories of document type declarations/definitions conforming to ISO 8879)

[CR: 19980508] [Table of Contents]

Most of the SGML/XML case studies presented in the major sections for applications (General, Academic, Industry) reference the relevant DTDs. Some of the more common "standard" DTDs are referenced in the documents cited below. "Standard" DTDs, the reader should understand, are often modified locally in order to support slightly different conceptual models, and to accommodate particular processing needs. "Standard" DTDs that do not change to meet the developing requirements of users tend to become fossils.

DTDs listed on John Lamp's 'Net' Page (see the section "DTDs")
Industry DTDs, listed at the SGML University
Links to DTDs from Advanced Publishing Technologies, Thomson Technology Services Group
SGML Implementors: DTDs, from Copernican Solutions
Sample DTDs from the SGML Repository
TEI DTDs from the TEI FTP archive at UIC
SEMA's Write-It DTDs via Exeter
<TAG> Article DTD, dated 3-Jul-1995; [local archive copy]
DTD collection - Spyglass Technical Reference (HTML DTDs)
Bob Agnew's DTD collection (the largest collection I have seen)
DTD collection on the Darmstadt FTP server

SGML/XML Entity Sets and Entity Management

[CR: 20030408] [Table of Contents]

ISO entity sets: explained by Rick Jelliffe
XML Character Entities. "An XML entity set corresponding to ISO 8879 SGML character entities." DocBook (Norm Walsh). Version 0.3 as of 13-June-2003.
"Community Contribution: Problems with ISO Entity Sets (ISO 8879 and 9573-13)." By David Carlisle. Reference: ISO/IEC JTC 1/SC34 N0387. "Real-world use of ISO entity sets has revealed issues that need to be addressed for the user community... This document is a personal note on problems encountered whilst working with ISO entity sets, and in particular in attempting to produce mappings of the ISO entities to Unicode to enable XML DTD declarations to be made. Much of this work has been undertaken as the Editor of the W3C MathML DTD, but this is a personal note, produced in response to an email request quoted at the end. MathML Character Descriptions contains further information including mappings to Unicode for all the ISO 8859 and 9573-13 entity sets, whether or not they are used in MathML. Docbook has a similar page describing its mappings..."
[March 29, 2005] Information technology -- SGML support facilities -- Techniques for using SGML. Part 13: Public Entity Sets for Mathematics and Sciences. Edited by Martin Bryan and David Carlisle. 2003-11-26. Project: PDTR 9573-13:2004 (type 3) 2nd Edition. Project Editor: Dr. David P. Carlisle. PDTR Ballot version, 2005-02-24. See also the reference page [ISO/IEC JTC 1/SC 34 N0599] and ballot [Ballot due 2005-05-24 PDTR 9573-13: Maths and scientific character sets]. "Tens of thousands of graphic characters are used in publishing text, a large proportion of which have been defined in ISO/IEC 10646. Even where standard coded representations exist, however, there may be situations in which they cannot be keyboarded conveniently or accurately, or in which it is not possible to display the desired visual depiction of the characters. To help overcome these barriers to the successful interchange of SGML and related documents, this part of ISO/IEC TR 9573 defines character entity sets for some widely used special graphic characters regularly used in the production of scientific and mathematical documents. [Note 1: Entity repertoires are necessarily larger and more repetitious than character sets, as they deal in general with higher-level constructs. For example, unique entities have been defined for each accented Latin alphabetic character, while a character set might represent such characters as combinations of letters and diacritical mark characters.] In many instances upper- and lower-case is used to differentiate the names of entities. It is assumed that any SGML concrete syntax used in conjunction with these entity names will be case sensitive. [Note 2: The reference concrete syntax defined in ISO/IEC 8879 (SGML) is case sensitive.] This edition of the standard has been aligned with the Unicode 3.2 updates to ISO/IEC 10646:2000, as covered by Amendment 1 to the standard. 
For the purposes of backwards compatibility the names assigned to the characters in the original edition of the standard are shown before those assigned to the character in ISO/IEC 10646. References to characters in this part should, however, refer to the ISO/IEC 10646 name rather than the name originally assigned by ISO/IEC TR 9573..." Source: PDF.
[February 05, 2002] "XML Character Entities Version 0.2." Edited by Norman Walsh for the OASIS DocBook Technical Committee. Working Draft 04-February-2002. This Standard defines XML encodings of the 19 standard character entity sets defined in Non-normative Annex D of ISO 8879:1986 (Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML), 1986). [Caveats: a working draft constructed by the editor; not yet an official committee work product, and may not reflect the consensus opinion of the committee; a few acknowledged discrepancies WRT the character mappings in this draft.] "This Standard defines XML encodings of the standard SGML character entity sets. Non-normative Annex D of [ISO 8879:1986] defines 19 standard SGML character entity sets: Added Latin 1, Added Latin 2, Greek Letters, Monotoniko Greek, Russian Cyrillic, Non-Russian Cyrillic, Numeric and Special Graphic, Diacritical Marks, Publishing, Box and Line Drawing, General Technical, Greek Symbols, Alternative Greek Symbols, Added Math Symbols: Ordinary, Added Math Symbols: Binary Operators, Added Math Symbols: Relations, Added Math Symbols: Negated Relations, Added Math Symbols: Arrow Relations, Added Math Symbols: Delimiters. The SGML declarations for these entities use the specific character data (SDATA) entity type that is not supported in XML, so alternative XML declarations are necessary. In XML, the specific character data of most entities can be expressed as a Unicode character." In addition to the character entity sets, the document defines 'XML Character Elements'. Design rationale: "Named XML entities (except for the five predefined entities) cannot be used if they are not declared. Entity declaration requires either an external or an internal subset. Some classes of applications forbid the occurrence of markup declarations in documents. 
For these documents, named character entities are inaccessible...we [therefore] introduce an XML vocabulary with the semantics of character entity reference. This Standard defines the semantics of elements and attributes declared in the http://www.oasis-open.org/docbook/xmlcharent/names namespace. This namespace contains exactly one element, char. The char element has two attributes, entity and name. They are mutually exclusive. The entity attribute identifies characters by their character entity names. (The set of valid names is the closed set of names associated with character entity sets defined by this Standard.) Case is significant in entity names. The name attribute identifies characters by their Unicode character names..." See the version of March 19, 2002 [and later].
Nineteen ISO 8879:1986 character entity sets: Added Math Symbols: Arrow Relations, Added Math Symbols: Binary Operators, Added Math Symbols: Delimiters, Added Math Symbols: Negated Relations, Added Math Symbols: Ordinary, Added Math Symbols: Relations, Box and Line Drawing, Russian Cyrillic, Non-Russian Cyrillic, Diacritical Marks, Greek Letters, Monotoniko Greek, Greek Symbols, Alternative Greek Symbols, Added Latin 1, Added Latin 2, Numeric and Special Graphic, Publishing, General Technical. Available as separate disk files in a ZIP archive, or as a single concatenated text file. Source: the SGML Repository.
ISO 8879 entity sets in 20 disk files, from the WG8 FTP server, here available concatenated, in text format or in UNIX tar/gzip format. Disk files dated to June 28, 1995.
[August 17, 2001] On the production of a set of XML character entities by the OASIS DocBook Technical Committee: Norm Walsh proposed a charter modification to read "... The DocBook TC will develop, and publish as a separate specification, a set of XML character entities, based on previously published ISO 9573 entity sets (themselves an extension of the ISO 8879 entity sets) for use in the XML version of DocBook and other related XML specifications..." Or possibly: (proposed by Eduardo Gutentag) "The DocBook TC will also develop specifications based on previously published standards for use in the XML version of DocBook and other related XML specifications; the first such specification will be based on previously published ISO 9573 entity sets (themselves an extension of the ISO 8879 entity sets)."
[September 20, 1997] "ISO Character Entity Sets", collected and organized by Murray Altheim (Sun Microsystems). Part of an investigation to see "how the i18n draft, Unicode, and the current ISO entity sets may work together in SGML, HTML and XML." Also part of "SunSoft - SHML 1.0 Development Version, code-named "Mehitabel" Document Type Definition for the HyperText Markup Language." In the listings, ".ent files are CDATA numeric character references; .gml are SDATA 'square bracketed' strings; .pen are XML-compatible Unicode numeric character references."
TEI Standard Writing System Declarations & Entities. IPA, Arabic, Coptic, Classical Greek, ISO 646 (non-national Subset), ISO 646 (International Reference Version), ISO 8859-1 (-2, -5, -7, -8, -9), ISO 8879:1986 (partial). These entity sets and writing system declaration files are available here in a ZIP archive [December 1996]
[March 29, 1999] XML Entity Declarations - Entity Declarations for ISO, HTML and MathML character sets. From David Carlisle, OpenMath project.
[September 07, 1998] Rune Mathisen and Vidar Gundersen reported on CTS that they are preparing documentation covering the ISO character entities and their LaTeX equivalents. The effort represents "an attempt to make an overview of the ISO character entities (ISO 8879:1986) and their LaTeX equivalents, and to provide a handy reference to the ISO character entities for non-LaTeX users." The online materials will provide the mappings necessary to translate SGML documents to TeX/LaTeX, and will provide documentation on how the characters look by producing glyphs.
[October 09, 1998] On October 8, 1998, Sebastian Rahtz posted an announcement concerning the preparation of a new 'catalogue of Unicode positions and their corresponding entity names in various sets, including MathML, and the LaTeX equivalent. Each Unicode character may be mapped to several entities'. The document is presented in XML format. Rahtz credits several other people for advice and earlier work: David Carlisle, Robert Streich, Nico Poppelier, Rune Mathisen, and Vidar Gundersen. [local archive copy, 1998-10-08]
[July 26, 1997] Submission of "XML-ixed" ISO 8879 entity sets, by Rick Jelliffe of Allette Systems; the postings were made to the XML Development list. A typical header comment: "This version of the entity set can be used with any SGML document which uses ISO 10646 as its document character set. This includes XML documents and ISO HTML documents. This entity set uses hexadecimal numeric character references." Please report any errors to Rick: ricko@allette.com.au. The entities mapped to hex are in the following files: ISOgrk4.pen, ISOtech.pen, ISOdia.pen, ISOlat1.pen, ISOgrk1.pen, ISOlat2.pen, ISOgrk2.pen, ISOnum.pen, ISOgrk3.pen, ISOpub.pen. Available in a concatenated file, or archived as separate files in a .ZIP package. Note the disclaimer.
Public Text for Entity Sets published in ISO/IEC TR 9573-13:1991 (ZIP archive file with 12 entity sets); or concatenated entity set files, text version, supplied by Anders Berglund (see the main ISO 9573 entry)
[June, 1997] SGML Public Entity Sets, Proposals. Sample collections of entities and glyphs (proposed) for potential inclusion into ISO 9573. For: Ugaritic, Old Persian, Glagolitic, Croatian, Buginese, Cherokee, and Gothic Uncials. Developed by Anders Berglund and others. [mirror copy, descriptive text only]
Note on 9573 entity sets for chemistry, by Martin Bryan (September 1995)
Ancient Greek Character Entities - Derived from TEI Writing System Declaration for Ancient Greek. By James Tauber; [local archive copy]
Math Entities - From AMS
List of Symbols represented by entities - Springer-Verlag Preview Journals Service; [mirror copy]
[November 1997] NCBI SGML entity list; [local archive copy]
DocBook DTD Character Entities
Entity sets via FTP to Darmstadt
Additional named entities for HTML - W3C Working Draft 25-Nov-96 = WD-entities-961125; [mirror copy]
See also: the SPREAD entity sets - ISO 10646 (Unicode) characters as SGML SDATA entities
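
Several entries above note that SGML SDATA entity declarations are not available in XML and that XML-compatible entity sets instead declare each name as a hexadecimal numeric character reference. A minimal sketch of that idea follows. The three name-to-code-point pairs are a small illustrative sample, not a reproduction of any of the listed entity files, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: three well-known ISO entity names and their
# Unicode code points. Real entity sets contain hundreds of names.
SAMPLE_CODE_POINTS = {
    "eacute": 0x00E9,  # small e, acute accent
    "nbsp": 0x00A0,    # no-break space
    "alpha": 0x03B1,   # small alpha, Greek
}


def xml_entity_declaration(name: str, code_point: int) -> str:
    """Declare an internal entity whose value is a hex character reference.

    Note: markup-significant characters such as '&' and '<' need extra
    escaping in real entity sets; this sketch does not handle them.
    """
    return f'<!ENTITY {name} "&#x{code_point:04X};">'


def internal_subset(mapping: dict) -> str:
    """Build the body of an internal DTD subset from a name -> code point map."""
    return "\n".join(
        xml_entity_declaration(name, cp) for name, cp in sorted(mapping.items())
    )


def expand(text_with_refs: str, mapping: dict) -> str:
    """Parse a tiny document using the declarations and return the expanded text."""
    doc = (
        f"<!DOCTYPE doc [\n{internal_subset(mapping)}\n]>\n"
        f"<doc>{text_with_refs}</doc>"
    )
    return ET.fromstring(doc).text


if __name__ == "__main__":
    print(internal_subset(SAMPLE_CODE_POINTS))
    print(expand("caf&eacute; &alpha;", SAMPLE_CODE_POINTS))
```

The sketch places the declarations in an internal subset, which is exactly what some classes of applications forbid; that restriction is the stated rationale for the 'XML Character Elements' alternative described in the OASIS working draft entry above.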

SGML Entity Types, and Entity Management

[CR: 19970820] [Table of Contents]

Entity Management in SGML, by Charles F. Goldfarb, November 30, 1993 [mirror copy]
Entity Management. SGML Open Technical Resolution 9401:1995. (Amendment 1 to TR 9401) [mirror copy]
Abstract of Entity Management Resolution [entity catalog] (SGML Open Technical Resolution 9401:1994), August 9, 1995, by Paul Grosso. A mirror copy is also available here.
"SGML Document Management" (W. Eliot Kimber). This excellent paper talks about the role of SGML's virtual storage model; [mirror copy]
"A Taxonomy of SGML Entities", by Joe English [mirror copy]; see also "Different ways of looking at an SGML document" [mirror copy]
[Coming soon]: HyD: An Open SGML Database Specification for Entity Management. From Copernican Solutions, HyD is "a project . . .to create an open specification and reference implementation for SGML database systems. We have designed a data model based on the HyTime Corrigendum's entity management and storage manager model. Our model uses SGML with HyTime constructs for the interchange of information and location of objects within the repository."

Catalogs, Formal Public Identifiers, Formal System Identifiers

[CR: 20030618] [Table of Contents]

SGML Open Catalogs - Using SGML Open catalogs to generate system identifiers. - Documentation in James Clark's SP distribution. [local archive copy]
[June 18, 2003] OASIS Entity Resolution TC Approves XML Catalogs Specification for Public Review. Members of the OASIS Entity Resolution Technical Committee have voted to approve the latest revision of the XML Catalogs specification as a Committee Specification and to submit the document for public review. The XML Catalogs specification describes an interoperable method for mapping the information in an XML external identifier into a URI reference for the XML external resource. An entity catalog is defined for this purpose, designed to handle "two simple cases: (1) mapping an external entity's public identifier and/or system identifier to a URI reference; (2) mapping the URI reference of a resource (a namespace name, stylesheet, image, etc.) to another URI reference." Three non-normative appendices provide formal definitions for the XML Catalog, including W3C XML Schema, RELAX NG Grammar, and XML DTD. The OASIS TC was chartered in October 2000 to provide an XML syntax for a simple entity catalog format, as envisioned in an earlier OASIS Technical Resolution. A 30-day public review of the XML Catalogs specification will take place from June 18, 2003 through July 18, 2003 in preparation for consideration of the specification as an OASIS Open standard.
[August 14, 2001] Sun Microsystems Releases Java Classes for XML Entity and URI Resolution. A posting from Norman Walsh (Sun Microsystems) announces the release of a set of Java classes originally written to implement the OASIS XML Catalogs Committee Specification for SAX entityResolver() and JAXP URIResolver(). These classes "greatly simplify the task of using Catalog files to perform entity resolution. You can use these classes directly 'out of the box' with their applications (such as Xalan and Saxon) or customize them to suit your particular needs. Developers will also be interested in the included JavaDoc API Documentation. The distribution package includes Java classes, JavaDoc API documentation, and step-by-step instructions explaining how to use and customize the resolver components." The Preview Version 0.2 requires JDK 1.2 or later. The package with binaries and sample code is available for download from the Sun XML Developer Connection. [Full context]
[July 18, 2001] "XML Catalogs." For the OASIS Entity Resolution TC. Working Draft 16-July-2001. Edited by Norman Walsh (Sun Microsystems). "The requirement that all external identifiers in XML documents must provide a system identifier has unquestionably been of tremendous short-term benefit to the XML community. It has allowed a whole generation of tools to be developed without the added complexity of explicit entity management. However, the interoperability of XML documents has been impeded in several ways by the lack of entity management facilities: (1) External identifiers may require resources that are not always available. For example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available. (2) External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system. An external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol. (3) It is often convenient to access resources using system identifiers that point to local resources. Exchanging documents that refer to local resources with other systems is problematic at best and impossible at worst. The problems involved with sharing documents, or packages of documents, across multiple systems are large and complex. While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this Standard defines an entity catalog that maps both external identifiers and arbitrary URI references to URI references..." [cache]
[October 12, 2000] OASIS Technical Committee on Entity Resolution. An announcement released by Karl Best (OASIS - Director, Technical Operations) describes a proposed 'Entity Resolution' technical committee, to be formed under the rules of the Technical Committee Process as announced in early October. The new committee would continue work begun under the SGML Open Technical Resolution on Entity Management (entity catalog formats, formal system identifiers, etc.), updating this work to cover XML. "A new OASIS technical committee is being formed. The Entity Resolution TC has been proposed by Lauren Wood, SoftQuad Software Inc.; Norman Walsh, Sun Microsystems; Paul Grosso, Arbortext, Inc.; and John Cowan, Reuters Health. The request for a new TC meets the requirements of the OASIS TC process. . . The objective of the Entity Resolution TC is to provide facilities to address issue A of the OASIS catalog specification (TR 9401). These facilities will take into account new XML features and delete those features of TR 9401 that are only applicable to SGML, as well as those features applicable only to issue B in TR 9401. Deliverables: The Entity Resolution TC will produce a Committee Specification that uses XML syntax and provides a DTD (potentially also an XML Schema) for that syntax. This specification will be ready by August 2001. The Entity Resolution TC intends to submit the Committee Specification as an OASIS Standard after sufficient implementation experience has been gathered." See the TC mailing list archives.
Don Stinchfield, Using Catalogs and MIME to Exchange SGML Documents. MIMESGML Working Group, INTERNET-DRAFT. Providence, RI: EBT and MIMESGML Working Group, IETF, 1995. "This draft proposes a standard for exchanging SGML documents over the World Wide Web using catalogs and MIME. This draft extends SGML Open's definition of catalogs. . ." See the bibliographic entry for the full abstract and online availability.
System identifiers - Documentation in James Clark's SP
A.6 Formal System Identifier Definition Requirements (FSIDR) - from Annex A of the HyTime TC
Entity Management. SGML Open Technical Resolution 9401:1997 (Amendment 2 to TR 9401). The SGML Open Catalog and interchange package. Edited by Paul Grosso, Chair, Entity Management Subcommittee. September 10, 1997. [local archive copy, HTML] Or: Postscript, ZIP format, [local archive copy]. Packaged format: HTML, PS, DTD, etc.
Earlier version of the SGML Open CATALOG definition: Entity Management. SGML Open Technical Resolution 9401:1995. (Amendment 1 to TR 9401) [mirror copy]
[January 02, 2001] XML Namespace Resources. The end of calendar year 2000 saw the eruption of (yet) another communal lament about the W3C XML Namespace specification, which fails to meet the expectations of some users. Resulting from the discussion: a number of new proposals for indicating "what a namespace URI should point to." (1) Tim Bray, one of the XML Namespace editors, licensed underground activity for a namespace markup vocabulary that could reference related resources ("it would have to be done low, fast, and under the radar...") and then floated his own suggestion for XNRL (XML Namespace Related-Resource Language). "XML Namespace Related-resource Language (XNRL) is an HTML-based markup language designed to contain a human-readable description of an XML namespace as well as pointers to multiple resources related to that namespace. Examples of such related resources include schemas, stylesheets, human-readable documentation (beyond that contained in the XNRL package) and executable code. XNRL is designed to be suitable for service as the body of a resource returned by dereferencing a URI serving as an XML Namespace name. [The draft proposal] defines the syntax and semantics of XNRL, and also serves as an XNRL package for the namespace http://xnrl.org/." (2) Jonathan Borden presented an "XML Namespace Catalog Format." The proposal "defines a format for an XML Namespace Catalog. An XML Namespace Catalog serves as a text description of an XML Namespace and includes links to resources associated with the namespace such as schemata, stylesheets and/or other resources associated with the namespace URI. An XML Catalog may also map Formal Public Identifiers into System Identifiers defined as URI references. An XML Namespace Catalog is designed to be suitable for service as the body of a resource returned by dereferencing a URI serving as an XML Namespace name. The XML Namespace Catalog format is an extension of XHTML with a new element named resource. 
The resource element serves as an XLink to the referenced resource. The resource element represents an XLink with two additional attributes public and content-type which provide for optional formal public identifiers and/or content type specifiers. The proposal document defines the syntax and semantics of the XML Namespace Catalog Format, and also serves as an XML Catalog for the namespace http://www.openhealth.org/XMLCatalog. The XML Namespace Catalog 1.0 DTD has been produced as an extension of XHTML Basic 1.0." See also the example. (3) Sean B. Palmer presented XNCL (XML Namespace Catalogue Language) as "just another hack attempt at producing an XML Namespace Catalogue Language that the people on XML-DEV will find solace in. XNCL is a language intended to be used as a de facto dereferenceable resource for namespaces. XNCL uses empty div elements in the Link elements to avoid overloading them, and to allow for family derivations. The specification is an XHTML Family derived from XHTML Basic. It has been modified in the following ways: The content model for the link element has been changed so that it may now include div elements. An additional resource element may now be used inside link elements, and they are of content type EMPTY..." Note the posting from Paul Grosso on "resource discovery directory" which (1) references the work of the OASIS Entity Resolution TC, and (2) advocates a separation of concerns, viz., between ER (entity resolution) catalogs and resource discovery (RD) directories. See: "Resource Directory Description Language (RDDL)."
[May 04, 2001] NB. See draft 04, May 08, 2001. Proposed URN Namespace for Public Identifiers. A posting from Norman Walsh contains the text of an IETF Network Working Group Internet-Draft which the authors believe "resolves all outstanding issues with respect to the request for a 'publicid' NID." The draft A URN Namespace for Public Identifiers ('draft-urn-publicid-03', May 4, 2001) is authored by Norman Walsh (Sun Microsystems, Inc.), John Cowan (Reuters Health Information), and Paul Grosso (Arbortext, Inc.). The draft "describes a URN namespace that is designed to allow Public Identifiers to be expressed in URI syntax." From the document Introduction: "XML external entities have two identifiers: a system identifier and a public identifier. The system identifier is a URI, by definition, but the public identifier is simply a string. Historically, the system identifier of an external entity has been a local, or system-specific identifier while the public identifier has been a more global, persistent name. Unfortunately, public identifiers do not fit neatly into the existing web architecture because they are not legal URIs. Many new specifications (XSLT, XML Schema, etc.) have the implicit or explicit requirement that all external identifiers be URIs. The purpose of this namespace is to allow public identifiers to be encoded in URNs in a reliable, comparable way. This document describes a scheme for representing public identifiers as URNs by introducing a public identifier namespace, 'publicid'. This namespace specification is for a formal namespace." [Full context]
[October 12, 2000] Proposed OASIS Technical Committee on Entity Resolution. An announcement released by Karl Best (OASIS - Director, Technical Operations) describes a proposed 'Entity Resolution' technical committee, to be formed under the rules of the Technical Committee Process as announced in early October. The new committee would continue work begun under the SGML Open Technical Resolution on Entity Management (entity catalog formats, formal system identifiers, etc.), updating this work to cover XML. "A new OASIS technical committee is being formed. The Entity Resolution TC has been proposed by Lauren Wood, SoftQuad Software Inc.; Norman Walsh, Sun Microsystems; Paul Grosso, Arbortext, Inc.; and John Cowan, Reuters Health. The request for a new TC meets the requirements of the OASIS TC process. . . The objective of the Entity Resolution TC is to provide facilities to address issue A of the OASIS catalog specification (TR 9401). These facilities will take into account new XML features and delete those features of TR 9401 that are only applicable to SGML, as well as those features applicable only to issue B in TR 9401. Deliverables: The Entity Resolution TC will produce a Committee Specification that uses XML syntax and provides a DTD (potentially also an XML Schema) for that syntax. This specification will be ready by August 2001. The Entity Resolution TC intends to submit the Committee Specification as an OASIS Standard after sufficient implementation experience has been gathered." See also the list of active OASIS TCs. On entity resolution, see also the topic "SGML/XML Entity Types, and Entity Management."
[April 04, 2000] Arbortext Releases Java Catalog Classes for Resolution of Public Identifiers. Arbortext, Inc., "a leading provider of Extensible Markup Language (XML)-based e-Content software, announced today the immediate availability of open source Java-based code to support public identifier resolution in XML documents. This code will enable XML processors to resolve public identifiers which increases the flexibility and interoperability of XML documents. These Java classes implement the OASIS Entity Management Catalog format [('OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401)')] as well as an XML Catalog format for resolving XML public identifiers into accessible files or resources on a user's system or throughout the Web. These classes can easily be incorporated into most Java-based XML processors, thereby giving the users of these processors all the benefits of public identifier use. As XML processors incorporate this code, users will be able to utilize public identifiers in XML documents with the confidence that they will be able to move those documents from one system to another and around the Web knowing that they will also be able to refer to the appropriate external file or Web page. These classes can easily be plugged into any SAX Parser." Available to everyone at no cost, these Java classes can be immediately downloaded from the ArborText web site. The distribution includes examples of Catalog support with both Xerces and XT. For additional details, see the full text of the ArborText announcement: "Arbortext Makes Java Catalog Classes Available For Use by XML Processors. Open Source Code Enables XML Processors to Resolve Public Identifiers."
[April 2000] Norm Walsh on system identifiers: Norm Walsh has provided a tutorial background to the problem of XML system and public identifiers in "If You Can Name It, You Can Claim It!" [Column 'Standard Deviations from Norm', Issue 3, 04-April-2000]. "The fact that XML requires me to supply system identifiers for external references, and the fact that these identifiers are required to be Uniform Resource Identifiers (URIs) is a frequent source of considerable irritation. In this column, we'll explore how you can use OASIS Catalog files (or their XML equivalent) to avoid these difficulties. Using Catalog files became a lot easier earlier this month when Arbortext released its Java Catalog classes to the XML community. Using these classes, it's simple to add Catalog support to your favorite Java parser. (Equivalent support for parsers in other languages should be fairly easy to construct from the free and Open Source of the Java classes, although Arbortext has no immediate plans to undertake this effort.) You can download the classes or view the JavaDoc API Documentation online." Previous URL now broken: http://www.arbortext.com/Think_Tank/Norm_s_Column/issue_three/issue_three.html. [cache, new URL]
[November 24, 2000] "The IANA XML Registry." By Michael Mealling (Network Solutions, Inc.). IETF Internet-Draft. Network Working Group. Reference: 'draft-mealling-iana-xmlns-registry-00.txt'. "This document describes an IANA maintained registry for IETF standards which use XML related items such as Namespaces, DTD, Schemas, and RDF Schemas. Over the past few years XML has become a widely used method for data markup. There have already been several IETF Working Groups that have produced standards that define XML DTDs, XML Namespaces and XML Schemas. Each one of these technologies uses URIs to identify their respective versions. For example, a given XML document defines its DTD using the DOCTYPE element. This element, like SGML, has a PUBLIC and a SYSTEM identifier. It is standard practice within W3C standards to forego the use of the PUBLIC identifier in favor of 'well known' SYSTEM identifiers. There have been several IETF standards that have simply created non-existent URIs in order to simply identify but not resolve the SYSTEM identifier for some given XML document. This document seeks to standardize this practice by creating an IANA maintained registry of XML elements so that document authors and implementors have a well maintained and authoritative location for their XML elements. As part of this standard, the IANA will both house the public representation of the document and assign it a Uniform Resource Name that can be used as the URI in any URI based identification components of XML."
[December 01, 2000] "XML-Deviant: What's in a Name?" By Leigh Dodds. From XML.com (November 29, 2000). ['The XML-Deviant looks at best practices for identifying XML resources; then wonders why more developers aren't taking advantage of entity management systems.'] "Correctly naming resources and objects is widely regarded as one of the most difficult problems in computing (another being caching). As the saying goes, any problem in computing can be solved by adding another level of indirection. One step toward solving naming problems is to add indirection by separating the name of the resource from its address. This is a common pattern, which we see in a number of areas from pointers in C to Persistent URLs (PURLs) on the web. XML 1.0 offers a separation between the naming and addressing of resources or entities referred to in XML documents. Broadly speaking SYSTEM identifiers define an actual resource that is retrieved, or dereferenced to retrieve, the entity in question. A PUBLIC identifier simply gives a name for the required resource. It says nothing about where that resource may be dereferenced. Of course life isn't really that simple, and it's likely that some readers are already objecting. The short but heated XML-URI debate earlier this year testifies to the disagreement on this issue. A SYSTEM identifier is specified as a URI, which can easily be a Uniform Resource Name (URN) as well, instead of being the more commonly found URL. A URN is more like a PUBLIC identifier, as it simply names the resource in question. Yet there is still no widely deployed means of using URNs..." Note also the archives of the OASIS Entity Resolution TC Mailing List.
[July 18, 1998] John Cowan (cowan@locke.ccil.org) posted a proposal to the XML-DEV mailing list for XCatalogs - "a system based on SGML/Open catalogs (Socats) for translating public identifiers to system identifiers in XML." According to the proposal, XCatalogs are meant to be "Web resources (anything from local files on up) which contain mappings from public identifiers to system identifiers, plus references to other XCatalogs. They [would] come in two syntaxes: one which is a subset of Socat syntax, and one which is an XML document instance." Note that the xmlproc - Python XML parser tool from Lars Marius Garshol supports the XCatalog. Note (1999-04-06): 'XCatalog' is now called 'XML Catalog'.
[September 25, 1998] 'Documentation: xmlproc catalog file support.' General description of the CATALOG file format, in the context of documentation for xmlproc. Updated September 11, 1998 or later.
[April 05, 2000] From Eric Bohlman (XML-DEV, Wed, 5 Apr 2000): "With all the announcements about catalog processing, I might as well mention that the Perl module XML::Catalog has been available on CPAN for some time now, and I could use some more feedback than I've already gotten (I'm aware there's a problem with embedded backslashes in IDs). It supports John Cowan's catalog syntax (either form) and provides methods for translating public IDs to system IDs and for remapping system IDs." For example: http://theoryx5.uwinnipeg.ca/CPAN/data/XML-Catalog/Catalog.html.
[December 07, 1997] See the publication of the text of PDTR 9573-9 Information Processing -- Text and office systems -- Using SGML Public Identifiers for Specifying Data Notations (ISO/IEC JTC1/WG4 N1958, December 5, 1997), by Martin Bryan and Ken Holman. The technical report "provides a starter set of both notation names and public identifiers which can be used to indicate the coding used for data that conforms to internationally agreed standards published by bodies such as ISO, IEC, ITU and SMPTE. While the notations names are purely advisory, the public identifiers are defined according to the rules for naming ISO standards defined in ISO 8879 and ISO 9070. These forms should be common to all applications that use formal public identifiers." For background information on the decision, see "Recommendations of the Alexandria Meeting" of WG4 (5 December 1997). [local archive copy]
Resolving Formal Public Identifiers [local archive copy, display form only]
Formal Public Identifier RoadTrip - Spyglass Technical Reference (Murray Altheim). [local archive copy]
FPI grammar from ISO/IEC 9070:1991 and ISO 8879:1986. Compiled by Martin Bryan. [Originally from: http://www.entmp.org/fpi-urn/fpi.html]
Notes on sgmls handling of search for entities (C. M. Sperberg-McQueen)
Registration Process for Public Text Owner Identifiers
Lynne A Price, "Registering Owners for Public Text Identifiers." In <TAG> 9/5 (May 1996), 7-9. In this article, Lynne explains the registration procedure set up by the GCA. Initial registration fee is $95, changes are $25.
Formal Public Identifiers, by David Peterson (March 1994). "Dave Peterson discusses all the parts of 'formal' public identifiers, including the optional parts. Public identifiers are one of two kinds of external identifiers used by SGML systems." [local archive copy]
Formal System Identifiers (FSIs) - W3C document. [local archive copy]
J. Tauber's 'delegate' proposal. Implemented in James Clark's SP as an extension to the SGML Open CATALOG TR. [local archive copy]
Using SGML Public Identifiers for Specifying Data Notations - Revised text of 9573-9. ISO/IEC JTC1/WG4 N1990. Covering standardized public identifiers for data encodings defined in international standards, including those published by ITU, SMPTE and NBS. "This Technical Report facilitates the interchange of data represented in internationally standardized notations ('data formats') by providing a set of ISO/IEC 9070 public text object identifiers and ISO 8879 notation declarations that different applications can reference to identify the use of such data formats." For example: 'SGML: ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)'. For additional information on 9573-9, see ISO/IEC TR 9573:1988. Techniques for Using Standard Generalized Markup Language (SGML). [local archive copy]
[August 4, 1998] Eliot Kimber on FPIs and URNs
[September 01, 1998] Eliot Kimber (ISOGEN International Corp) has authored a paper The SGML Storage Model which may be of assistance to readers who wish to understand more about the SGML entity as an abstract storage object, and how storage managers interface with an entity manager, a parser, and a processing application. The paper was written in the context of ongoing work toward SGML and STEP harmonization, but its discussion of formal system identifiers and storage models should be of wider interest. ". . .all SGML systems consist, in one way or another, of the same layers. At the bottom are the physical storage managers, the systems that actually manage data on storage media (file systems, database, etc.). Above the storage managers is the entity manager layer. Above the entity manager is the SGML parser and any processing applications. Processing applications talk to the SGML parser to get parsed SGML documents and to the entity manager directly to get data entities. . . [Conclusion:] SGML abstract storage model and entity declaration syntax coupled with the Formal System Identifier facility of ISO/IEC 10744:1997 provides a robust mechanism for representing systems of repositories and storage objects, regardless of their data types." [local archive copy]
For more on [public text] "owner identifiers," see ISO/IEC 9070:1991 Registration Procedures for Public Text Owner Identifiers.
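The 'publicid' URN proposal listed above turns a public identifier into a URN by transcribing characters that are awkward in URN syntax. The following is a minimal Python sketch of that idea, assuming the transcription table as it appears in the later published form of the proposal (RFC 3151); the 03 draft cited above may differ in detail, and the function name `publicid_to_urn` is purely illustrative.

```python
import re

# Assumed transcription table (per RFC 3151, the published successor of the
# draft cited above). "//" and "::" must be matched before "/" and ":".
_TRANSCRIBE = {
    "//": ":", "::": ";", " ": "+",
    "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
    "'": "%27", "?": "%3F", "#": "%23", "%": "%25",
}
# Alternation order makes the two-character sequences win over single characters.
_TOKEN = re.compile(r"//|::|[ +:/;'?#%]")


def publicid_to_urn(public_id):
    """Transcribe a public identifier into a urn:publicid: URN (sketch)."""
    # Trim the identifier and collapse whitespace runs to single spaces.
    normalized = " ".join(public_id.split())
    # A single left-to-right pass, so already-escaped text is never re-escaped.
    body = _TOKEN.sub(lambda m: _TRANSCRIBE[m.group(0)], normalized)
    return "urn:publicid:" + body


print(publicid_to_urn("-//OASIS//DTD DocBook XML V4.1.2//EN"))
# urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN
```

Doing the substitution in one regular-expression pass, rather than as a chain of string replacements, avoids double-escaping the percent signs that the transcription itself introduces.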

ISO 8879: Revisions/Amendments

[CR: 19990118]

SGML (ISO 8879:1986) revisions in light of the Standard's first 5-year review are currently in progress. Technical papers and other descriptions relating to the revision are available from several sources:

See the main entry for XML (Extensible Markup Language), in which certain design features are intended to fix or improve upon ISO 8879:1986.
[January 18, 1999] "Final Text of Revised TC2 to ISO 8879:1986." The merged text of TC2 and TC3. 6-December-1998. ISO/IEC JTC1/SC34 N0029. [local archive copy]
[December 07, 1997] WG4 has accepted N1954 "as the Disposition of Comments for the SGML TC ballot and N1955 as the final text of the Technical Corrigendum to ISO 8879," sending these documents to the JTC1 Secretariat for publication. The TC contains a normative Annex K ("Web SGML Adaptations") and an informative Annex L ("Additional Requirements for XML"). For background information on the decision, see "Recommendations of the Alexandria Meeting" of WG4 (5 December 1997). See the full text, Annex K (normative) - Web SGML Adaptations - [local archive copy]
Report of the SGML Rapporteur Group [Barcelona, Spain, May 5 - 9, 1997]. Also noted: the WebSGML Technical Corrigendum (WG8 N1929)
"The WebSGML Adaptations." Summary in slides, from Martin Bryan.
Web SGML and HTML 4.0 Explained. Online book, by Martin Bryan. A valuable reference work. See in particular the section "Web SGML Adaptations". When Extended Naming Rules (Annex J of ISO 8879) are turned on, the SGML declaration would begin <!SGML "ISO 8879:1986 (ENR)" . . .; when Web SGML Adaptations additions (Annex K) and Extended Naming Rules are both turned on, the SGML declaration would begin <!SGML "ISO 8879:1986 (WWW)" . . .. This book explains the changes of 8879 Annexes J and K in some detail.
[June 03, 1997]. Several updated documents relevant to the SGML revision, on the WG8 [ISO/IEC JTC1/SC18/WG8] Web server, maintained by WG8 Convenor James D. Mason, and on the 'Project Editor's Review of ISO 8879: Current Information Set' site, maintained by Charles F. Goldfarb. The documents provide information about 8879 revision in light of concerns for the use of SGML on the World Wide Web, including a 'WebSGML Technical Corrigendum'. The complete listing of documents from Barcelona (May 1997) is given in the dedicated WG8 register.

Among the documents recently posted are the following:
1. N1929: "Proposed text for ISO 8879 Annexes K and L: WebSGML Adaptations" - "This Technical Corrigendum adds a normative annex K and an informative annex L to ISO 8879 to meet an urgent need for adaptations of SGML for use on the World Wide Web and intranets. It incorporates by reference the Extended Naming Rules TC." [WG8 server link] - [mirror copy]
2. N1928: "User requirements for WebSGML Adaptations [WebSGML TC]" - "WebSGML TC: Requirements For a TC to Allow Simplified Forms of SGML That are Optimized for Use on the World Wide Web and Intranets . . . This Technical Corrigendum allows simplified subsets of SGML syntax to be used, that can be interpreted by smaller and faster SGML parsers. In addition, provision is made for parsing documents without access to DTDs, or with no explicit DTD at all, and for reference to SGML declarations by reference, in order to minimize storage and transmission requirements and simplify the creation of documents." [WG8 server link] - [mirror copy] Note [October 13, 1997]: a test release of SP with improved XML support, including a number of key features from the WebSGML SGML TC.
3. N1925: "Report of the SGML group, Barcelona, May, 1997" - "We developed the WebSGML Technical Corrigendum (WG8 N1929), which received unanimous approval of the members present. . . We began development of the specification for multiple name spaces for element types, considering the proposals for element type modularity and typed subdocuments." [WG8 server link] - [mirror copy]
4. N1927: "Recommendations of the Barcelona Meeting" - references action on the HyTime Technical Corrigendum, the SGML [N1928, N1929] WebSGML TC, ISO-HTML, and other documents. [mirror copy]
5. N1924: "Proposal from the French and Norwegian National Bodies regarding the WebSGML TC", by Michel Biezunski and Steve Pepper. Groups requirements of the WebSGML TC into four "levels of desirability and attainability." [mirror copy]
ISO 8879 Revision, Fourth Interim Report, WG8 N1893, December 23, 1996: "...systematic clause-by-clause review of ISO 8879..."; [mirror copy]
Announcement for the publication of the Third Interim Report of the Project Editor's Review of ISO 8879 (WG8 N1855). Available: from the WG8 server, http://www.ornl.gov/sgml/wg8/document/1855.htm; [mirror copy]
See a linked listing of documents which contain proposed subsets or simplifications of SGML, as collected by Michael Sperberg-McQueen: Features and Rules of ISO 8879. A Summary for Use In Discussions of the W3C SGML Working Group And Editorial Review Board. Document W3C-SGML-ERB DD-1996-0002, by C. M. Sperberg-McQueen. [12 September 1996]; [unofficial mirror copy, November 5, 1996]
See ISO8879-rev on the Oslo server. It's a directory of working documents on the review of ISO 8879 -- SGML.
"SGML Extended Facilities", described by Steven R. Newcomb (July 1996)
[December 27, 1997] Online version of "What You Need to Know About the New HyTime," by Steven R. Newcomb of TechnoTeacher Inc. Read the article to discover the importance of the SGML Extended Facilities (Annex A), and why "the New HyTime" is therefore as much about SGML as about HyTime proper.
ISO 8879 Review Index Page (Charles Goldfarb) ([mirror copy at Allette, www.allette.com.au], April 1996)
Second Interim Report (N1701)
ISO 8879 work papers: Remote file infosrv1.ctd.ornl.gov/pub/sgml/WG8/DOCS
"A Proposal to Introduce 'Module' Structures into SGML" [namespaces], by Toru Takahashi. 12 November 1996. [mirror copy], also available as http://www.ornl.gov/sgml/wg8/document/1873.doc.
Minimal Generalized Markup Language (MGML) - by Tim Bray. "The MGML idea is at least two years old - it arose out of a conversation with Steve DeRose and Erik Naggum at SGML'94." See: (1) the brief description of draft materials; (2) MGML - an SGML Application for Describing Document Markup Languages; [mirror copy, November 5, 1996]; (3) Comparison between MGML and SGML; [mirror copy, November 5, 1996]; (4) MGML Reference DSD; [mirror copy, November 5, 1996]; (5) SGML DTD for the MGML Reference DSD; [mirror copy, November 5, 1996] Also: the package.
SGML Online - Proposal for Minimal SGML Feature Set, by Eliot Kimber [version: September 1996]; [unofficial mirror copy, November 5, 1996]
Proposed TC for Extended Naming Rules and Development Principles for SGML (WG8 N1861), by Rick Jelliffe; calls for two normative annexes to ISO 8879; [mirror copy]
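The entry above for Martin Bryan's "Web SGML and HTML 4.0 Explained" notes that an SGML declaration opens with "ISO 8879:1986 (ENR)" when the Extended Naming Rules (Annex J) are in use, and with "ISO 8879:1986 (WWW)" when the Web SGML Adaptations (Annex K) and Extended Naming Rules are both in use. As a rough illustration only, a tool could inspect that opening literal as sketched below in Python. The function name and the letter codes it returns are hypothetical, and the sketch assumes the declaration sits at the start of the text with a quoted literal; it is not a conforming SGML declaration parser.

```python
import re

# Matches the opening of an SGML declaration and captures its first literal,
# e.g. <!SGML "ISO 8879:1986 (WWW)" ...  (either quote character is accepted).
_SGML_DECL = re.compile(r"""<!SGML\s+(["'])(.*?)\1""", re.DOTALL)


def declared_annexes(document_text):
    """Return the annex letters signalled by the declaration's opening literal.

    Returns None when no SGML declaration is found, and an empty set for the
    plain "ISO 8879:1986" literal. The "J"/"K" codes are an illustrative
    convention of this sketch, following the description in the entry above.
    """
    match = _SGML_DECL.search(document_text)
    if match is None:
        return None
    literal = " ".join(match.group(2).split())
    if literal.endswith("(WWW)"):
        # Web SGML Adaptations together with Extended Naming Rules.
        return {"J", "K"}
    if literal.endswith("(ENR)"):
        # Extended Naming Rules only.
        return {"J"}
    return set()


print(declared_annexes('<!SGML "ISO 8879:1986 (ENR)" CHARSET ... >'))
```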

Technical Resolutions and Proposals

[CR: 19981103]

Papers and Reports:

A Proposal for Delegating SGML Open Catalogs, by James K. Tauber, Sun Microsystems Laboratories [mirror copy]
Entity Management. SGML Open Technical Resolution 9401:1995. (Amendment 1 to TR 9401), by Paul Grosso, Chief Technical Officer, SGML Open. [mirror copy, FTP Postscript version, ZIP]
TABLE INTEROPERABILITY: Issues for the CALS Table Model. SGML Open Technical Research Paper 9501:1995, by Eric Severson and Harvey Bingham, Interleaf. [mirror copy, FTP Postscript version, ZIP]
CALS table model Document Type Definition. SGML Open Technical Memorandum TM 9502:1995, by Harvey Bingham, Interleaf. 1995 October 19. [mirror copy, FTP Postscript version, ZIP]
Using SGML Open Catalogs and MIME to Exchange SGML Documents, by D. Stinchfield. [MIMESGML Working Group, INTERNET-DRAFT, February 22, 1996.] [mirror copy]
Exchange table model Document Type Definition. SGML Open Technical Resolution TR 9503:1995, by Harvey Bingham (Chair, Table Interchange Subcommittee). 1996 May 8. [mirror copy]. Also: the package. See the bibliographic entry for the document abstract.
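The entity catalogs of TR 9401 listed above, and the XCatalog and Java/Perl catalog tools described earlier on this page, all rest on the same basic operation: mapping a public identifier to a system identifier. The Python sketch below illustrates that lookup for a small subset of the plain-text catalog syntax, namely `PUBLIC "public id" "system id"` entries and `-- ... --` comments. It is an assumption-laden illustration, not an implementation of TR 9401: other entry types (SYSTEM, CATALOG, DELEGATE and so on) are ignored, keywords are matched case-insensitively as a simplification, and all function names are invented for the example.

```python
import re

# Token kinds: a "-- ... --" comment, a quoted literal, or a bare word.
_TOKEN = re.compile(
    r"""(?P<comment>--.*?--)|"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<word>[^\s"']+)""",
    re.DOTALL,
)


def _tokens(text):
    """Yield (kind, value) pairs, skipping comments."""
    for match in _TOKEN.finditer(text):
        if match.group("comment") is not None:
            continue
        if match.group("word") is not None:
            yield ("word", match.group("word"))
        elif match.group("dq") is not None:
            yield ("literal", match.group("dq"))
        else:
            yield ("literal", match.group("sq"))


def normalize_public_id(public_id):
    """Trim and collapse whitespace so equivalent identifiers compare equal."""
    return " ".join(public_id.split())


def parse_catalog(text):
    """Collect PUBLIC entries into a dict of public id -> system id."""
    tokens = list(_tokens(text))
    mapping = {}
    for i in range(len(tokens) - 2):
        kind, value = tokens[i]
        next_kind, public_id = tokens[i + 1]
        if kind == "word" and value.upper() == "PUBLIC" and next_kind == "literal":
            # The first entry wins if a public identifier is repeated.
            mapping.setdefault(normalize_public_id(public_id), tokens[i + 2][1])
    return mapping


def resolve(mapping, public_id, system_id=None):
    """Use the catalog mapping if present, else fall back to system_id."""
    return mapping.get(normalize_public_id(public_id), system_id)
```

A caller would typically read one or more catalog files, merge the resulting dictionaries in search order, and call `resolve` with the public and system identifiers taken from an entity or DOCTYPE declaration. Preferring the catalog entry over the declared system identifier is a design choice of this sketch.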

SGML (and HTML) Stylesheets

[CR: 20000826]

Significant collaborative research and public discussion have been undertaken during late 1994, 1995 and 1996 on "stylesheets" -- in terms of HTML, DSSSL Online (earlier, DSSSL-Lite), and SGML generally. Electronic Book Technologies' products (e.g., DynaText) and SoftQuad's (Explorer, Panorama) already support electronic stylesheets, and DSSSL-Lite will provide another (industry-standard) language for writing them. Jon Stenerson (TCISoft) recently (February 1995) hosted a workshop on stylesheets. As of August, 1996: Microsoft and some other developers were supporting the "cascading style sheets" approach (CSS1) for WWW/HTML documents. A few starter links:

Cascading Style Sheets
HTML Style Sheets: http://www.w3.org/hypertext/WWW/Style/
Frame-based layout via Style Sheets (Bert Bos, et al.)
W3C Style Sheets Activities
Cascading Style Sheets
DSSSL Online Presentation, by Jon Bosak, from Sun Microsystems: an overview of dsssl-o as well as a comparison between Cascading Style Sheets and DSSSL Online ["CSS is a nice fit for HTML but is utterly inadequate to the needs of commercial content providers using generic SGML. . . inherently incapable of handling generic SGML applications"]; see the text of a sample slide from the presentation, and a longer list of DSSSL features which are missing in CSS(1). See also the following reference.
"Limitations of CSS in complex applications" from a WWW6 presentation "Overview: XML, HTML, and all that," by Jon Bosak. See the full document.
Quick Reference to Cascading Style Sheets, level 1
Web Style Sheets - File on "HTML" style sheets from CERN, with mirror copy [March 2, 1995] here
W3 announcement for Web style sheets (March 1996) [mirror copy].
Article by Bert Bos [mirror copy]
Michael Sperberg-McQueen and Robert Goldstein, "HTML to the Max: A Manifesto for Adding SGML Intelligence to the World-Wide Web"; see bibliographic entry
Other current W3C working drafts
Cascading Style Sheets: draft specification (Håkon W. Lie and Bert Bos); [mirror copy, partial links]
Cascading Style Sheets, level 1, W3C Working Draft 26-July-96
Microsoft supports CSS1 [August 21, 1996]
Style Sheet analysis report in the context of the DMU-JEDI project, by A. Gartland and D. Houghton; March 1996. Also in Postscript or RTF or text formats. [mirror copy, text version]

SGML and HTML

[CR: 19971009]

The migration of HTML to well-formed SGML has been something of a rocky road. A few relevant and possibly promising links documenting (the history of) this evolution:

XML - Extensible Markup Language. See the dedicated database section, or the W3C statement, "Generic SGML over the Web". On XML, note specifically updated information on the work of the SGML Editorial Review Board: "The W3C has formed an SGML Working Group made up of SGML experts and an SGML Editorial Review Board made up of SGML experts who also have special standards and implementation experience to coordinate with existing related standards efforts and to provide specifications where needed to form a complete SGML Internet solution. Specific deliverables under development by the SGML WG/ERB include: (1) A specification for a simplified version of SGML suitable for Internet applications. Target delivery: draft by the SGML 96 Conference (November 1996); (2) A specification of standard hypertext mechanisms for SGML applications. Target delivery: draft by the WWW6 Conference (April 1997); (3) Public text and extensions needed to apply the DSSSL stylesheet language (ISO/IEC 10179) to Web browsers. Target delivery: draft by the SGML 97 Conference (December 1997)."
ISO-HTML - ISO Hypertext Markup Language, alias Standard HTML. "ISO/IEC International Standard for the HyperText Markup Language (HTML)," being developed (1996-1997) under the auspices of ISO/IEC JTC1/SC18/WG8. Documented in the Committee Draft of ISO 15445 ISO-HTML [summer 1997]. Rationale: "While HTML has grown rapidly, many of the latest features are of minimal use to those who desire a simple, stable format for presenting documents on the WWW. The working group that developed the early versions of HTML is being disbanded, and a new group is needed to maintain basic functionality. ISO/IEC JTC1/SC18/WG8, as the developers of SGML, has the necessary knowledge and skills." See also the ISO-HTML link collection.
[September 20, 1997] "SunSoft - SHML 1.0 Development Version, code-named 'Mehitabel' Document Type Definition for the HyperText Markup Language." [Status Report 'alpha']. Description: "Project Mehitabel is a next-generation, modular HTML DTD (labelled 'SHML 1.0') based on HTML 3.2, but with changes/improvements as described below. Mehitabel may serve many needs: authoring, document design, validation, etc. It is not meant to capture the complete array of current Web features, rather it is meant as a good, structured DTD for document authoring, with extensibility to allow for additional features. It would also be a suitable place to scrounge for HTML-based XML components, with necessary modifications. SHML is currently the basis for HTML inclusions in various prototype document types." The effort is directed by Murray Altheim.
SGML Open recommendations on HTML 3.0 (Posting #1, March 20, 1995)
WWW-TEI Meeting at Cork on local Web server (or via Cork: WWW-TEI Meeting, 19-20th November 1993)
SGML on the Web
SGML and the Web: http://www.w3.org/hypertext/WWW/MarkUp/SGML/
Sperberg-McQueen and Goldstein, "HTML to the Max: A Manifesto for Adding SGML Intelligence to the World-Wide Web"; see the bibliographic entry for details and availability. The paper was presented at the Second International World Wide Web (WWW) Conference '94: Mosaic and the Web.
Sperberg UIC: Index of /~cmsmcq/ [on SGML-WWW stylesheet mechanisms]
Eric D. Freese, "The Transformation of SGML Documents for Presentation on the World Wide Web", also in mirror copy here. The paper was presented at the Second International World Wide Web (WWW) Conference '94: Mosaic and the Web.
Eliot Kimber's letter to the editor, WIRED, with evaluation of HTML/SGML
Spyglass Technical Reference ("SGML-aware")
Steve DeRose on HTML(1) and/versus SGML
Lou Burnard's review of HTML(1) and TEI, posted to TEI-L
Dan Connolly: A Lexical Analyzer for HTML and Basic SGML (1995/10/18, mirror)
Eric Severson's white paper entitled "How SGML and HTML Really Fit Together: A Case for a Scalable HTML". See the bibliographic entry for an online copy and location in the Newswire archives
Hypertext links in HTML, by Murray Maloney (primitive relationships and link types); [mirror copy]
HTML 2.0 Proposed Standard Materials
Inserting objects into HTML
HTML 2.0 DTD in Earl Hood's analysis
The "latest" draft version of the HTML 3.0 DTD, gratis Dave Raggett
W3 HyperText Markup Language (HTML): Working and Background Materials
HaL HTML Validation Service
"SGML vs. HTML vs. PDF", presented by Tim Bray on Wednesday, August 21, 1996 at Interlab '96 (Third Department of Energy (DOE) Laboratory Internet development workshop, held August 19-21, 1996, Denver, Colorado)
Using HyTime to Link Translations, Version 1.0, 6th December 1995, by Martin Bryan; [mirror copy]
A collection of links (SGML/HTML) created by Terry Allen
A book on HTML by someone who understands the importance of SGML: Peter Flynn, The WorldWideWeb Handbook: An HTML Guide for Users, Authors and Publishers, by International Thomson Computer Press. ISBN 1-85032-205-8. See a description on Peter Flynn's WWW page, or connect to Thomson's WWW server and use their searchable catalogue.
htmlchek. "htmlchek.awk, htmlchek.pl - Syntactically checks HTML 2.0 or 3.0 files for a number of possible errors; can do local link cross-reference checking, and generate a rudimentary reference-dependency map. Runs under awk or perl. Includes a number of supplemental utilities for HTML file processing." Version 4.1, February 20, 1995. By Henry Churchyard (churchh@uts.cc.utexas.edu). See also availability via FTP
Wilbur HTML 3.2; [mirror copy]
Wilbur 3.2 Spec
Cougar DTD "-//W3C//DTD HTML Experimental 19960712//EN" [July 12, 1996] "...style sheets, scripting, the object tag, internationalization and some extensions to forms. The frame tags will probably be added once we have an agreed definition for them" [mirror copy]
WebTechs Page for Links

SGML/XML and TeX

[CR: 19990401]

Much has been written clarifying -- as well as obfuscating -- the relationship of SGML and (La)TeX, and describing the use of SGML and TeX together in document production. The SGML/XML Web Page contains a separate document with a brief explanation of the relationship, as well as pointers to essential information: (a) links to publicly accessible software that makes SGML documents printable via (La)TeX; (b) a substantial bibliographic reference list germane to "SGML and TeX"; (c) links to other resources on (SGML-)TeX which are more current. Anyone aware of good additions to this bibliographic list is invited to contact me with the relevant information. References are not [yet] complete for publications after 1995.

SGML/XML and Math

[CR: 20000921] [Table of Contents]

Links [provisional only; someone who actually works with math should volunteer to maintain a page of links ;-) ]:

Mathematical Markup Language. MathML is a W3C specification defining an "XML application for describing mathematical notation and capturing both its structure and content. . ."
Notes on SGML Math Workshop (held as part of the University of Illinois Digital Library Initiative (UIUC DLI), University of Illinois at Urbana-Champaign, May 1, 1996). By Evan Owens. [mirror copy]
"OpenMath Standard"
See: "OMDoc: A Standard for Mathematical Documents." The OMDoc format is an extension of the OpenMath standard.
[August 26, 2000] The Hypertextual Electronic Library of Mathematics. "The purpose of the project is the development of a suitable technology for the creation and maintenance of a virtual, distributed, hypertextual library of structured mathematical knowledge, eventually passing through the eXtensible Markup Language. The final aim is to allow mathematical documents to be served, received, and processed on the Web, just as HTML has enabled this functionality for text. . . HELM aims at the definition of a layer of DTDs specification and a RDF model for structured mathematical knowledge, suited to the creation and maintenance of a distributed, hypertextual, electronic library of mathematics. We shall not develop the library itself (that would obviously require the contribution of hundreds of people), except for a small subset for validation and demonstration purposes. On the technological side, we shall develop all the infrastructure required for the consultation of the library via a usual www-browser (including navigation and searching facilities). On the other side we shall interface some current proof assistants with the XML layer, mainly for authoring purposes. In other words, these applications will be able to produce XML documents conforming to the DTD (and read them back, whenever defined according to a suitable subset). . ."
[August 26, 2000] "GtkMathView: a GTK Widget to Render MathML Documents." From Luca Padovani. "GtkMathView is a C++ rendering engine for MathML (see http://www.w3.org/Math) documents. This module is meant to be part of the Helm project, but the widget is (hopefully) independent from it and easily embeddable within GTK applications. This library is covered by the GNU General Public Licence -- not the GNU LGPL; this means that the library cannot be linked against commercial applications. The widget is designed to use the best fonts available on the system. As of today the font configuration file (fonts.xml) is designed to use the fonts that comes with the X distribution, in particular 'times' for plain text and 'symbol' for mathematical symbols. In addition, the configuration file includes the description of fonts coming along with Mathematica, since they are freely available and range over a larger set of symbols. . ."
SGML and the Semantic Representation of Mathematics, by Roy Pike, King's College, Strand, London, U.K. With 11 references. [mirror copy]
Maths DTD [for ISO 12083], by E. R. Pike. May 8, 1996. [mirror copy]
[Alternate source for the semantics-based DTD for Mathematics], [mirror copy]
An SGML-Math discussion forum is related to the formative ISO 12083 DTDs; perhaps defunct
SC4 WG6 Workitem "Mathematics"
"On using SGML for mathematical formulas": Discussion between Franck Laloe (chairman of the publications comittee of the European Physical Society) and Eric van Herwijnen; [mirror copy]
Rendering math equations from DVI, via conversion to GIF on the fly for HTML browsers, from SGML source: see DynaBase 3.1 description. An article: "Inso Adds Math to DynaWeb. IEEE Uses it to Go Live with Online Digital Library."
Article "Standard DTDs and Scientific Publishing," by N. A. F. M. Poppelier, Eric van Herwijnen, and C. A. Rowley, published in EPSIG News 5/3 (September 1992) 10-19. Available at the URL ftp://ftp.elsevier.nl/pub/sgml/epsig_news_article.ps, in PostScript format, from Elsevier [101422 bytes, Aug 24 11:13]. Or: the local mirror copy.
Collection of [81] postings on "Setting Mathematics with SGML" (December 1995 - May 1996)
Math-SGML research in the context of the (TIDE) "MATHS Project" (EUROMATH DTD)
"About the Euromath Editor"; [mirror copy]
ISO12083:1993 Document Type Definition for Mathematics
[August 14, 1998] OpenMath: A proposal for an extendable semantic encoding for mathematics. OpenMath is currently being developed by the Esprit OpenMath Project and the North American OpenMath Initiative (NAOMI).
Main entry for the Euromath Project
HTML and Math Markup
"The ISO 12083 Mathematics Fragment,", by Stephen Buswell, etc - SGML '96 presentation
EPSIG Meeting: Minutes. Barcelona, May 12, 1997. A significant proportion of energy for ISO 12083 revision/enhancement is related to math encodings. [local archive copy]
"...a package of mathematical SGML and DSSSL stuff", from Chris Maden
[August 14, 1998] Note from Eitan Gurari: work done to configure TeX4ht for XML and MathML. See: http://www.cis.ohio-state.edu/~gurari/temp/xml/ml.html; [local archive copy] Also, the collected documentation.
[September 25, 1998] WebEQ - a suite of Java programs for creating and displaying interactive scientific Web documents. Use WebTeX or MathML source code.

Document URI: http://xml.coverpages.org/topics.html
Robin Cover, Editor: robin@oasis-open.org