The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: June 18, 2003
SGML/XML Special Topics

SGML: Special Topics


Several prominent grammar issues are referenced here, but the best access to expert commentary on arbitrary topics or clauses in the Standard (on the Internet) is through searching the archives of the Usenet newsgroup comp.text.sgml. The technical commentary of Erik Naggum on ISO 8879:1986 in the archives of comp.text.sgml is especially valuable, in terms of both quality and sheer volume.

SGML Declaration

[CR: 19990622]

The SGML declaration establishes the "lexical" basis for an SGML document, including the character sets, markup delimiters, features, and other options. It governs both the DTD (document type definition) and the document instance. A few pointers to resources follow.

Productions: the grammar productions from ISO 8879

[CR: 19970924]

Definitions from clause 4 of ISO 8879

[CR: 19981016]

For the benefit of those who have been temporarily separated from their personal annotated copy of the Standard and need precise definitions, see the list ISO 8879 Definitions, from Clause 4 - hypertext overview. "All definitions listed as found in clause 4. . . all defined terms used in the definition are listed below the definition text." [See the parent node for related links.] Based upon the work of Arjan Loeffen.

A similar online SGML Glossary is also available from Pindar, based upon the work of Neil Bradley in his book The Concise SGML Companion. This resource is also available on an alternate Web site; see SGML Dictionary.

Michael Sperberg-McQueen's Grammar Tools

[CR: 19970529]

See the ftp directory and subdirectories: bison, carthage, dpp, iso8879.

  • Bison tools: "The subdirectory pub/tei/sgml/grammar/bison contains files with Bison grammars and Flex scanners for SGML document type definitions (p2bnf.y and p2bnflex.l), SGML document instances (p2doc.y and p2doclex.l), and SGML declarations (p2decl.y and p2declex.l). The grammars have been parsed with Bison to ensure that the grammars are clean, but the semantic actions do no useful work." [from the README file]
  • ISO 8879 grammar: The subdirectory pub/tei/sgml/grammar/iso8879 contains transcriptions of the full formal grammar of ISO 8879 (the SGML standard), either in numeric sequence of the productions in the standard (sgmlfull.syntax) or arranged in groups, so closely related productions are more readily found together, with dependency trees showing which productions depend on which others. [from the README file]
  • Carthage: "Carthage is a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it." Some options include: (1) dropping or keeping marked sections; (2) warning if entities are declared twice; (3) dropping or keeping parameter entity declarations; (4) deleting named GIs from content models; (5) listing of specified classes of elements in the DTD [used, unused, default undeclared, declared]; (6) dropping or keeping comments in the output file, etc. [extracts from the README file, dated June 17, 1996]
  • dpp (DTD pre-processor parser): "DPP is a parser for SGML document type declarations, intended for use as a front end for filters which modify DTDs (e.g. filters to expand all or some parameter entity references, or to rename elements, etc.). Since DPP uses the same output format as sgmls. . .many existing tools for writing filters for SGML document instance . . . can be used with DPP to make filters for DTDs" [from the doc]. See the FTP directory, or the tar-gzipped package.
  • FTP to the relevant directory at UIC:
  • See the README file (mirror copy, December 1995)
  • The SGML grammar descriptions are also available via mail from the UICVM LISTSERVer and on other FTP sites (filenames: memo.syntax, etc.)

SGML/XML DTD/Grammar Transduction and Generation

Some papers on grammar generation and supporting software tools are referenced in a separate document, "SGML/XML DTD Transduction and Generation."

SGML/XML Notion of Ambiguity (non-deterministic content models)

[CR: 20010619]

  • Several technical papers on ambiguity by Anne Brüggemann-Klein [and Derick Wood] are listed in the SGML bibliography. See for example: the dissertation of Anne Brüggemann-Klein, Formal Models in Document Processing (1993) and "One-Unambiguous Regular Languages" (1998).
  • A (somewhat) dated short list of published references
  • See Arjan Loeffen's CTS contribution above
  • Discussion on XML-DEV from June, 2001. Compares SGML/XML notion of deterministic model vs. that of RELAX NG.
  • Discussion from August 1997. Joe English: ". . . The real show-stopper though is probably start-tag omission. Without the [ISO 8879's particular] ambiguity restriction, the formal definition of 'contextually required element' and related terms are utterly meaningless. Specifically, the parts about 'satisfied tokens': in terms of general regular expressions it doesn't make any sense to talk about which subexpression an input token matches, only whether the input sequence as a whole matches the regular expression as a whole. . ."
  • [January 14, 2001] XML and non-deterministic ['ambiguous'] content models. Jim Shain asked a question about (non-)determinism in XML content models; informative responses from Richard Tobin, James Clark, Deborah Aleyne Lapeyre, and Marcus Carr were provided.
  • [January 14, 2001] XML parser response to non-deterministic content models. TAKAHASHI Hideo said "I understand that the XML 1.0 spec prohibits non-deterministic (or, ambiguous) content models (for compatibility, to be precise)"; is this so? Joe English and James Clark answer "no," and explain.
  • Note by Anne Brüggemann-Klein
  • Note by Erik Naggum
  • "SGML & XML Content Models." By Pekka Kilpeläinen. Report C-1998-12, Department of Computer Science, University of Helsinki, May 1998. 16 pages. "The SGML and XML standards use a variation of regular expressions called content models for modeling the markup structures of document elements. SGML content models may include so called and groups, which are excluded from XML. An and group, which is a sequence of subexpressions separated by an &-operator, denotes the sequential catenation of its subexpressions in any possible order. If one wants to shift from SGML to XML in document production, one has to translate SGML content models to corresponding XML content models. The allowed content models in both SGML and XML are restricted by a requirement of determinism, which means that a parser recognizing document element contents has to be able to decide without lookahead, which content model token to match with the current input token, while processing the document from left to right. It is known that not all SGML content models can be expressed as an equivalent XML content model. It is also known that transforming an SGML content model into an equivalent XML content model may cause an exponential growth in the length of the content model. We discuss methods of eliminating and groups and analyze the circumstances where they can be applied. We derive a tight bound of e n! on the number of symbols in the result of eliminating an and group of n symbols, where e = 2.71828... is the base of natural logarithms. We present the analysis in a pedagogical manner, emphasizing mathematical methods which are typical to the analysis of algorithms. We also show that minimal deterministic automata for recognizing an and group of n distinct element names contain 2^n states and n * 2^(n-1) transitions, excluding the failure state and transitions leading to it..." [cache]
  • [January 14, 2001] "How to validate XML." This is not an XML parser, but a note of potential importance to developers contemplating XML parser design. From Joe English. "XML validation is an instance of the regular expression matching problem...The most commonly-used technique to solve this problem is based on finite automata. There is another algorithm, based on derivatives of regular expressions, which deserves to be more widely known..."
  • Content Model Algebra. By Sam Wilmott (OmniMark Technologies). "Anyone desiring to have a full understanding of SGML content models and the theory behind the construction of text markup languages should have some familiarity with the basic concepts of Automata Theory and Set Theory on which content models are based. Automata Theory is a branch of Computer Science to which some, but not all, Computer Science students are exposed. Set Theory is a branch of Mathematics to which grade-school students were exposed, at least in the days of "New Math". This report provides an outline of some of the concepts of Automata Theory and Set Theory relevant to creating text markup languages. A background in formal Computer Science theory and Mathematics is required for an easy comprehension of the material in the paper. Other readers may find it interesting that there is a strong theoretical basis for what they are doing when they write content models..." URLs: try perhaps, or local crippled version.
  • Use indexed archives for comp.text.sgml wherever you can find them: for example, Arjan Loeffen's indexed shadow archive. Many CTS discussions have been held on parsing in light of SGML's definition of "ambiguity"
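The derivative-based matching technique mentioned in Joe English's note above can be illustrated with a short sketch. This is a minimal Python illustration of regular-expression derivatives applied to a content model, not code from any of the cited papers or parsers; the tuple encoding and the helper names (elem, seq, alt, star, opt, plus, matches) are assumptions made for this example, and SGML and groups are left out.

```python
# Content-model expressions as plain tuples (hypothetical encoding):
#   ("empty",)       matches no sequence at all
#   ("eps",)         matches only the empty sequence
#   ("elem", name)   matches one child element called name
#   ("seq", a, b)    a followed by b   (a , b)
#   ("alt", a, b)    a or b            (a | b)
#   ("star", a)      zero or more a    (a*)
EMPTY = ("empty",)
EPS = ("eps",)

def elem(name):
    return ("elem", name)

def seq(a, b):
    # Simplify while building so derivatives stay small.
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("seq", a, b)

def alt(a, b):
    if a == EMPTY:
        return b
    if b == EMPTY or a == b:
        return a
    return ("alt", a, b)

def star(a):
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

def opt(a):       # a?
    return alt(EPS, a)

def plus(a):      # a+
    return seq(a, star(a))

def nullable(e):
    """True if e accepts the empty sequence of children."""
    kind = e[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "elem"):
        return False
    if kind == "seq":
        return nullable(e[1]) and nullable(e[2])
    return nullable(e[1]) or nullable(e[2])   # alt

def derive(e, name):
    """What remains to be matched after reading one child called name."""
    kind = e[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "elem":
        return EPS if e[1] == name else EMPTY
    if kind == "alt":
        return alt(derive(e[1], name), derive(e[2], name))
    if kind == "star":
        return seq(derive(e[1], name), e)
    # seq: consume from the left part, or skip it if it is nullable
    first = seq(derive(e[1], name), e[2])
    if nullable(e[1]):
        return alt(first, derive(e[2], name))
    return first

def matches(model, children):
    for name in children:
        model = derive(model, name)
        if model == EMPTY:
            return False
    return nullable(model)

# (title, para+, note?)
model = seq(elem("title"), seq(plus(elem("para")), opt(elem("note"))))
print(matches(model, ["title", "para", "para", "note"]))
print(matches(model, ["title", "note"]))
```

Because each step only rewrites the remaining expression, the sketch never has to decide which content model token an input token "satisfies"; it therefore also accepts models such as (a, b) | (a, c) that ISO 8879 would reject as ambiguous.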

RS/RE Processing

[CR: 19961118]

Use (and Non-use) of Exceptions in DTDs

[CR: 20020309]

  • Joe English (August 18, 1997) - on why "it's not possible to create an exclusion-free DTD exactly equivalent" [to one having inclusion exceptions]
  • [March 09, 2002] "Complexity of Context-Free Grammars with Exceptions and the Inadequacy of Grammars as Models for XML and SGML." By Romeo Rizzi (Facoltà di Scienze, Dipartimento di Informatica e Telecomunicazioni, Università degli Studi di Trento). In Markup Languages: Theory & Practice 3/1 (Winter 2001), pages 107-116 (with 19 references). "The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow authors to better transmit the semantics in their documents by explicitly specifying the relevant structures in a document or class of documents by means of document type definitions (DTDs). Several authors have proposed to regard DTDs as extended context-free grammars expressed in a notation similar to extended Backus-Naur form. In addition, the SGML standard allows the semantics of content models (the right-hand side of productions) to be modified by exceptions. Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. Since XML does not allow exceptions, the problem of exception removal has received much interest recently. Motivated by this, Kilpeläinen and Wood have proved that exceptions do not increase the expressive power of extended context-free grammars and that for each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. Since their argument was based on an exponential simulation, they also conjectured that an exponential blow-up in the size of the grammar is a necessary devil when purging exceptions away. We prove their conjecture under the most realistic assumption that NP-complete problems do not admit non-uniform polynomial-time algorithms. Kilpeläinen and Wood also asked whether the parsing problem for extended context-free grammars with exceptions admits efficient algorithmic solution. 
We show the NP-completeness of the very basic problem: given a string w and a context-free grammar G (not even extended) with exclusion exceptions (no inclusion exceptions needed), decide whether w belongs to the language generated by G. Our results and arguments point up the limitations of using extended context-free grammars as a model of SGML, especially when one is interested in understanding issues related to exceptions." A related paper was published as IRST Technical Report 0101-05, Istituto Trentino di Cultura, January 2001 (December 2000: Centro per La Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura). See the original Postscript and the online abstract. [cache]
  • [October 31, 2001] "SGML and XML Document Grammars and Exceptions." By Pekka Kilpeläinen and Derick Wood. In Information and Computation Volume 169, Number 2 (September 2001), pages 230-251 (with 19 references). "The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow users to define document-type definitions (DTDs), which are essentially extended context-free grammars expressed in a notation that is similar to extended Backus-Naur form. The right-hand side of a production, called a content model, is both an extended and a restricted regular expression. The semantics of content models for SGML DTDs can be modified by exceptions (XML does not allow exceptions). Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. We give precise definitions of the semantics of exceptions, and prove that they do not increase the expressive power of SGML DTDs when we restrict DTDs according to accepted SGML practice. We prove the following results: (1) Exceptions do not increase the expressive power of extended context-free grammars. (2) For each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. (3) For each DTD with exceptions, we can construct a structurally equivalent DTD when we restrict the DTD to adhere to accepted SGML practice, (4) Exceptions are a powerful shorthand notation-eliminating them may cause exponential growth in the size of an extended context-free grammar or of a DTD." See the Technical Report for a related version.
  • [January 23, 1998] "SGML Exceptions and XML," by Eve Maler (ArborText) - "briefly describes SGML exceptions (inclusions and exclusions) and discusses how 'exception users' can handle their DTDs and data in XML, which does not allow exceptions"; [local archive copy]
  • "A New Generation of Tools for SGML." By Richard W Matzen. In Markup Languages: Theory & Practice 1/1 (Winter 1999) 47-74 (with 21 references). "Exceptions are used in many standard DTDs, including HTML, because they add expressive power for DTD authors. However, there is a tradeoff: although they are useful, exceptions add significantly to the complexity of DTDs. Authoring DTDs is a difficult task, and existing tools are of limited use because of the lack of a suitable formal model for exceptions. This paper describes methods for constructing a static model that completely and precisely describes DTDs with exceptions. . ."
  • "A New Tool for SGML with Applications for the World Wide Web." By Matzen and Hedrick. Paper presented at SAC '98 - 1998 ACM Symposium on Applied Computing. February 27 - March 1, 1998, Marriott Marquis, Atlanta, Georgia, U.S.A. See the associated SGML exceptions analysis (results): The results shown in [this results set] are from running the prototype software tool described in the above paper. And see the authors' paper, "Unraveling Exceptions," Conference Proceedings: SGML/XML 97, Washington D.C., December, 1997.
  • Eliot Kimber (April 20, 1996)
  • Michael Sperberg-McQueen (April 15, 1996)
  • Collection of postings from CTS, December 1994
  • Erik Naggum (August 1994)
  • Erik Naggum (1994)
  • Erik Naggum (August 1994)
  • Marcy Thompson (August 1994)
  • Len Bullard (August 1994)
  • Other links from Arjan Loeffen's indexed CTS database (or: see the complete subject listing)

CDATA [and RCDATA] as Declared Content

[CR: 19970905]

Duplicate tokens in an attribute definition list (FAQ)

[CR: 19970925]

A short document explaining why name tokens cannot be duplicated in an attribute definition list, even within different groups. Updated September 25, 1997. Thus, it explains why the following is not allowed:

<!ATTLIST candidate constantlyChangesPosition (YES | NO) YES
                    liesWithoutFlinching      (YES | NO) YES >

It also explains the rationale for the particular design (limitation) in SGML, offers some work-arounds, and speculates on whether/how the infelicity might be addressed in a revision to SGML. As of August 1997 (WG8 N1929), it appears that the restriction will, in part, be eliminated. Your contributions to this compilation are welcome.
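The restriction illustrated by the ATTLIST example above can be checked mechanically. The following is a rough Python sketch, not a real SGML parser: the function name duplicate_tokens and the regular expression are assumptions made for this illustration, and the sketch ignores case folding, parameter entities, comments, and declared values such as NOTATION groups.

```python
import re
from collections import defaultdict

def duplicate_tokens(attlist):
    """Return {token: [attribute names]} for every name token that
    appears more than once in the enumerated groups of one ATTLIST."""
    seen = defaultdict(list)
    # Find "attributeName (tok | tok | ...)" pairs (simplified syntax).
    for attr, group in re.findall(r"([\w.-]+)\s*\(([^)]*)\)", attlist):
        for token in group.split("|"):
            token = token.strip()
            if token:
                seen[token].append(attr)
    return {tok: attrs for tok, attrs in seen.items() if len(attrs) > 1}

declaration = """
<!ATTLIST candidate constantlyChangesPosition (YES | NO) YES
                    liesWithoutFlinching      (YES | NO) YES >
"""

for token, attrs in duplicate_tokens(declaration).items():
    print(token, "is used by", ", ".join(attrs))
```

For the declaration shown, both YES and NO are reported as shared by the two attributes, which is exactly the situation the FAQ describes as disallowed.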

SGML/XML and Context Free Grammar [SGML, CFG, BNF]

[CR: 19970523]

SGML/XML and Forest/Hedge Automata Theory

[CR: 20000224]

A separate document contains a collection of references on Tree-Regular Languages, Forest-Regular Languages, DTD/Document Transformation, Forest/Tree Automata Theory, Schema Languages, and related matters. Most of the publications below are written by Murata Makoto and Paul Prescod; the corresponding bibliographic entries contain document abstracts. As of early 1999, 'forest automata' are being referred to as 'hedge automata' in the context of SGML/XML schemas.


[CR: 19981003]

Some SGML experts don't think the SUBDOC feature of SGML is worth much, at least as defined in ISO 8879:1986. See the CTS archives for discussion. Eliot Kimber, on the other hand, is a champion of SUBDOC. See the links below.

  • SUBDOC and Architectures. Kimber, April 1997. The post answers two common questions, to start with: (1) "When the parser encounters a SUBDOC entity reference, it doesn't parse the entity. Why not?"; (2) "As each subdocument can have its own DTD, won't the authors be able to get around our DTD by using subdocuments?"
  • "Re-Usable SGML: A Plea for SUBDOC", by Eliot Kimber (poster session first presented at SGML '95); [mirror copy, text and partial links only]
  • [October 03, 1998] "The Challenge of Implementing SUBDOC: With Some HyTime Support." Presented by Erlend Øverby, University of Oslo (Norway) at Markup Technologies '98. "At the University of Oslo, we have been working with SGML since 1992. Currently, we have over 120 authors maintaining approximately 1000 SGML files. We have found the SUBDOC feature a useful way to manage SGML materials that must be edited as freestanding units, but that are combined, for publication, with other materials. SUBDOC simplifies our use of marked sections for conditional text, and eliminates many ID/IDREF name conflicts. Simple HyTime linking helps us manage cross-references between subdocuments effectively."
  • [June 18, 1997 update] Note on several recent developments in industry and standards which make the use of SUBDOC more likely. See:
  • [June 24, 1997] "Cool Tool: Value Reference (Subdoc) Resolution for DSSSL and JADE," by Eliot Kimber. "I wrote a DSSSL spec, using the JADE SGML back end, that produces a new instance from a compound document. This new instance can then be formatted normally. This spec is available from the ISOGEN Web site" ("value reference" being the new HyTime facility that reflects the semantics of SUBDOC reference, among other things). See the context for discussion of this 'cool tool'.
  • Eliot Kimber on SUBDOC

Other Grammar/Parsing Issues, and FEATURES

[CR: 19981216]

Nothing here (or anywhere) should be taken as gospel: consult your local SGML expert for the latest word. [**Need to create other subsections for topics; need a dedicated FAQ**]

Architectural Forms and SGML/XML Architectures

[CR: 20010505] [Table of Contents]

Information on architectural forms and AF processing is provided in a separate document.

Groves, Grove Plans, and Property Sets in SGML/XML/DSSSL/HyTime

[CR: 20000419] [Table of Contents]

GROVE - "Graph Representation of Property Values." Description and references are provided in a separate document.

ESIS - ISO 8879 Element Structure Information Set

[CR: 19970622]

"The set of information that is acted upon by implementations of structure-controlled applications is called the 'element structure information set' (ESIS). ESIS is implicit in ISO 8879, but is not defined there explicitly." [from WG8 N931 Attachment, as documented below]


  • See The SGML Handbook, by Charles F. Goldfarb, pages 588-593 ('ESIS' is not in the volume index). The description of ESIS presented there is within Appendix B, "ISO/IEC JTC1/SC18/WG8 N1035: Recommendations for a Possible Revision of ISO 8879." The ESIS description is supplied as an attachment ("Attachment 1: The ISO 8879 Element Structure Information Set (ESIS)").
  • ESIS description from WG8 N931, attachment
  • Pointers to ESIS description, from Jacques Deseyne
  • Other postings on ESIS [how to learn about it], by Ingo Macherius, T. Kurt Bond, and Jacco van Ossenbruggen
  • ESIS described in an attachment to N1035; [its mirror copy]
  • ESIS - Standard Generalized Markup Language (SGML) Property Set (relevant modules), from Peter Newcomb:

ISO 8879: Character Sets and Multilingual Text, including Extended Reference Concrete Syntaxes (ERCS)

[CR: 20000425]

Below is a list of links to resources that may be relevant to multilingual text and character sets used in SGML documents.

Conformance: SGML/XML/HTML

[CR: 19971107]

See Charles Goldfarb's SGML Handbook for discussion of conformance -- an aspect of SGML that is highly controversial. Exoterica Corporation distributes a CD-ROM with many test suites; see The Compleat SGML in the main bibliography. See also the bibliographic entry for ISO 13673 Conformance Testing for SGML Systems [project editor Lynne Price]. The CTI-SEMA: SGML Test Materials from GCARI are available from several sites:

ISO 8879: DTDS (Repositories of document type declarations/definitions conforming to ISO 8879)

[CR: 19980508] [Table of Contents]

Most of the SGML/XML case studies presented in the major sections for applications (General, Academic, Industry) reference the relevant DTDs. Some of the more common "standard" DTDs are referenced in the documents cited below. "Standard" DTDs, the reader should understand, are often modified locally in order to support slightly different conceptual models, and to accommodate particular processing needs. "Standard" DTDs that do not change to meet the developing requirements of users tend to become fossils.

SGML/XML Entity Sets and Entity Management

[CR: 20030408] [Table of Contents]

  • ISO entity sets: explained by Rick Jelliffe
  • XML Character Entities. "An XML entity set corresponding to ISO 8879 SGML character entities." DocBook (Norm Walsh). Version 0.3 as of 13-June-2003.
  • "Community Contribution: Problems with ISO Entity Sets (ISO 8879 and 9573-13)." By David Carlisle. Reference: ISO/IEC JTC 1/SC34 N0387. "Real-world use of ISO entity sets has revealed issues that need to be addressed for the user community... This document is a personal note on problems encountered whilst working with ISO entity sets, and in particular in attempting to produce mappings of the ISO entities to Unicode to enable XML DTD declarations to be made. Much of this work has been undertaken as the Editor of the W3C MathML DTD, but this is a personal note, produced in response to an email request quoted at the end. MathML Character Descriptions contains further information including mappings to Unicode for all the ISO 8859 and 9573-13 entity sets, whether or not they are used in MathML. Docbook has a similar page describing its mappings..."
  • [March 29, 2005] Information technology — SGML support facilities — Techniques for using SGML. Part 13: Public Entity Sets for Mathematics and Sciences. Edited by Martin Bryan David Carlisle. 2003-11-26. Project: PDTR 9573-13:2004 (type 3) 2nd Edition. Project Editor: Dr. David P. Carlisle. PDTR Ballot version, 2005-02-24. See also the reference page [ISO/IEC JTC 1/SC 34 N0599] and ballot [Ballot due 2005-05-24 PDTR 9573-13: Maths and scientific character sets]. "Tens of thousands of graphic characters are used in publishing text, a large proportion of which have been defined in ISO/IEC 10646. Even where standard coded representations exist, however, there may be situations in which they cannot be keyboarded conveniently or accurately, or in which it is not possible to display the desired visual depiction of the characters. To help overcome these barriers to the successful interchange of SGML and related documents, this part of ISO/IEC TR 9573 defines character entity sets for some widely used special graphic characters regularly used in the production of scientific and mathematical documents. [Note 1: Entity repertoires are necessarily larger and more repetitious than character sets, as they deal in general with higher-level constructs. For example, unique entities have been defined for each accented Latin alphabetic character, while a character set might represent such characters as combinations of letters and diacritical mark characters.] In many instances upper- and lower-case is used to differentiate the names of entities. It is assumed that any SGML concrete syntax used in conjunction with these entity names will be case sensitive. [Note 2: The reference concrete syntax defined in ISO/IEC 8879 (SGML) is case sensitive.] This edition of the standard has been aligned with the Unicode 3.2 updates to ISO/IEC 10646:2000, as covered by Amendment 1 to the standard.
For the purposes of backwards compatibility the names assigned to the characters in the original edition of the standard are shown before those assigned to the character in ISO/IEC 10646. References to characters in this part should, however, refer to the ISO/IEC 10646 name rather than the name originally assigned by ISO/IEC TR 9573..." Source: PDF.
  • [February 05, 2002] "XML Character Entities Version 0.2." Edited by Norman Walsh for the OASIS DocBook Technical Committee. Working Draft 04-February-2002. This Standard defines XML encodings of the 19 standard character entity sets defined in Non-normative Annex D of ISO 8879:1986 (ISO 8879:1986 Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML). 1986. [Caveats: a working draft constructed by the editor; not yet an official committee work product, and may not reflect the consensus opinion of the committee; a few acknowledged discrepancies WRT the character mappings in this draft.] "This Standard defines XML encodings of the standard SGML character entity sets. Non-normative Annex D of [ISO 8879:1986] defines 19 standard SGML character entity sets: Added Latin 1, Added Latin 2, Greek Letters, Monotoniko Greek, Russian Cyrillic, Non-Russian Cyrillic, Numeric and Special Graphic, Diacritical Marks, Publishing, Box and Line Drawing, General Technical, Greek Symbols, Alternative Greek Symbols, Added Math Symbols: Ordinary, Added Math Symbols: Binary Operators, Added Math Symbols: Relations, Added Math Symbols: Negated Relations, Added Math Symbols: Arrow Relations, Added Math Symbols: Delimiters. The SGML declarations for these entities use the specific character data (SDATA) entity type that is not supported in XML, so alternative XML declarations are necessary. In XML, the specific character data of most entities can be expressed as a Unicode character." In addition to the character entity sets, the document defines 'XML Character Elements'. Design rationale: "Named XML entities (except for the five predefined entities) cannot be used if they are not declared. Entity declaration requires either an external or an internal subset. Some classes of applications forbid the occurrence of markup declarations in documents. 
For these documents, named character entities are inaccessible...we [therefore] introduce an XML vocabulary with the semantics of character entity reference. This Standard defines the semantics of elements and attributes declared in the namespace. This namespace contains exactly one element, char. The char element has two attributes, entity and name. They are mutually exclusive. The entity attribute identifies characters by their character entity names. (The set of valid names is the closed set of names associated with character entity sets defined by this Standard.) Case is significant in entity names. The name attribute identifies characters by their Unicode character names..." See the version of March 19, 2002 [and later].
  • Nineteen ISO 8879:1986 character entity sets: Added Math Symbols: Arrow Relations, Added Math Symbols: Binary Operators, Added Math Symbols: Delimiters, Added Math Symbols: Negated Relations, Added Math Symbols: Ordinary, Added Math Symbols: Relations, Box and Line Drawing, Russian Cyrillic, Non-Russian Cyrillic, Diacritical Marks, Greek Letters, Monotoniko Greek, Greek Symbols, Alternative Greek Symbols, Added Latin 1, Added Latin 2, Numeric and Special Graphic, Publishing, General Technical. Available as separate disk files in a ZIP archive, or as a single concatenated text file. Source: the SGML Repository.
  • ISO 8879 entity sets in 20 disk files, from the WG8 FTP server, here available concatenated, in text format or in UNIX tar/gzip format. Disk files dated to June 28, 1995.
  • [August 17, 2001] On the production of a set of XML character entities by the OASIS DocBook Technical Committee: Norm Walsh proposed a charter modification to read "... The DocBook TC will develop, and publish as a separate specification, a set of XML character entities, based on previously published ISO 9573 entity sets (themselves an extension of the ISO 8879 entity sets) for use in the XML version of DocBook and other related XML specifications..." Or possibly: (proposed by Eduardo Gutentag) "The DocBook TC will also develop specifications based on previously published standards for use in the XML version of DocBook and other related XML specifications; the first such specification will be based on previously published ISO 9573 entity sets (themselves an extension of the ISO 8879 entity sets)."
  • [September 20, 1997] "ISO Character Entity Sets", collected and organized by Murray Altheim (Sun Microsystems). Part of an investigation to see "how the i18n draft, Unicode, and the current ISO entity sets may work together in SGML, HTML and XML." Also part of "SunSoft - SHML 1.0 Development Version, code-named "Mehitabel" Document Type Definition for the HyperText Markup Language." In the listings, ".ent files are CDATA numeric character references; .gml are SDATA 'square bracketed' strings; .pen are XML-compatible Unicode numeric character references."
  • TEI Standard Writing System Declarations & Entities. IPA, Arabic, Coptic, Classical Greek, ISO 646 (non-national Subset), ISO 646 (International Reference Version), ISO 8859-1 (-2, -5, -7, -8, -9), ISO 8879:1986 (partial). These entity sets and writing system declaration files are available here in a ZIP archive [December 1996].
  • [March 29, 1999] XML Entity Declarations - Entity Declarations for ISO, HTML and MathML character sets. From David Carlisle, OpenMath project.
  • [September 07, 1998] Rune Mathisen and Vidar Gundersen reported on CTS that they are preparing documentation covering the ISO character entities and their LaTeX equivalents. The effort represents "an attempt to make an overview of the ISO character entities (ISO 8879:1986) and their LaTeX equivalents, and to provide a handy reference to the ISO character entities for non-LaTeX users." The online materials will provide the mappings necessary to translate SGML documents to TeX/LaTeX, and will provide documentation on how the characters look by producing glyphs.
  • [October 09, 1998] On October 8, 1998, Sebastian Rahtz posted an announcement concerning the preparation a new 'catalogue of Unicode positions and their corresponding entity names in various sets, including MathML, and the LaTeX equivalent. Each Unicode character may be mapped to several entities'. The document is presented in XML format. Rahtz credits several other people for advice and earlier work: David Carlisle, Robert Streich, Nico Poppelier, Rune Mathisen, and Vidar Gundersen. [local archive copy, 1998-10-08]
  • [July 26, 1997] Submission of "XML-ized" ISO 8879 entity sets, by Rick Jelliffe of Allette Systems; the postings were made to the XML Development list. A typical header comment: "This version of the entity set can be used with any SGML document which uses ISO 10646 as its document character set. This includes XML documents and ISO HTML documents. This entity set uses hexadecimal numeric character references." Please report any errors to Rick. The entities mapped to hex are in the following files: ISOgrk4.pen, ISOtech.pen, ISOdia.pen, ISOlat1.pen, ISOgrk1.pen, ISOlat2.pen, ISOgrk2.pen, ISOnum.pen, ISOgrk3.pen, ISOpub.pen. Available in a concatenated file, or archived as separate files in a .ZIP package. Note the disclaimer.
  • Public Text for Entity Sets published in ISO/IEC TR 9573-13:1991 (ZIP archive file with 12 entity sets); or concatenated entity set files, text version, supplied by Anders Berglund (see the main ISO 9573 entry)
  • [June, 1997] SGML Public Entity Sets, Proposals. Sample collections of entities and glyphs (proposed) for potential inclusion into ISO 9573. For: Ugaritic, Old Persian, Glagolitic, Croatian, Buginese, Cherokee, and Gothic Uncials. Developed by Anders Berglund and others. [mirror copy, descriptive text only]
  • Note on 9573 entity sets for chemistry, by Martin Bryan (September 1995)
  • Ancient Greek Character Entities - Derived from TEI Writing System Declaration for Ancient Greek. By James Tauber; [local archive copy]
  • Math Entities - From AMS
  • List of Symbols represented by entities - Springer-Verlag Preview Journals Service; [mirror copy]
  • [November 1997] NCBI SGML entity list; [local archive copy]
  • DocBook DTD Character Entities
  • Entity sets via FTP to Darmstadt
  • Additional named entities for HTML - W3C Working Draft 25-Nov-96 = WD-entities-961125; [mirror copy]
  • See also: the SPREAD entity sets - ISO 10646 (Unicode) characters as SGML SDATA entities
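
Several of the entries above describe entity sets that map names to "XML-compatible" hexadecimal numeric character references. A minimal sketch of what generating such declarations might look like follows. It is not taken from any of the listed files: it borrows the HTML 4 name table in Python's standard `html.entities` module as a stand-in for the ISO 8879 sets, and the function name `entity_declaration` is hypothetical.

```python
from html.entities import name2codepoint


def entity_declaration(name: str) -> str:
    """Return an entity declaration whose value is a hexadecimal character reference.

    The name is looked up in Python's HTML 4 table (an assumed stand-in
    for the ISO sets); unknown names raise KeyError.
    """
    codepoint = name2codepoint[name]
    return f'<!ENTITY {name} "&#x{codepoint:X};">'


if __name__ == "__main__":
    # Print declarations for a few names shared by HTML 4 and the ISO sets.
    for name in ("eacute", "alpha", "copy"):
        print(entity_declaration(name))
```

Markup-significant characters such as `amp` and `lt` need extra escaping in a real XML entity set, so this sketch should not be applied to them as written.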

SGML Entity Types, and Entity Management

[CR: 19970820]

Catalogs, Formal Public Identifiers, Formal System Identifiers

[CR: 20030618]

  • SGML Open Catalogs - Using SGML Open catalogs to generate system identifiers. - Documentation in James Clark's SP distribution. [local archive copy]
  • [June 18, 2003]   OASIS Entity Resolution TC Approves XML Catalogs Specification for Public Review.    Members of the OASIS Entity Resolution Technical Committee have voted to approve the latest revision of the XML Catalogs specification as a Committee Specification and to submit the document for public review. The XML Catalogs specification describes an interoperable method for mapping the information in an XML external identifier into a URI reference for the XML external resource. An entity catalog is defined for this purpose, designed to handle "two simple cases: (1) mapping an external entity's public identifier and/or system identifier to a URI reference; (2) mapping the URI reference of a resource (a namespace name, stylesheet, image, etc.) to another URI reference." Three non-normative appendices provide formal definitions for the XML Catalog, including W3C XML Schema, RELAX NG Grammar, and XML DTD. The OASIS TC was chartered in October 2000 to provide an XML syntax for a simple entity catalog format, as envisioned in an earlier OASIS Technical Resolution. A 30-day public review of the XML Catalogs specification will take place from June 18, 2003 through July 18, 2003 in preparation for consideration of the specification as an OASIS Open standard.

  • [August 14, 2001]   Sun Microsystems Releases Java Classes for XML Entity and URI Resolution.    A posting from Norman Walsh (Sun Microsystems) announces the release of a set of Java classes originally written to implement the OASIS XML Catalogs Committee Specification for SAX entityResolver() and JAXP URIResolver(). These classes "greatly simplify the task of using Catalog files to perform entity resolution. You can use these classes directly 'out of the box' with their applications (such as Xalan and Saxon) or customize them to suit your particular needs. Developers will also be interested in the included JavaDoc API Documentation. The distribution package includes Java classes, JavaDoc API documentation, and step-by-step instructions explaining how to use and customize the resolver components." The Preview Version 0.2 requires JDK 1.2 or later. The package with binaries and sample code is available for download from the Sun XML Developer Connection. [Full context]

  • [July 18, 2001] "XML Catalogs." For the OASIS Entity Resolution TC. Working Draft 16-July-2001. Edited by Norman Walsh (Sun Microsystems). "The requirement that all external identifiers in XML documents must provide a system identifier has unquestionably been of tremendous short-term benefit to the XML community. It has allowed a whole generation of tools to be developed without the added complexity of explicit entity management. However, the interoperability of XML documents has been impeded in several ways by the lack of entity management facilities: (1) External identifiers may require resources that are not always available. For example, a system identifier that points to a resource on another machine may be inaccessible if a network connection is not available. (2) External identifiers may require protocols that are not accessible to all of the vendors' tools on a single computer system. An external identifier that is addressed with the ftp: protocol, for example, is not accessible to a tool that does not support that protocol. (3) It is often convenient to access resources using system identifiers that point to local resources. Exchanging documents that refer to local resources with other systems is problematic at best and impossible at worst. The problems involved with sharing documents, or packages of documents, across multiple systems are large and complex. While there are many important issues involved and a complete solution is beyond the current scope, the OASIS membership agrees upon the enclosed set of conventions to address a useful subset of the complete problem. To address these issues, this Standard defines an entity catalog that maps both external identifiers and arbitrary URI references to URI references..." [cache]
  • [October 12, 2000] OASIS Technical Committee on Entity Resolution. An announcement released by Karl Best (OASIS - Director, Technical Operations) describes a proposed 'Entity Resolution' technical committee, to be formed under the rules of the Technical Committee Process as announced in early October. The new committee would continue work begun under the SGML Open Technical Resolution on Entity Management (entity catalog formats, formal system identifiers, etc.), updating this work to cover XML. "A new OASIS technical committee is being formed. The Entity Resolution TC has been proposed by Lauren Wood, SoftQuad Software Inc.; Norman Walsh, Sun Microsystems; Paul Grosso, Arbortext, Inc.; and John Cowan, Reuters Health. The request for a new TC meets the requirements of the OASIS TC process. . . The objective of the Entity Resolution TC is to provide facilities to address issue A of the OASIS catalog specification (TR 9401). These facilities will take into account new XML features and delete those features of TR 9401 that are only applicable to SGML, as well as those features applicable only to issue B in TR 9401. Deliverables: The Entity Resolution TC will produce a Committee Specification that uses XML syntax and provides a DTD (potentially also an XML Schema) for that syntax. This specification will be ready by August 2001. The Entity Resolution TC intends to submit the Committee Specification as an OASIS Standard after sufficient implementation experience has been gathered." See the TC mailing list archives.
  • Don Stinchfield, Using Catalogs and MIME to Exchange SGML Documents. MIMESGML Working Group, INTERNET-DRAFT. Providence, RI: EBT and MIMESGML Working Group, IETF, 1995. "This draft proposes a standard for exchanging SGML documents over the World Wide Web using catalogs and MIME. This draft extends SGML Open's definition of catalogs. . ." See the bibliographic entry for the full abstract and online availability.
  • System identifiers - Documentation in James Clark's SP
  • A.6 Formal System Identifier Definition Requirements (FSIDR) - from Annex A of the HyTime TC
  • Entity Management. SGML Open Technical Resolution 9401:1997 (Amendment 2 to TR 9401). The SGML Open Catalog and interchange package. Edited by Paul Grosso, Chair, Entity Management Subcommittee. September 10, 1997. [local archive copy, HTML] Or: Postscript, ZIP format, [local archive copy]. Packaged format: HTML, PS, DTD, etc.
  • Earlier version of the SGML Open CATALOG definition: Entity Management. SGML Open Technical Resolution 9401:1995. (Amendment 1 to TR 9401) [mirror copy]
  • [January 02, 2001] XML Namespace Resources. The end of calendar year 2000 saw the eruption of (yet) another communal lament about the W3C XML Namespace specification, which fails to meet the expectations of some users. Resulting from the discussion: a number of new proposals for indicating "what a namespace URI should point to." (1) Tim Bray, one of the XML Namespace editors, licensed underground activity for a namespace markup vocabulary that could reference related resources ("it would have to be done low, fast, and under the radar...") and then floated his own suggestion for XNRL (XML Namespace Related-Resource Language). "XML Namespace Related-resource Language (XNRL) is an HTML-based markup language designed to contain a human-readable description of an XML namespace as well as pointers to multiple resources related to that namespace. Examples of such related resources include schemas, stylesheets, human-readable documentation (beyond that contained in the XNRL package) and executable code. XNRL is designed to be suitable for service as the body of a resource returned by dereferencing a URI serving as an XML Namespace name. [The draft proposal] defines the syntax and semantics of XNRL, and also serves as an XNRL package for the namespace." (2) Jonathan Borden presented an "XML Namespace Catalog Format." The proposal "defines a format for an XML Namespace Catalog. An XML Namespace Catalog serves as a text description of an XML Namespace and includes links to resources associated with the namespace such as schemata, stylesheets and/or other resources associated with the namespace URI. An XML Catalog may also map Formal Public Identifiers into System Identifiers defined as URI references. An XML Namespace Catalog is designed to be suitable for service as the body of a resource returned by dereferencing a URI serving as an XML Namespace name. The XML Namespace Catalog format is an extension of XHTML with a new element named resource. The resource element serves as an XLink to the referenced resource. The resource element represents an XLink with two additional attributes, public and content-type, which provide for optional formal public identifiers and/or content type specifiers. The proposal document defines the syntax and semantics of the XML Namespace Catalog Format, and also serves as an XML Catalog for the namespace. The XML Namespace Catalog 1.0 DTD has been produced as an extension of XHTML Basic 1.0." See also the example. (3) Sean B. Palmer presented XNCL (XML Namespace Catalogue Language) as "just another hack attempt at producing an XML Namespace Catalogue Language that the people on XML-DEV will find solace in. XNCL is a language intended to be used as a de facto dereferencable resource for namespaces. XNCL uses empty div elements in the Link elements to avoid overloading them, and to allow for family derivations. The specification is an XHTML Family derived from XHTML Basic. It has been modified in the following ways: The content model for the link element has been changed so that it may now include div elements. An additional resource element may now be used inside link elements, and they are of content type EMPTY..." Note the posting from Paul Grosso on "resource discovery directory" which (1) references the work of the OASIS Entity Resolution TC, and (2) advocates a separation of concerns, viz., between ER (entity resolution) catalogs and resource discovery (RD) directories. See: "Resource Directory Description Language (RDDL)."

  • [May 04, 2001] NB. See draft 04, May 08, 2001. Proposed URN Namespace for Public Identifiers. A posting from Norman Walsh contains the text of an IETF Network Working Group Internet-Draft which the authors believe "resolves all outstanding issues with respect to the request for a 'publicid' NID." The draft A URN Namespace for Public Identifiers ('draft-urn-publicid-03', May 4, 2001) is authored by Norman Walsh (Sun Microsystems, Inc.), John Cowan (Reuters Health Information), and Paul Grosso (Arbortext, Inc.). The draft "describes a URN namespace that is designed to allow Public Identifiers to be expressed in URI syntax." From the document Introduction: "XML external entities have two identifiers: a system identifier and a public identifier. The system identifier is a URI, by definition, but the public identifier is simply a string. Historically, the system identifier of an external entity has been a local, or system-specific identifier while the public identifier has been a more global, persistent name. Unfortunately, public identifiers do not fit neatly into the existing web architecture because they are not legal URIs. Many new specifications (XSLT, XML Schema, etc.) have the implicit or explicit requirement that all external identifiers be URIs. The purpose of this namespace is to allow public identifiers to be encoded in URNs in a reliable, comparable way. This document describes a scheme for representing public identifiers as URNs by introducing a public identifier namespace, 'publicid'. This namespace specification is for a formal namespace." [Full context]
  • [October 12, 2000] Proposed OASIS Technical Committee on Entity Resolution. The announcement from Karl Best (OASIS - Director, Technical Operations) is quoted in full in the entry of the same date above. See also the list of active OASIS TCs. On entity resolution, see also the topic "SGML/XML Entity Types, and Entity Management."
  • [April 04, 2000] Arbortext Releases Java Catalog Classes for Resolution of Public Identifiers. Arbortext, Inc., "a leading provider of Extensible Markup Language (XML)-based e-Content software, announced today the immediate availability of open source Java-based code to support public identifier resolution in XML documents. This code will enable XML processors to resolve public identifiers which increases the flexibility and interoperability of XML documents. These Java classes implement the OASIS Entity Management Catalog format [('OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401)')] as well as an XML Catalog format for resolving XML public identifiers into accessible files or resources on a user's system or throughout the Web. These classes can easily be incorporated into most Java-based XML processors, thereby giving the users of these processors all the benefits of public identifier use. As XML processors incorporate this code, users will be able to utilize public identifiers in XML documents with the confidence that they will be able to move those documents from one system to another and around the Web knowing that they will also be able to refer to the appropriate external file or Web page. These classes can easily be plugged into any SAX Parser." Available to everyone at no cost, these Java classes can be immediately downloaded from the ArborText web site. The distribution includes examples of Catalog support with both Xerces and XT. For additional details, see the full text of the ArborText announcement: "Arbortext Makes Java Catalog Classes Available For Use by XML Processors. Open Source Code Enables XML Processors to Resolve Public Identifiers."
  • [April 2000] Norm Walsh on system identifiers: Norm Walsh has provided a tutorial background to the problem of XML system and public identifiers in "If You Can Name It, You Can Claim It!" [Column 'Standard Deviations from Norm', Issue 3, 04-April-2000]. "The fact that XML requires me to supply system identifiers for external references, and the fact that these identifiers are required to be Uniform Resource Identifiers (URIs) is a frequent source of considerable irritation. In this column, we'll explore how you can use OASIS Catalog files (or their XML equivalent) to avoid these difficulties. Using Catalog files became a lot easier earlier this month when Arbortext released its Java Catalog classes to the XML community. Using these classes, it's simple to add Catalog support to your favorite Java parser. (Equivalent support for parsers in other languages should be fairly easy to construct from the free and Open Source of the Java classes, although Arbortext has no immediate plans to undertake this effort.) You can download the classes or view the JavaDoc API Documentation online." Previous URL now broken: [cache, new URL]
  • [November 24, 2000] "The IANA XML Registry." By Michael Mealling (Network Solutions, Inc.). IETF Internet-Draft. Network Working Group. Reference: 'draft-mealling-iana-xmlns-registry-00.txt'. "This document describes an IANA maintained registry for IETF standards which use XML related items such as Namespaces, DTD, Schemas, and RDF Schemas. Over the past few years XML has become a widely used method for data markup. There have already been several IETF Working Groups that have produced standards that define XML DTDs, XML Namespaces and XML Schemas. Each one of these technologies uses URIs to identify their respective versions. For example, a given XML document defines its DTD using the DOCTYPE element. This element, like SGML, has a PUBLIC and a SYSTEM identifier. It is standard practice within W3C standards to forego the use of the PUBLIC identifier in favor of 'well known' SYSTEM identifiers. There have been several IETF standards that have simply created non-existent URIs in order to simply identify but not resolve the SYSTEM identifier for some given XML document. This document seeks to standardize this practice by creating an IANA maintained registry of XML elements so that document authors and implementors have a well maintained and authoritative location for their XML elements. As part of this standard, the IANA will both house the public representation of the document and assign it a Uniform Resource Name that can be used as the URI in any URI based identification components of XML."
  • [December 01, 2000] "XML-Deviant: What's in a Name?" By Leigh Dodds (November 29, 2000). ['The XML-Deviant looks at best practices for identifying XML resources; then wonders why more developers aren't taking advantage of entity management systems.'] "Correctly naming resources and objects is widely regarded as one of the most difficult problems in computing (another being caching). As the saying goes, any problem in computing can be solved by adding another level of indirection. One step toward solving naming problems is to add indirection by separating the name of the resource from its address. This is a common pattern, which we see in a number of areas from pointers in C to Persistent URLs (PURLs) on the web. XML 1.0 offers a separation between the naming and addressing of resources or entities referred to in XML documents. Broadly speaking SYSTEM identifiers define an actual resource that is retrieved, or dereferenced to retrieve, the entity in question. A PUBLIC identifier simply gives a name for the required resource. It says nothing about where that resource may be dereferenced. Of course life isn't really that simple, and it's likely that some readers are already objecting. The short but heated XML-URI debate earlier this year testifies to the disagreement on this issue. A SYSTEM identifier is specified as a URI, which can easily be a Uniform Resource Name (URN) as well, instead of being the more commonly found URL. A URN is more like a PUBLIC identifier, as it simply names the resource in question. Yet there is still no widely deployed means of using URNs..." Note also the archives of the OASIS Entity Resolution TC Mailing List.

  • [July 18, 1998] John Cowan posted a proposal to the XML-DEV mailing list for XCatalogs - "a system based on SGML/Open catalogs (Socats) for translating public identifiers to system identifiers in XML." According to the proposal, XCatalogs are meant to be "Web resources (anything from local files on up) which contain mappings from public identifiers to system identifiers, plus references to other XCatalogs. They [would] come in two syntaxes: one which is a subset of Socat syntax, and one which is an XML document instance." Note that the xmlproc - Python XML parser tool from Lars Marius Garshol supports the XCatalog. Note (1999-04-06): 'XCatalog' is now called 'XML Catalog'.
  • [September 25, 1998] 'Documentation: xmlproc catalog file support.' General description of the CATALOG file format, in the context of documentation for xmlproc. Updated September 11, 1998 or later.
  • [April 05, 2000] From Eric Bohlman (XML-DEV, Wed, 5 Apr 2000): "With all the announcements about catalog processing, I might as well mention that the Perl module XML::Catalog has been available on CPAN for some time now, and I could use some more feedback than I've already gotten (I'm aware there's a problem with embedded backslashes in IDs). It supports John Cowan's catalog syntax (either form) and provides methods for translating public IDs to system IDs and for remapping system IDs."
  • [December 07, 1997] See the publication of the text of PDTR 9573-9 Information Processing -- Text and office systems -- Using SGML Public Identifiers for Specifying Data Notations (ISO/IEC JTC1/WG4 N1958, December 5, 1997), by Martin Bryan and Ken Holman. The technical report "provides a starter set of both notation names and public identifiers which can be used to indicate the coding used for data that conforms to internationally agreed standards published by bodies such as ISO, IEC, ITU and SMPTE. While the notations names are purely advisory, the public identifiers are defined according to the rules for naming ISO standards defined in ISO 8879 and ISO 9070. These forms should be common to all applications that use formal public identifiers." For background information on the decision, see "Recommendations of the Alexandria Meeting" of WG4 (5 December 1997). [local archive copy]
  • Resolving Formal Public Identifiers [local archive copy, display form only]
  • Formal Public Identifier RoadTrip - Spyglass Technical Reference (Murray Altheim). [local archive copy]
  • FPI grammar from ISO/IEC 9070:1991 and ISO 8879:1986. Compiled by Martin Bryan.
  • Notes on sgmls handling of search for entities (C. M. Sperberg-McQueen)
  • Registration Process for Public Text Owner Identifiers
  • Lynne A Price, "Registering Owners for Public Text Identifiers." In <TAG> 9/5 (May 1996), 7-9. In this article, Lynne explains the registration procedure set up by the GCA. Initial registration fee is $95, changes are $25.
  • Formal Public Identifiers, by David Peterson (March 1994). "Dave Peterson discusses all the parts of 'formal' public identifiers, including the optional parts. Public identifiers are one of two kinds of external identifiers used by SGML systems." [local archive copy]
  • Formal System Identifiers (FSIs) - W3C document. [local archive copy]
  • J. Tauber's 'delegate' proposal. Implemented in James Clark's SP as an extension to the SGML Open CATALOG TR. [local archive copy]
  • Using SGML Public Identifiers for Specifying Data Notations - Revised text of 9573-9. ISO/IEC JTC1/WG4 N1990. Covering standardized public identifiers for data encodings defined in international standards, including those published by ITU, SMPTE and NBS. "This Technical Report facilitates the interchange of data represented in internationally standardized notations ('data formats') by providing a set of ISO/IEC 9070 public text object identifiers and ISO 8879 notation declarations that different applications can reference to identify the use of such data formats." For example: 'SGML: ISO 8879:1986//NOTATION Standard Generalized Markup Language (SGML)'. For additional information on 9573-9, see ISO/IEC TR 9573:1988. Techniques for Using Standard Generalized Markup Language (SGML). [local archive copy]
  • [August 4, 1998] Eliot Kimber on FPIs and URNs
  • [September 01, 1998] Eliot Kimber (ISOGEN International Corp) has authored a paper The SGML Storage Model which may be of assistance to readers who wish to understand more about the SGML entity as an abstract storage object, and how storage managers interface with an entity manager, a parser, and a processing application. The paper was written in the context of ongoing work toward SGML and STEP harmonization, but its discussion of formal system identifiers and storage models should be of wider interest. ". . .all SGML systems consist, in one way or another, of the same layers. At the bottom are the physical storage managers, the systems that actually manage data on storage media (file systems, database, etc.). Above the storage managers is the entity manager layer. Above the entity manager is the SGML parser and any processing applications. Processing applications talk to the SGML parser to get parsed SGML documents and to the entity manager directly to get data entities. . . [Conclusion:] SGML abstract storage model and entity declaration syntax coupled with the Formal System Identifier facility of ISO/IEC 10744:1997 provides a robust mechanism for representing systems of repositories and storage objects, regardless of their data types." [local archive copy]
  • For more on [public text] "owner identifiers," see ISO/IEC 9070:1991 Registration Procedures for Public Text Owner Identifiers.
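
The XML Catalogs entries above describe mapping an external entity's public and/or system identifier to a URI reference. The following is a minimal sketch of that lookup, assuming the `catalog`, `public`, and `system` elements and the namespace name of the OASIS XML Catalogs specification. The sample catalog, the file paths in it, and the function names `load_catalog` and `resolve` are hypothetical. It deliberately ignores the `prefer` setting, delegation, rewriting, chained catalogs, and identifier normalization.

```python
import xml.etree.ElementTree as ET

# Namespace name used by OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the local file paths are illustrative only.
SAMPLE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="file:///usr/local/share/dtd/docbookx.dtd"/>
  <system systemId="http://example.org/dtd/sample.dtd"
          uri="file:///usr/local/share/dtd/sample.dtd"/>
</catalog>
"""


def load_catalog(text):
    """Parse catalog text into (public, system) lookup dictionaries."""
    root = ET.fromstring(text)
    public, system = {}, {}
    for el in root.iter(f"{{{CATALOG_NS}}}public"):
        public[el.get("publicId")] = el.get("uri")
    for el in root.iter(f"{{{CATALOG_NS}}}system"):
        system[el.get("systemId")] = el.get("uri")
    return public, system


def resolve(public_id, system_id, catalog):
    """Return a mapped URI, or the original system identifier if no entry matches."""
    public, system = catalog
    # Simplification: try an exact system match first, then a public match.
    if system_id is not None and system_id in system:
        return system[system_id]
    if public_id is not None and public_id in public:
        return public[public_id]
    return system_id


if __name__ == "__main__":
    catalog = load_catalog(SAMPLE_CATALOG)
    print(resolve("-//OASIS//DTD DocBook XML V4.1.2//EN",
                  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
                  catalog))
```

Falling back to the unmodified system identifier keeps documents usable when no catalog entry applies, which is the behaviour a parser would have without any catalog support.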

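The 'publicid' URN namespace noted above encodes a public identifier as a URN. The sketch below follows the transcription table as later published in RFC 3151. Treating that table as equivalent to the 03 draft cited above is an assumption, and the function name `publicid_to_urn` is hypothetical.

```python
import re

# Two-character sequences are matched before single characters.
_PAIRS = {"//": ":", "::": ";"}
_SINGLES = {
    " ": "+", "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
    "'": "%27", "?": "%3F", "#": "%23", "%": "%25",
}


def publicid_to_urn(public_id: str) -> str:
    """Transcribe a public identifier into a 'urn:publicid:' URN."""
    # Normalize whitespace: trim the ends and collapse runs to one space.
    text = re.sub(r"\s+", " ", public_id.strip())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in _PAIRS:
            out.append(_PAIRS[pair])
            i += 2
        else:
            # Characters outside the table pass through unchanged.
            out.append(_SINGLES.get(text[i], text[i]))
            i += 1
    return "urn:publicid:" + "".join(out)


if __name__ == "__main__":
    print(publicid_to_urn("-//OASIS//DTD DocBook XML V4.1.2//EN"))
```

Under these rules the DocBook identifier in the example becomes `urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN`.
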
ISO 8879: Revisions/Amendments

[CR: 19990118]

SGML (ISO 8879:1986) revisions in light of the Standard's first 5-year review are currently in progress. Technical papers and other descriptions relating to the revision are available from several sources:

Technical Resolutions and Proposals

[CR: 19981103]

Papers and Reports:

SGML (and HTML) Stylesheets

[CR: 20000826]

Significant collaborative research and public discussion have been undertaken during late 1994, 1995 and 1996 on "stylesheets" -- in terms of HTML, DSSSL Online (earlier, DSSSL-Lite), and SGML generally. Electronic Book Technologies' products (e.g., DynaText) and SoftQuad's (Explorer, Panorama) already support electronic stylesheets, and DSSSL-Lite will provide another (industry-standard) language for writing them. Jon Stenerson (TCISoft) recently (February 1995) hosted a workshop on stylesheets. As of August, 1996: Microsoft and some other developers were supporting the "cascading style sheets" approach (CSS1) for WWW/HTML documents. A few starter links:


[CR: 19971009]

The migration of HTML to well-formed SGML has been something of a rocky road. A few relevant and possibly promising links documenting (the history of) this evolution:


[CR: 19990401]

Much has been written clarifying -- as well as obfuscating -- the relationship of SGML and (La)TeX, and describing the use of SGML and TeX together in document production. The SGML/XML Web Page contains a separate document with a brief explanation of the relationship, as well as pointers to essential information: (a) links to publicly accessible software that makes SGML documents printable via (La)TeX; (b) a substantial bibliographic reference list germane to "SGML and TeX"; (c) links to other resources on (SGML-)TeX which are more current. Anyone aware of good additions to this bibliographic list is invited to contact me with the relevant information. References are not [yet] complete for publications after 1995.

SGML/XML and Math

[CR: 20000921]

Links [provisional only; someone who actually works with math should volunteer to maintain a page of links ;-) ]:

Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Robin Cover, Editor: