[Unofficial mirror copy from: http://www.textuality.com/mgml/paper.html, November 5, 1996 ]
by Tim Bray, Textuality.
The Standard Generalized Markup Language is the most fully developed specification of the use of descriptive markup languages for electronic documents. The idea of descriptive markup is simple and powerful, and in fact has proved to be a basic requirement for many advanced information processing applications.
Unfortunately, the adoption of SGML has proved surprisingly difficult, expensive and slow, given that the underlying ideas are simple and self-evidently good. Some of the perceived reasons have included:
Nonetheless, there remains a consensus that SGML's basic design partition into entities, elements, and attributes is correct and useful. One result is a common tendency, in strategic projects involving SGML, to avoid using many advanced features and operate within the bounds of a highly restricted subset. This approach has generally met with success. However, this restricted subset has been re-invented by each successive group that has attacked the problem.
It is our opinion that SGML exhibits an extreme case of the "80-20 syndrome"; that is to say, 80% of the benefit is gained by applying only 20% of the machinery. It is the goal of this project to formalize the definition of this useful subset, which we call Minimal Generalized Markup Language, MGML.
The design goals are that MGML shall:
The syntactic structure of MGML, which enables markup to be distinguished from data, is hardwired and presented formally as a set of lex-style regular expressions here.
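To illustrate what "lex-style regular expressions" for separating markup from data can look like, here is a minimal sketch in Python. The patterns are not the MGML expressions, which are not reproduced in this copy. They assume SGML-like delimiters (`<name ...>`, `</name>`, `&name;`), and the token names and the `tokenize` helper are hypothetical.

```python
import re

# Hypothetical token patterns in the spirit of a lex specification.
# These are NOT the MGML definitions; they assume SGML-like delimiters.
TOKEN_SPEC = [
    ("START_TAG",  r"<[A-Za-z][A-Za-z0-9.-]*[^<>]*>"),   # e.g. <p> or <p id="x">
    ("END_TAG",    r"</[A-Za-z][A-Za-z0-9.-]*\s*>"),     # e.g. </p>
    ("ENTITY_REF", r"&[A-Za-z][A-Za-z0-9.-]*;"),         # e.g. &amp;
    ("DATA",       r"[^<&]+"),                           # anything that is not markup
]

# One master pattern with a named group per token class, as lex would build.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_class, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # A stray '<' or '&' that starts no recognized markup.
            raise ValueError(f"unrecognized input at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("<p>Hello &amp; bye</p>"):
        print(token)
```

Because the markup syntax is fixed ("hardwired") rather than configurable per document, a small, static table of patterns like this is enough to drive the scanner. That fits the small reference parser the paper describes.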
MGML is based on the Document Structure Definition (DSD). A DSD is a set of structure definitions that apply to all documents of a given class. The required content and structure of a DSD are defined by the MGML Reference DSD. The behavior of a conforming MGML processor is defined in the list appearing below in this document, and in commentary text attached to the structure definitions in the MGML Reference DSD. These behavior specifications and the MGML Reference DSD together constitute the sole and complete definition of MGML.
The MGML Reference DSD defines a total of 21 elements and 18 attributes. In printed form, it occupies only 5 pages. An electronic form may be obtained here. To help in understanding, a real SGML DTD for the MGML Reference DSD may be obtained here.
A reference parser for a slightly earlier version, including fairly complete entity processing, implemented as two lex modules, one C module, and one yacc module, comprised about 1000 lines of code.
A conforming MGML processor shall: