[Mirrored from: http://www.eccnet.com/sgmlug/johnrice.html]
An overview of process and technique
Presented
February 15, 1996
"The fundamental concept of design is to create a two- or three-dimensional world with conscious efforts of organization of various elements... to establish visual harmony and order."
- Wucius Wong (paraphrased)
The fundamental precept for an SGML application is to define a set of rules to describe a continuum of data in a harmonious and carefully governed manner.
A data collection may be seen as a continuum. That is, it grows, changes, evolves, yet remains united by some common thread. Typically, the document is described as the fundamental unit in that continuum.
In the past, the document was perceived as a concrete, physical object - a page or collection of pages containing some amount of information.
Information within the document was arranged to visually convey structure and order, demonstrate relationships between data, and communicate semantics (do NOT press button 13). In this way, document structure was implied through appearance.
In contrast, SGML defines documents explicitly in terms of their structure: not what the data says or looks like, but what it represents. The document is broken down into individual data units, whose relationships are carefully defined and governed. SGML recognizes the generics: types of "things", their order, structure, and context.
Thus, the concept of "document" becomes something of an abstraction. A document might then be defined as a specifically delimited collection of interrelated data units, which for purposes of human consumption might be arranged, organized, combined, and displayed in different ways according to different requirements.
This perceptual shift is fundamental to the success of any SGML application development.
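To make the shift concrete, here is a minimal sketch of the button 13 warning expressed both ways. The caution element appears in the sample DTD later in this presentation, and emphasis is named there as a low-level structure. Allowing emphasis inside caution is an assumption made purely for illustration.

```sgml
<!-- Structure implied through appearance: capitals carry the meaning -->
do NOT press button 13

<!-- Structure stated explicitly: the markup says what the text is -->
<caution>Do <emphasis>not</emphasis> press button 13.</caution>
```

How the caution finally looks (bold, boxed, capitalized) is left to whichever formatter processes it.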
A document is a subset of a larger collection of information
A document is itself composed of distinct units of information
"document" is just a word to express a boundary
In its most basic form, an SGML application is composed of the following:
The components of your processing system (context-sensitive editors, databases, text formatters, etc.) are the tools with which you manipulate and process your data.
SGML is designed to be used in a modular fashion. Just like Lego bricks, you're expected to attach other things to it.
Functionality is something designed into the DTD. Attributes, entities, and notations are structures that can act as attachment points.
An important part of DTD design is to anticipate the needs of your processing and delivery systems and to incorporate the necessary "tabs and slots" into your DTD.
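As a rough sketch of what such "tabs and slots" can look like, the fragment below wires a notation, an external entity, and an attribute together. Every name in it (cgm, pump.fig, artwork, file) is hypothetical and chosen only for illustration. It is not taken from the DTD shown later.

```sgml
<!-- Notation: names a non-SGML data format so that a processing
     system can choose a handler for it -->
<!NOTATION cgm        SYSTEM "cgm" >

<!-- Entity: binds a name to an external file held in that notation -->
<!ENTITY   pump.fig   SYSTEM "pump.cgm" NDATA cgm >

<!-- Attribute: the slot where a document instance plugs the entity in -->
<!ELEMENT  artwork    - o EMPTY >
<!ATTLIST  artwork
           file       ENTITY               #REQUIRED >
```

A document instance would then attach the drawing with `<artwork file="pump.fig">`, and the formatter or database can follow the entity to the file and the notation to the right handler.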
At the core of every SGML application lies the DTD
You don't attach an ID attribute to an element just because it seems like the thing to do.
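One way to read this: an ID earns its place when something in the processing system needs to point at the element. The sketch below assumes that figureref exists to refer to a figure. The element names come from the list content models shown later, but the attribute names id and refid are assumptions.

```sgml
<!-- figure carries an ID because figureref needs a target to resolve -->
<!ATTLIST figure
          id          ID                   #REQUIRED >

<!-- figureref points at that target; the parser checks that it exists -->
<!ATTLIST figureref
          refid       IDREF                #REQUIRED >
```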
Traditionally, document analysis is performed by an SGML Expert with relatively limited user input. The Expert gathers samples, talks to authors and editors, then goes away and draws conclusions based on this information. The resulting DTD is then a product of the Expert's interpretations.
SGML Experts know SGML
Your people know your data
An SGML implementation is likely to represent a significant cultural disruption. Furthermore, your authors will be forced to look at their data in new (and initially uncomfortable) ways. It will introduce new and frightening tools and practices which alter the status quo.
This is a relatively common dynamic:
NEW = DIFFERENT = BAD
Ignoring potential conflicts can only make your job more difficult. To ignore your staff as a resource is foolish.
Drawing your staff into the process of document analysis can drastically improve the functionality of your DTD. No less important, it is a way to empower them -- to invest them with the feeling of involvement.
More modern document analysis techniques bring the SGML Expert together with authors, editors, and other key people who work with the data. By key, I mean the people who are working with your data on a daily basis. This type of analysis is performed as a facilitated discussion led by the SGML Expert. It is a process of consensus building. It is also a process of discovery.
Unfortunately, many people find that their data is not nearly as organized as they would like to believe.
It is best to drag out such skeletons in the early stages. It is certainly better than finding out somewhere down the road.
ISO 8879 is quite strict about the proper use of SGML syntax. However, it is somewhat less concerned with how that syntax appears in the DTD. Formatting and organization are of no concern to a parser, but DTDs are written, used, and maintained by people.
All irony aside, DTD formatting pays.
Include a header with your DTD. Header information should include things like the author's name, the creation date, and specifics about the document type described by the DTD. Don't forget to include a change history detailing the modifications made, by whom, and when.
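A header along these lines would cover those points. The layout is only a suggestion, and the parenthesized entries are placeholders to fill in. The single change-history line reuses the V1.11 note from the list declarations shown later.

```sgml
<!-- =============================================================
     Document type : (name of the document type this DTD describes)
     Author        : (author's name)
     Created       : (creation date)

     Change history
     Version   Date      By           Modification
     V1.11     (date)    (initials)   Added optional title to all
                                      levels of lists
     ============================================================= -->
```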
Declare all parameter entities together at the top
Elements should be declared in the order in which they appear in the data. Define the contents of the larger structures as they are called in the content models.
Group low-level structures like emphasis and keyword together at the end of the DTD.
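Taken together, these ordering rules suggest a skeleton like the one below. It is a sketch, not the actual DTD: the chapter, title, and para elements are hypothetical, and the replacement text given for the yesorno and text.level parameter entities is a guess at what the later examples might rely on. The list1 and caution elements are the ones declared in the examples that follow.

```sgml
<!-- ==== Parameter entities, declared together at the top ======== -->
<!ENTITY % yesorno     "NUMBER" >
<!ENTITY % text.level  "#PCDATA | emphasis | keyword" >

<!-- ==== Largest structure first, then whatever its model calls == -->
<!ELEMENT chapter    - - (title, (para | list1 | caution)+) >
<!ELEMENT title      - - (%text.level;)+ >
<!ELEMENT para       - - (%text.level;)+ >

<!-- ==== Low-level structures, grouped at the end ================ -->
<!ELEMENT emphasis   - - (#PCDATA) >
<!ELEMENT keyword    - - (#PCDATA) >
```

A reader can then follow the DTD top to bottom in roughly the same order as the data itself.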
Using formatting strategies such as aligning declaration starts and ends, employing negative space (white space), and following a design strategy not only makes your work more readable, it makes it easier to maintain.
Remember, a DTD will be used by many people. Presumably, it will be updated or modified at some point. Liberal comments add another degree of functionality to your work.
<!-- First level list -->
<!-- V1.11, added optional title to all levels of lists -->
<!-- V1.11, added optional symbol before each item -->
<!-- V1.11, added figure, graphic to list1 through list5 content -->
<!ELEMENT list1      - - (title?, (figure | figureref | graphic)*,
                          symbol*, item, ((symbol*, item) | list2)*) >
<!-- Second level list -->
<!ELEMENT list2      - - (title?, (figure | figureref | graphic)*,
                          symbol*, item, ((symbol*, item) | list3)*) >
<!-- Third level list -->
<!ELEMENT list3      - - (title?, (figure | figureref | graphic)*,
                          symbol*, item, ((symbol*, item) | list4)*) >
<!-- Fourth level list -->
<!ELEMENT list4      - - (title?, (figure | figureref | graphic)*,
                          symbol*, item, ((symbol*, item) | list5)*) >
<!-- Fifth level list -->
<!ELEMENT list5      - - (title?, (figure | figureref | graphic)*,
                          symbol*, item)+ >
<!-- V1.11, make type and enumtype attributes optional on list -->
<!ATTLIST (list1 |
           list2 |
           list3 |
           list4 |
           list5)    type        (ordered | unordered)   #IMPLIED
                     %enum; >
<!-- ==== Graphic (the actual image file) ======================= -->
<!-- ============================================================= -->
<!ELEMENT graphic    - o EMPTY >
<!ATTLIST graphic
          color       %yesorno;            "0"
          height      NUMBER               #REQUIRED
          width       NUMBER               #REQUIRED
          type        (BMP|CGM|EPS|GIF|
                       MIF|PCX|TIFF|WMF)   #IMPLIED
          graphicnum  CDATA                #REQUIRED>
<!-- color................Is the graphic in color?
                          The default value is "0" (no)
     height...............Height of the graphic, in points
     width................Width of the graphic, in points
                          the graphics within the figure; are
                          they placed horizontally or
                          vertically?
     type.................What type of graphic is this?
     graphicnum...........Unique identifier used to refer to
                          the graphic as a part of the
                          database -->
<!-- ============================================================= -->
<!-- ===================== Text Level ========================= -->
<!-- ============================================================= -->
<!-- ==== Caution =============================================== -->
<!-- ============================================================= -->
<!ELEMENT caution - - (%text.level;)+ >
<!-- ==== Reference to a Chapter Number ========================= -->
<!-- ============================================================= -->
<!ELEMENT chapter.num - - (#PCDATA) >
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Developer's Note:
~~~~~~~~~~~~~~~~~
The chapter number should be
displayed with surrounding angle
brackets. For instance "<26>"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->