by Richard Light
Brief: A summary of the issues for making the CIMI DTD XML compliant, a discussion of the gains and losses for the current CIMI DTD's full SGML functionality, and a proposal for changes required to make the CIMI DTD XML-compliant
The Issues
1. Inclusion exceptions
The major change required to make the CIMI Full Text DTD XML-conformant is the removal of all inclusion and exclusion exceptions. These define 'floating' elements which can occur at any point in the document's structure. The principal instances from CIMI's point of view are <topic>, <context> and <hot-spot>, but the TEI element types <index>, <milestone>, <pb> and <lb> are also inclusion exceptions.
In general, I suggest that these elements should be placed in the TEI Header, and should point to the elements to which they refer.
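As a rough illustration of what re-siting involves, the sketch below moves floating elements out of the document body and into the header, leaving the surrounding text in place. It assumes that each floating element refers to its immediate parent, that the header is a direct child of the root, and that the link is carried by an attribute; the attribute name `target`, the generated `auto1`-style identifiers and the function name are hypothetical and are not taken from the CIMI or TEI DTDs.

```python
import xml.etree.ElementTree as ET

FLOATING = {"topic", "context"}  # floating element types named in the text

def relocate_floating(root, header_tag="teiHeader", pointer_attr="target"):
    """Move floating elements into the header, each pointing at its old parent.

    header_tag and pointer_attr are hypothetical names for this sketch.
    """
    header = root.find(header_tag)
    if header is None:
        raise ValueError(f"no <{header_tag}> child found")
    auto_id = 0
    # Snapshot the elements first, because the tree is modified while looping.
    for parent in list(root.iter()):
        if parent is header:
            continue
        for child in list(parent):
            if child.tag not in FLOATING:
                continue
            # Make sure the old parent can be pointed at.
            if "id" not in parent.attrib:
                auto_id += 1
                parent.set("id", f"auto{auto_id}")
            # Keep the text that followed the floating element in place.
            tail = child.tail or ""
            position = list(parent).index(child)
            if position == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[position - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
            child.tail = None
            child.set(pointer_attr, parent.get("id"))
            header.append(child)
    return root
```

A conversion along these lines would still need checking by hand, since a floating element does not always refer to the element that happens to contain it.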
While this new approach uses a different technique to express the required concept, it is semantically equivalent to the current approach, so there is no loss of SGML functionality. The only drawback (which is the principal reason I didn't suggest this approach originally) is that users might find it hard to make the links from <context>s and <topic>s in the header to the correct elements within the document. It is the sort of thing which software ought to be able to help with, but there is no SGML editing tool that I am aware of which offers this facility.
Separate declarations for each element type
XML does not allow multiple element types to be declared in one go. This is a feature of the original TEI DTD, but the normalized version that CIMI uses already has a separate declaration for each element type. It does not, in any case, affect SGML functionality.
No minimization on element declarations
This means that the tag omission indicators (- and O) are not allowed in element declarations, since XML makes no use of this SGML feature.
This XML requirement means that the current DTD is not valid as it stands. However, all that is required is to remove the offending indicators from each declaration. The result for SGML functionality - which is that start- and end-tag omission is not allowed - is a general XML feature rather than an issue for CIMI's specific DTD.
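The removal of tag omission indicators is mechanical enough to script. The following is a minimal sketch, assuming the normalized DTD in which every declaration names a single element type; the function name and the sample declaration are made up for illustration.

```python
import re

# An element name followed by the two tag omission indicators,
# each of which is either "-" or "O" (upper or lower case).
OMIT = re.compile(r"(<!ELEMENT\s+[^\s()]+)\s+[-Oo]\s+[-Oo](?=\s)")

def strip_tag_omission(dtd_text):
    """Remove the tag omission indicators from every element declaration."""
    return OMIT.sub(r"\1", dtd_text)

print(strip_tag_omission("<!ELEMENT p - O (#PCDATA)>"))
```

Declarations that carry no indicators are left unchanged, so the function can be run over a DTD that has already been partly converted.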
2. Mixed-content models
Mixed content models (where both data content and markup are allowed - for example within paragraphs) have to take a particular, simple, form in XML. As it happens, all of the low-level element types we introduced in the CIMI DTD already have the approved form. Only those which occur at a higher level (like <pGrp> for a group of paragraphs) share the TEI's "pernicious mixed content" model. This needs to be sorted out by TEI. Simplifying these mixed content models will slightly reduce the expressive power of the DTD, but at the same time it will remove problems caused by this pernicious mixed content that CIMI users have encountered in the past, such as obscure parsing errors.
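The particular form that XML requires is either (#PCDATA) alone or a repeatable choice group beginning with #PCDATA, such as (#PCDATA | a | b)*. A small check along the following lines could flag the content models that need attention. It is only a sketch: the name pattern is simplified, and the sample content models are invented rather than taken from the CIMI DTD.

```python
import re

NAME = r"[A-Za-z_:][\w.:-]*"  # simplified approximation of an XML name

XML_MIXED = re.compile(
    rf"\(\s*#PCDATA\s*(?:\|\s*{NAME}\s*)*\)\*"  # (#PCDATA | a | b)*
    r"|\(\s*#PCDATA\s*\)"                        # (#PCDATA)
)

def is_xml_mixed(model):
    """Return True if a content model has the mixed-content form XML accepts."""
    return XML_MIXED.fullmatch(model.strip()) is not None

for model in ["(#PCDATA | hi | emph)*", "(head?, (#PCDATA | p)*)"]:
    print(model, is_xml_mixed(model))
```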
3. The '&' connector
This connector (which means that all the elements mentioned must occur, but in any order) is not supported in XML. However, it is only used within three TEI element types in the CIMI DTD (<publicationStmt>, <cit> and <respStmt>). There will be no major loss of SGML functionality from changing these content models.
#CURRENT attribute values
The #CURRENT attribute value is not supported in XML. This is used within TEI for all of the <divN> element types. #CURRENT is like a #REQUIRED attribute value, except that so long as you put one in for the first occurrence of an element type, that value is copied forward to all subsequent instances of the element type. TEI will obviously have to decide whether to make the attribute optional or mandatory for XML. If it becomes mandatory, the functionality will be the same as for SGML, but markers-up will have more work to do!
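If the attribute does become mandatory, the extra work for existing documents could in principle be automated by copying each value forward explicitly, which is what an SGML parser does implicitly for #CURRENT. The sketch below shows the idea on a document that is already well-formed XML. The attribute name `type` and the sample document are assumptions made for illustration, as is the function name.

```python
import xml.etree.ElementTree as ET

def fill_current(root, tag, attr):
    """Make #CURRENT-style defaults explicit for one element type.

    Elements are visited in document order; an element that omits the
    attribute receives the most recently specified value.
    """
    current = None
    for element in root.iter(tag):
        if attr in element.attrib:
            current = element.get(attr)
        elif current is not None:
            element.set(attr, current)
        else:
            raise ValueError(f"first <{tag}> has no {attr} attribute")
    return root

doc = ET.fromstring(
    '<body><div1 type="chapter"/><div1/><div1 type="appendix"/><div1/></body>'
)
fill_current(doc, "div1", "type")
print([d.get("type") for d in doc.iter("div1")])
```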
4. Comments within declarations
It will no longer be possible to put 'inline' comments (surrounded by -- ... --) inside <!ELEMENT and <!ATTLIST declarations. This will require a small change to the program that produces the normalized CIMI DTD. It does not affect SGML functionality.
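The change to the normalizing program might amount to something like the following sketch, which strips inline comments from element and attribute list declarations while leaving free-standing comment declarations alone. It is not the actual CIMI program; the function name and sample declarations are invented, and the sketch assumes that "--" never appears inside a quoted attribute default.

```python
import re

# A whole ELEMENT or ATTLIST declaration, allowing ">" inside inline comments.
DECL = re.compile(r"<!(?:ELEMENT|ATTLIST)\b(?:--.*?--|[^>-]|-(?!-))*>", re.S)
# An inline comment together with the white space in front of it.
COMMENT = re.compile(r"\s*--.*?--", re.S)

def strip_inline_comments(dtd_text):
    """Remove -- ... -- comments from inside ELEMENT and ATTLIST declarations."""
    return DECL.sub(lambda m: COMMENT.sub("", m.group(0)), dtd_text)

print(strip_inline_comments("<!ELEMENT p - O (#PCDATA) -- paragraph -->"))
```

Comments that are worth keeping could instead be rewritten as separate comment declarations placed before the declaration they describe.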
5. Miscellaneous changes
There are some further minor changes that will affect the design of the TEI DTD, and thus the CIMI Full Text DTD. However, none of them affects SGML functionality, and none will require any action on our part.
6. Other changes implied by a switch to XML
Propose recommendations for incorporating XML into the CIMI DTD eg: should the XML CIMI DTD be separate from the full DTD or can it be incorporated into the full DTD or other options;
Implementation options
I would advise against this option. It is far from clear to me that moving totally to XML would confer any practical benefits on users of the CIMI DTD. The changes that are required for basic XML conformance will entail a significant reworking of every document entered so far. For example, every <topic> and <context> element will need to be re-sited in the TEI Header.
Also, the need to include SYSTEM identifiers removes much of the value for CIMI of the SGML approach, which aims to keep documents as future-proof as possible by using only PUBLIC identifiers for external entities. So, in this sense, users would actually be worse off with XML.
Identify those who have created XML-compliant DTDs relevant to the CIMI DTD and briefly describe how they did it; provide pointers to relevant sites for detailed information
Conversion of other relevant DTDs
I am not aware of any significant XML-compliant DTDs that are relevant to CIMI. Jon Bosak of Sun has done a conversion of the DocBook DTD to XML for an experimental version of Sun's AnswerBook project. To see this service (which generates HTML on the fly from the XML) operating in normal mode, point your Web browser at http://docs.sun.com. To see TOCs, or actual chunks of documentation in XML format, use one of the following:
As part of the research for my book "Presenting XML" (published this week by Sams.net), I went through the process of converting the HTML 2.0 DTD to an XML-compliant form. See Chapter 12 for details. I have used that work as a checklist when listing the changes required for an XML version of the CIMI DTD.
TEI attitude to XML
I have discussed the issue of an XML version of the TEI framework with Lou Burnard, and he said that:
___________
Richard Light
26 September 1997