SGML-MIME: Strawman Proposal
From owner-majordomo@ebt.com Thu Mar 2 01:41:41 1995
Return-Path: <owner-majordomo@ebt.com>
Received: from ebt-inc.ebt.com by utafll.uta.edu (4.1/25-eef)
id AA05756; Thu, 2 Mar 95 01:41:31 CST
Received: from Princeton.EDU (root@Princeton.EDU [128.112.128.1]) by ebt-inc.ebt.com (8.6.9/8.6.9) with SMTP id UAA25317 for <sgml-internet@ebt.com>; Wed, 1 Mar 1995 20:52:57 -0500
Received: from acupain.UUCP by Princeton.EDU (5.65b/2.115/princeton)
id AA19742; Wed, 1 Mar 95 18:41:29 -0500
Received: by Accurate.COM (4.1/SMI-4.0)
id AA20967; Wed, 1 Mar 95 17:36:24 EST
Message-Id: <9503012236.AA20967@Accurate.COM>
To: sgml-internet@ebt.com
Subject: Re: Strawman Proposal
Organization: Accurate Information Systems, Inc.
X-Org-Addr: 2 Industrial Way
X-Org-Addr: Eatontown, NJ 08840
X-Org-Misc: 1.908.389.5550 (phone) 1.908.389.5556 (fax)
X-Mailer: MH 6.8
Date: Wed, 01 Mar 1995 17:36:23 -0500
From: Ed Levinson <elevinso@Accurate.COM>
Status: RO
<embarrassed>Sorry, I forgot to include the proposal.../Ed
Network Working Group E. Levinson
Internet Draft: MIME/SGML Editor
<draft-ietf-mimesgml-03.txt> March 1, 1995
Encapsulating SGML Documents Using
the Multipart/Related Content-Type
THIS IS A ROUGH DRAFT FOR DISCUSSION PURPOSES. IT IS LITTLE
MORE THAN AN OUTLINE AND SOME EXAMPLES. IT REPRESENTS MY
UNDERSTANDING OF WHAT HAS BEEN DISCUSSED AND CONTAINS SOME
hopefully minor CHANGES OF MY OWN. Ed
This draft document is being circulated for comment. Please
send your comments to the authors or to the sgml-internet
mail list <sgml-internet@ebt.com>.
Archives of the email discussions are available at
ftp://ftp.naggum.no:/pub/sgml-internet filed by date and
time.
Abstract
This draft describes the encapsulation of an SGML document
within a MIME message. It makes use of the proposed
Multipart/Related MIME Content-Type and proposes new
content sub-types, Text/sgml and Application/sgml, and two
new headers, Content-SGML-Entity and Content-SGML-Notation.
1 Introduction
A need exists for the transfer of documents constructed
using the Standard Generalized Markup Language (SGML)
[ISO-SGML]. Those documents consist of a set of inter-related
files whose structure of relationships must be preserved
independently of the system on which the document exists.
The Multipart/Related content-type [IDrel] proposed for the
Multipurpose Internet Mail Extensions (MIME) specification
[RFC-MIME] provides a mechanism for preserving such
relationships. This memorandum applies that mechanism to
SGML documents.
1.1 SGML
SGML is used in several communities to encode document
structure and layout. A rigorous description of SGML is
left to [ISO-SGML]. Appendix A of this document, which is
unbelievably brief, contains a description of the SGML
elements relevant to MIME encapsulation. The terms used in
the present document attempt to be consistent with SGML
terminology and usage.
The SGML document exists as a collection of one or more
files and an SGML document refers to these files via entity
Levinson (Editor) Expires 6 months from issue date [Page 1]
Internet Draft MIME-SGML
declarations. Each declaration defines the name and type of a
storage object or file. Preserving the structure of references
from one entity to another is key to the email exchange of
SGML documents. In SGML that structure is known as the
entity structure.
For a person or application to receive and display a
complete SGML document the mail message must carry a precise
definition for each of the SGML document parts. In the
sender's environment the document parts may reference
standard definitions or specific local files. Further, a
DTD may reference other files, for example images and
graphics. The identity of the document parts and the
content of each file must be available to enable the
recipient to transform the sender's file name references
into an equivalent local reference and to instantiate the
files locally.
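As an illustration of the last point, the following Python sketch shows
one way a recipient might instantiate received parts locally and record
a mapping from the sender's identifiers to local files. It is not part
of the proposal: the function names, the tuple layout of `parts`, and
the hashed file-naming scheme are all assumptions made for this example.
The sender's path is used only as a lookup key and never as a local path.

```python
import hashlib
import tempfile
from pathlib import Path


def local_name(identifier, suffix=".ent"):
    """Derive a local file name without trusting the sender's path parts."""
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()[:16]
    return digest + suffix


def instantiate(parts, target_dir):
    """Store each part locally; map the sender's identifiers to the new paths.

    `parts` is an iterable of (public_id, system_id, data) tuples (a
    hypothetical layout); either identifier may be None, but not both.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    catalog = {}
    for public_id, system_id, data in parts:
        key = public_id or system_id
        if not key:
            raise ValueError("a part needs a public or system identifier")
        path = target / local_name(key)
        path.write_bytes(data)
        # Both identifiers, when present, resolve to the same local file.
        if public_id:
            catalog[("PUBLIC", public_id)] = path
        if system_id:
            catalog[("SYSTEM", system_id)] = path
    return catalog


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = instantiate([(None, "chapt3.sgm", b"<chapt>...</chapt>")], tmp)
        for key, path in demo.items():
            print(key, "->", path.name)
```

A real unpacker would also have to decide how long such files live and
how the mapping is handed to the SGML parser; neither is addressed here.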
1.2 SGML Document Interchange Format (SDIF)
1.3 Organization of this Memorandum
2 A Model for MIME/SGML
Four issues must be addressed for the recipient's user agent
to display a complete SGML document. The various parts must
be specified; file references on the sender's system
must be resolved to references on the receiver's system;
command references must be resolved similarly; and an
appropriate application, an unpacker, must be in control to
unpack the MIME body parts and present them to the
display software. The controlling application is discussed
first, and then the document parts, file references, and
command references.
2.1 Invoking the SGML System
3 Encapsulating SGML
3.1 Multipart/Related Content-Type
The Multipart/Related content-type contains a set of related
body parts and lists one or more body parts with which
processing starts. The body parts are constituents of the
multipart and each has an associated content-ID. The
content-IDs are used to reference the starting parts. Below
is a simple example of a Multipart/Related message that
contains an SGML document.
3.2 An Example
MIME-Version: 1.0
Content-Type: Multipart/Related; boundary=tiger-lily;
start="<doc.950209.1430@Acme.com>"; type="application/sgml"
--tiger-lily
Content-Type: Application/sgml; catalog="<cat.950209.1430@Acme.com>"
Content-SGML-Entity: doctype;
public-id="-//Acme//DTD Book//EN";
system-id="/home/users/sgml/dtds/book.dtd"
Content-ID: <doc.950209.1430@Acme.com>
<!DOCTYPE book PUBLIC
"-//Acme//DTD Book//EN"
"/home/users/sgml/dtds/book.dtd"
[
<!ENTITY chap1 PUBLIC "-//Acme//TEXT chapt1//EN">
<!ENTITY chap2 SYSTEM>
<!ENTITY chap3 SYSTEM "chapt3.sgm">
<!NOTATION jxz SYSTEM "/usr/local/bin/jxz">
<!ENTITY fig1 SYSTEM "fig1.jxz" NDATA jxz>
]>
<book> &chap1; &chap2; &chap3; </book>
--tiger-lily
Content-Type: Text/sgml
Content-SGML-Entity: general; name=chap1; doctype=book;
public-id="-//Acme//TEXT chapt1//EN"
<chapt><H1>This is chapter ONE ...</chapt>
--tiger-lily
Content-Type: Text/sgml
Content-SGML-Entity: general; name=chap2; doctype=book
<chapt><H1>This is chapter TWO ...</chapt>
--tiger-lily
Content-Type: Text/sgml
Content-SGML-Entity: general; name=chap3; doctype=book;
system-id="chapt3.sgm"
<chapt><H1>This is chapter THREE ...</chapt>
--tiger-lily
Content-Type: Application/sgml
Content-SGML-Entity: general; doctype=book;
public-id="-//Acme//DTD Book//EN";
system-id="/home/users/sgml/dtds/book.dtd"
<!-- Acme Widget Company -->
<!-- Instruction Book DTD -->
<!ELEMENT ...>
--tiger-lily
Content-Type: image/jpeg
Content-Transfer-Encoding: BASE64
Content-SGML-Entity: general; name=fig1; doctype=book;
system-id="fig1.jxz"; notation-name=jxz
Content-SGML-Notation: name=jxz; doctype=book;
system-id="/usr/local/bin/jxz"
[Base64 encoded binary image data]
--tiger-lily--
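A message shaped like the example above can be assembled with the
standard Python email package. The sketch below is illustrative only and
builds a cut-down version with two parts. The helper names `sgml_part`
and `build_message` are invented here, and the Content-SGML-Entity
values are written out by hand because no library knows that header.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart


def sgml_part(content_type, entity_header, body, content_id=None):
    """Build one body part carrying the proposed Content-SGML-Entity header."""
    part = Message()
    part["Content-Type"] = content_type
    part["Content-SGML-Entity"] = entity_header
    if content_id is not None:
        part["Content-ID"] = content_id
    part.set_payload(body)
    return part


def build_message():
    """Assemble a two-part Multipart/Related message (reduced example)."""
    root_id = "<doc.950209.1430@Acme.com>"
    # Extra keyword arguments become Content-Type parameters.
    msg = MIMEMultipart("related", boundary="tiger-lily",
                        start=root_id, type="application/sgml")
    msg.attach(sgml_part(
        "Application/sgml",
        'doctype; public-id="-//Acme//DTD Book//EN"',
        '<!DOCTYPE book PUBLIC "-//Acme//DTD Book//EN" [\n'
        '<!ENTITY chap3 SYSTEM "chapt3.sgm">\n'
        ']>\n'
        '<book> &chap3; </book>\n',
        content_id=root_id))
    msg.attach(sgml_part(
        "Text/sgml",
        'general; name=chap3; doctype=book; system-id="chapt3.sgm"',
        "<chapt><H1>This is chapter THREE ...</chapt>\n"))
    return msg


if __name__ == "__main__":
    print(build_message().as_string())
```

The package quotes parameter values and folds long headers as it sees
fit, so the serialized text will not match the hand-written example
character for character.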
3.3 Specifying the Document Parts
4. The Content-SGML-* Headers
sgml-header := entity-header / notation-header
4.1 Common syntax
SGML headers may contain the following parameter values.
Note that "notation-name" is not valid for a notation-
header.
parameter := attribute "=" value
attribute := sgml-token
value := token / quoted-string ; cf. RFC 1521
sgml-token := "name" / "doctype" / "linktype" /
"public-id" / "system-id" / "notation-name" /
extension-token
extension-token := ( "X-" / "x-" ) token
; no intervening white space
name A string giving the name of the entity. This would be omitted
when sending a collection of entities (such as a set of
DTDs) that is not part of a document. A receiving system
still needs some of the information in the Content-SGML-Entity:
header, even if the entity is not part of a document.
For example, it might need to make an entry in a catalog
mapping the entity's public id to the filename. It would
also be omitted for unnamed entities.
doctype A string specifying the document type name of the DTD subset
in which the entity was declared, if the entity was declared
in a DTD subset other than the base DTD subset. This parameter
would not be present for entities with a decl-type
other than "general" or "parameter". Although it would be
possible to handle multiple DTDs (and LPDs) by nested
multipart/related parts, it is more convenient to use one
multipart/related part for each SGML parsing context, and
for the Content-SGML-Entity: parameters to uniquely identify
the entity within that parsing context.
linktype A string specifying the link type name of the LPD subset in
which the entity was declared, if the entity was declared in
an LPD subset. This parameter would not be present for
entities with a decl-type other than "general" or
"parameter".
public-id The public identifier in the entity's declaration; for an
entity not in a document, the public identifier of the
entity.
notation-name The notation name of an external entity. Not valid in a
notation-header. The value of this parameter will be the
value of the name parameter of a Content-SGML-Notation
header.
extension-token
A parameter not defined in this document and agreed upon by
the parties using it, a group of consenting adults.
4.2 The Content-SGML-Entity Header
entity-header := "Content-SGML-Entity" ":" decl-type
*( ";" parameter )
decl-type := "doctype" / "linktype" / "general" /
"parameter"
Decl-type is a token specifying how the entity was declared:
general An entity declared in an entity declaration as a general entity.
parameter An entity declared in an entity declaration as a parameter
entity.
doctype An entity containing an external DTD subset, declared by a
doctype declaration; the name in this case would be the
document type name.
linktype An entity containing an external LPD subset, declared in a
linktype declaration; the name in this case would be the link
type name.
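The grammar of sections 4.1 and 4.2 can be exercised with a small
parser. The Python sketch below is one possible reading of that grammar,
not a reference implementation. The name `parse_entity_header` is
invented here, quoted-string handling is simplified (no backslash
escapes), and a trailing semicolon is tolerated as an assumption because
the examples in this draft sometimes end a line with one.

```python
import re

DECL_TYPES = {"doctype", "linktype", "general", "parameter"}
SGML_TOKENS = {"name", "doctype", "linktype",
               "public-id", "system-id", "notation-name"}

# One "attribute = value" pair followed by ";" or the end of the header.
_PARAM = re.compile(r'''
    \s*(?P<attr>[A-Za-z][A-Za-z0-9-]*)            # attribute name
    \s*=\s*
    (?:"(?P<quoted>[^"]*)"|(?P<token>[^\s;"]+))   # quoted-string or token
    \s*(?:;|$)                                    # separator or end
''', re.VERBOSE)


def parse_entity_header(value):
    """Split a Content-SGML-Entity value into (decl_type, parameters)."""
    decl_type, _, rest = value.partition(";")
    decl_type = decl_type.strip().lower()
    if decl_type not in DECL_TYPES:
        raise ValueError("unknown decl-type: %r" % decl_type)
    rest = rest.strip()
    params = {}
    pos = 0
    while pos < len(rest):
        match = _PARAM.match(rest, pos)
        if match is None:
            raise ValueError("malformed parameter near %r" % rest[pos:])
        attr = match.group("attr").lower()
        # Only the listed sgml-tokens and "X-" extension tokens are accepted.
        if attr not in SGML_TOKENS and not attr.startswith("x-"):
            raise ValueError("unknown attribute: %r" % attr)
        quoted = match.group("quoted")
        params[attr] = quoted if quoted is not None else match.group("token")
        pos = match.end()
    return decl_type, params


if __name__ == "__main__":
    print(parse_entity_header(
        'general; name=chap1; doctype=book;\n'
        ' public-id="-//Acme//TEXT chapt1//EN"'))
```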
4.3 The Content-SGML-Notation Header
A notation-header will appear with each NDATA external entity; it
provides the information from the corresponding notation declaration.
These headers may be repeated, as several entity declarations may have
the same notation name.
notation-header := "Content-SGML-Notation" ":" parameter
*( ";" parameter )
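To illustrate how the notation-name parameter of an entity-header ties
to a notation-header, the following Python sketch looks up the matching
Content-SGML-Notation header within one body part. It assumes the header
is written with a colon after the field name, and it reuses the standard
library's MIME parameter parser for a header that parser was not
designed for. The names `header_params` and `notation_for` are invented
for this example.

```python
from email import message_from_string
from email.message import Message

PART = """\
Content-Type: image/jpeg
Content-Transfer-Encoding: BASE64
Content-SGML-Entity: general; name=fig1; doctype=book;
 system-id="fig1.jxz"; notation-name=jxz
Content-SGML-Notation: name=jxz; doctype=book;
 system-id="/usr/local/bin/jxz"

[Base64 encoded binary image data]
"""


def header_params(value):
    """Parse 'a=b; c="d"' parameters via the stdlib MIME parameter parser."""
    scratch = Message()
    scratch["X-Scratch"] = value
    return dict(scratch.get_params(header="X-Scratch"))


def notation_for(part):
    """Return the parameters of the notation-header named by the entity."""
    wanted = part.get_param("notation-name", header="Content-SGML-Entity")
    if wanted is None:
        return None  # no notation-name, so not an NDATA entity
    # The header may be repeated; examine every occurrence.
    for raw in part.get_all("Content-SGML-Notation", []):
        params = header_params(raw)
        if params.get("name") == wanted:
            return params
    raise LookupError("no Content-SGML-Notation header named %r" % wanted)


if __name__ == "__main__":
    print(notation_for(message_from_string(PART)))
```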
5 Partial or Incomplete Documents
6 SDIF?
7 References
[ISO-SGML] ISO 8879:1986, Information processing -- Text and office
systems -- Standard Generalized Markup Language (SGML).
[ISO-SDIF] ISO 9069:1988, Information Processing -- SGML Support
Facilities -- SGML Document Interchange Format (SDIF).
[RFC-822] Crocker, D., Standard for the Format of ARPA Internet Text
Messages, August 1982, University of Delaware, RFC 822.
[RFC-HDRC] Moore, Keith, Representation of Non-ASCII Text in Internet
Message Headers, June 1992, RFC 1342.
[RFC-MIME] Borenstein, N. and Freed, N., MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies, June 1992, RFC 1341.
[US-ASCII] Coded Character Set -- 7-Bit American Standard Code for
Information Interchange, ANSI X3.4-1986.
8 Acknowledgements
The editor has borrowed freely from the suggestions of others and in
particular lifted text from James Clark and ideas from Roy Fielding.
The editor also acknowledges Terry Allen, O'Reilly & Associates, Harald
T. Alvestrand, UniNett, Nathaniel Borenstein, First Virtual Holdings
Incorporated, Daniel W. Connolly, W3O, Steven DeRose, EBT, Andy Gelsey,
CSC, Paul Grosso, ArborText, John Klensin, MCI, Einar Stefferud, Network
Management Associates, Inc, and Erik Naggum, for their suggestions,
explanations, and encouragement. No errors or faults in this document
can be ascribed to them; those are mine.
UNIX is a registered trademark of UNIX System Laboratories, Inc.
9 Author's Address
Ed Levinson
elevinson@accurate.com
Accurate Information Systems, Inc.
2 Industrial Way
Eatontown, NJ 0772
Appendix A. SGML for IETFers
This is a description of the elements of the Standard Generalized Markup
Language (SGML) that are key to understanding the relationship between
SGML and the Multipurpose Internet Mail Extensions (MIME). For the
purposes of this discussion, and without doing too much damage to the SGML
specification, an SGML document contains text, markup, and references to
non-text document elements (graphics). For a complete and accurate
description see ISO 8879, Information Processing - Text and office
systems - Standard Generalized Markup Language (SGML).
An SGML document has the following structure (the parenthesized numbers
refer to productions in ISO 8879) and is processed by an application
called an SGML parser. Note that Internet-style ABNF is used for
notation here; SGML uses a different style.
sgml-doc := sgml-decl prolog doc-inst (2)
sgml-sub-doc := dtd doc-inst (3)
Sgml-decl defines the various elements and parameters of SGML, for
example the characters that introduce and end markup tags ("<" and ">"
respectively are used here), the maximum length of markup tags, etc.
The prolog defines the document structure, usually through an SGML
construct called the document type definition (DTD). Most importantly for
interchange considerations, the DTD contains references to external
files, system commands, and text to be sent directly to a typesetter or
printer.
Doc-inst is the actual document instance or text; it also includes
graphic elements, other text with or without markup, by reference to DTD
elements.
The remainder of this discussion focuses on two elements which a DTD
references, entities and notations. They appear in the DTD and have the
following format.
entity := "<!" "ENTITY" name e-text ">" (101)
e-text := q-string | data | b-text | external (105)
data := ( "CDATA" | "SDATA" | "PI" ) q-string (106)
external := ext-id
[ ( "SUBDOC" | ( "NDATA" type ) ) ] (108)
ext-id := ( "SYSTEM" q-string)
| ( "PUBLIC" pub-id [q-string] ) (73)
notation := "<!" "NOTATION" type ext-id ">" (148)
where name is a character string and the definition of b-text is left to
ISO 8879; for convenience q-string has been substituted for the SGML
term parameter literal. Entities referred to via the SUBDOC keyword
differ from SGML documents in that they only contain a DTD and a
doc-inst.
Using the above productions, the following simple example entities
demonstrate the important issues. Name and type are alphanumeric tokens and
q-string is a series of characters enclosed in double (or single) quote
marks.
<!ENTITY name PUBLIC pname> (A)
<!ENTITY name SYSTEM fname> (B)
<!ENTITY name SYSTEM fname NDATA type> (C)
<!NOTATION type SYSTEM command> (D)
<!ENTITY name PI q-string> (E)
Form A refers to a well-known or "public" name that the SGML parser is
able to resolve; in the marked-up text there will be an entity reference
&name; that directs the parser to include the corresponding public file.
Similarly, form B corresponds to a locally known file. Form C allows
the markup text to refer to non-SGML data, an image for example, and the
type parameter must match the type of a NOTATION element. The matching
element's command parameter specifies the command which processes the
file fname. Finally, form E, processing instructions, specifies a string
of characters to be sent directly to the output device.
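As a rough illustration of how an application might sort declarations
into forms A through D, the Python sketch below scans declaration text
with a regular expression. It is not an SGML parser: it assumes
double-quoted literals, ignores form E, comments, and parameter
entities, and treats any declaration carrying NDATA as form C. The name
`classify` and the tuple layout of the result are invented for this
example.

```python
import re

_DECL = re.compile(
    r'<!(ENTITY|NOTATION)\s+(\S+)\s+(PUBLIC|SYSTEM)'
    r'((?:\s+"[^"]*")*)'            # zero or more double-quoted literals
    r'(?:\s+NDATA\s+(\S+?))?\s*>'   # optional NDATA notation name
)


def classify(declarations):
    """Return (form, name, literals, notation) for each declaration found."""
    found = []
    for match in _DECL.finditer(declarations):
        keyword, name, id_kind, literal_text, ndata = match.groups()
        literals = re.findall(r'"([^"]*)"', literal_text)
        if keyword == "NOTATION":
            form = "D"
        elif ndata is not None:
            form = "C"   # non-SGML data tied to a notation
        elif id_kind == "PUBLIC":
            form = "A"
        else:
            form = "B"
        found.append((form, name, literals, ndata))
    return found


if __name__ == "__main__":
    subset = (
        '<!ENTITY chap1 PUBLIC "-//Acme//TEXT chapt1//EN">\n'
        '<!ENTITY chap3 SYSTEM "chapt3.sgm">\n'
        '<!NOTATION jxz SYSTEM "/usr/local/bin/jxz">\n'
        '<!ENTITY fig1 SYSTEM "fig1.jxz" NDATA jxz>\n'
    )
    for row in classify(subset):
        print(row)
```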
These examples give rise to the following issues when the document is
transferred from one environment to another.
A Is the public name known to the recipient? The recipient's SGML
parser may not know of the public file, and this will be discovered
when it processes the document.
B What is the file name on the recipient system? There must be some
process which binds the sender's file names to the recipient's.
C See B and D.
D Direct use of the NOTATION form is a large security risk, an
invitation to a Trojan Horse attack. The recipient must be protected
from a sender invoking an arbitrary command on the recipient
system.
E Processing instructions permit the sender to manipulate the
recipient's output device. This is the same risk that exists for
PostScript documents and is not addressed.
Issues A through D are addressed in this document.
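One common way to guard against issue D, shown here only as an
assumption and not as the mechanism this draft specifies, is for the
recipient to ignore the sender's command entirely and map notation names
to handlers configured locally. The table contents and the name
`handler_for` in this Python sketch are hypothetical.

```python
# Hypothetical local policy: notation name -> locally trusted handler.
LOCAL_HANDLERS = {
    "jpeg": "local-image-viewer",
    "tiff": "local-image-viewer",
}


def handler_for(notation_name, sender_system_id=None):
    """Pick a local handler; the sender's system-id is never executed."""
    # sender_system_id is accepted only so callers can log it if they wish.
    try:
        return LOCAL_HANDLERS[notation_name.lower()]
    except KeyError:
        raise PermissionError(
            "no local handler for notation %r" % notation_name) from None


if __name__ == "__main__":
    print(handler_for("JPEG", "/usr/local/bin/jxz"))
```

With this policy, the jxz notation from the example in section 3.2 would
be refused unless the recipient had configured a handler for it.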
Appendix B. Content-Type registrations
_________________________________
The Application/SGML Content-Type
(1) MIME type name: Application
(2) MIME subtype name: SGML
(3) Required parameters: none
(4) Optional parameters: SGML-version, created-with, charset
(5) Encoding considerations: may be encoded
(6) Security considerations: none
(7) Specification:
This subtype is used for text marked with the Standard Generalized
Markup Language [ISO-SGML].
_________________________________
The Text/SGML Content-Type
(1) MIME type name: Text
(2) MIME subtype name: SGML
(3) Required parameters: none
(4) Optional parameters: SGML-version, created-with, charset
(5) Encoding considerations: may be encoded
(6) Security considerations: none
(7) Specification:
This subtype is used for text marked with the Standard Generalized
Markup Language [ISO-SGML].
_________________________________