SGML-MIME: Strawman Proposal

From owner-majordomo@ebt.com Thu Mar  2 01:41:41 1995
Return-Path: <owner-majordomo@ebt.com>
Received: from ebt-inc.ebt.com by utafll.uta.edu (4.1/25-eef)
	id AA05756; Thu, 2 Mar 95 01:41:31 CST
Received: from Princeton.EDU (root@Princeton.EDU [128.112.128.1]) by ebt-inc.ebt.com (8.6.9/8.6.9) with SMTP id UAA25317 for <sgml-internet@ebt.com>; Wed, 1 Mar 1995 20:52:57 -0500
Received: from acupain.UUCP by Princeton.EDU (5.65b/2.115/princeton)
	id AA19742; Wed, 1 Mar 95 18:41:29 -0500
Received: by Accurate.COM (4.1/SMI-4.0)
	id AA20967; Wed, 1 Mar 95 17:36:24 EST
Message-Id: <9503012236.AA20967@Accurate.COM>
To: sgml-internet@ebt.com
Subject: Re: Strawman Proposal 
Organization: Accurate Information Systems, Inc.
X-Org-Addr: 2 Industrial Way
X-Org-Addr: Eatontown, NJ  08840
X-Org-Misc: 1.908.389.5550 (phone) 1.908.389.5556 (fax)
X-Mailer: MH 6.8
Date: Wed, 01 Mar 1995 17:36:23 -0500
From: Ed Levinson <elevinso@Accurate.COM>
Status: RO

<embarrassed>Sorry, I forgot to include the proposal.../Ed


       Network Working Group                            E. Levinson
       Internet Draft: MIME/SGML                             Editor
       <draft-ietf-mimesgml-03.txt>                   March 1, 1995

                    Encapsulating SGML Documents Using
                    the Multipart/Related Content-Type

       THIS IS A ROUGH DRAFT FOR DISCUSSION PURPOSES.  IT IS LITTLE
       MORE THAN AN OUTLINE AND SOME EXAMPLES.  IT REPRESENTS MY
       UNDERSTANDING OF WHAT HAS BEEN DISCUSSED AND CONTAINS SOME
       hopefully minor CHANGES OF MY OWN. Ed

       This draft document is being circulated for comment.  Please
       send your comments to the authors or to the sgml-internet
       mail list <sgml-internet@ebt.com>.

       Archives of the email discussions are available at
       ftp://ftp.naggum.no:/pub/sgml-internet filed by date and
       time.

       Abstract

       This draft describes the encapsulation of an SGML document
       within a MIME message.  It makes use of the proposed
       Multipart/Related MIME Content-Type, proposes the new
       content sub-types Text/sgml and Application/sgml, and
       defines two new headers, Content-SGML-Entity and
       Content-SGML-Notation.

       1       Introduction

       A need exists for the transfer of documents constructed
       using the Standard Generalized Markup Language (SGML)
       [ISO-SGML].  Those documents consist of a set of
       inter-related files whose structure and relationships must
       be preserved independently of the system on which the
       document exists.

       The Multipart/Related content-type [IDrel] proposed for the
       Multipurpose Internet Mail Extensions (MIME) specification
       [RFC-MIME] provides a mechanism for preserving such
       relationships.  This memorandum applies those mechanisms to
       SGML documents.

       1.1      SGML

       SGML is used in several communities to encode document
       structure and layout.  A rigorous description of SGML is
       left to [ISO-SGML].  Appendix A of this document, which is
       unbelievably brief, contains a description of the SGML
       elements relevant to MIME encapsulation.  The terms used in
       the present document attempt to be consistent with SGML
       terminology and usage.

       The SGML document exists as a collection of one or more
       files and an SGML document refers to these files via entity



       Levinson (Editor)   Expires 6 months from issue date            [Page 1]

       Internet Draft                                                 MIME-SGML


       declarations.  Each declaration defines the name and type of
       the storage object or file.  Preserving the structure of
       references from one entity to another is key to the email
       exchange of SGML documents.  In SGML that structure is known
       as the entity structure.

       For a person or application to receive and display a
       complete SGML document the mail message must carry a precise
       definition for each of the SGML document parts.  In the
       sender's environment the document parts may reference
       standard definitions or specific local files.  Further, a
       DTD may reference other files, for example images and
       graphics.  The identity of the document parts and the
       content of each file must be available to enable the
       recipient to transform the sender's file name references
       into an equivalent local reference and to instantiate the
       files locally.

       1.2     SGML Document Interchange Format (SDIF)

       1.3     Organization of this Memorandum

       2       A Model for MIME/SGML

       Four issues must be addressed for the recipient's user agent
       to display a complete SGML document.  First, the various
       document parts must be specified.  Second, file references
       on the sender's system must be resolved to references on the
       receiver's system.  Third, command references must be
       resolved in the same way.  Finally, an appropriate
       application, an unpacker, must be in control to unpack the
       MIME body parts and present them to the display software.
       The controlling application is discussed first, and then the
       document parts, file references, and command references.

       2.1     Invoking the  SGML System

       3       Encapsulating SGML

       3.1     Multipart/Related Content-Type

       The Multipart/Related content-type contains a set of related
       body parts and lists one or more body parts with which
       processing starts.  The body parts are constituents of the
       multipart and each has an associated content-ID.  The
       content-IDs are used to reference the starting parts.  Below
       is a simple example of a Multipart/Related that contains an
       SGML document.

       3.2     An Example

            MIME-Version: 1.0
            Content-Type: Multipart/Related; boundary=tiger-lily;
               start="<doc.950209.1430@Acme.com>"; type="application/sgml"

            --tiger-lily
            Content-Type: Application/sgml; catalog="<cat.950209.1430@Acme.com>"
            Content-SGML-Entity: doctype;
               public-id="-//Acme//DTD Book//EN";
               system-id="/home/users/sgml/dtds/book.dtd"
            Content-ID: <doc.950209.1430@Acme.com>

            <!DOCTYPE book PUBLIC
                 "-//Acme//DTD Book//EN"
            "/home/users/sgml/dtds/book.dtd"
            [
            <!ENTITY chap1 PUBLIC "-//Acme//TEXT chapt1//EN">
            <!ENTITY chap2 SYSTEM>
            <!ENTITY chap3 SYSTEM "chapt3.sgm">
                 <!NOTATION jxz SYSTEM "/usr/local/bin/jxz">
                 <!ENTITY fig1  SYSTEM "fig1.jxz" NDATA jxz>
            ]>
            <book> &chap1; &chap2; &chap3; </book>
            --tiger-lily
            Content-Type: Text/sgml
            Content-SGML-Entity: general; name=chap1; doctype=book;
               public-id="-//Acme//TEXT chapt1//EN"

            <chapt><H1>This is chapter ONE ...</chapt>
            --tiger-lily
            Content-Type: Text/sgml
            Content-SGML-Entity: general; name=chap2; doctype=book

            <chapt><H1>This is chapter TWO ...</chapt>
            --tiger-lily
            Content-Type: Text/sgml
            Content-SGML-Entity: general; name=chap3; doctype=book;
               system-id="chapt3.sgm"

            <chapt><H1>This is chapter THREE ...</chapt>
            --tiger-lily
            Content-Type: Application/sgml
            Content-SGML-Entity: general; doctype=book;
               public-id="-//Acme//DTD Book//EN";
               system-id="/home/users/sgml/dtds/book.dtd"

            <!--  Acme Widget Company  -->
            <!-- Instruction Book DTD -->
            <!ELEMENT ...>

            --tiger-lily
            Content-Type: image/jpeg
            Content-Transfer-Encoding: BASE64
            Content-SGML-Entity: general; name=fig1; doctype=book;
               system-id="fig1.jxz"; notation-name=jxz
            Content-SGML-Notation: name=jxz; doctype=book;
               system-id="/usr/local/bin/jxz"

            [Base64 encoded binary image data]
            --tiger-lily--
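
       As a rough illustration only (it is not part of the draft), the
       selection of the starting body part described in 3.1 could be
       sketched in Python as follows.  The sketch uses the standard
       email package on an abbreviated version of the message above.
       The helper name find_start_part is hypothetical, and falling
       back to the first body part when no start parameter is present
       is an assumption of the sketch.

```python
from email import message_from_string

# Abbreviated version of the example message (two body parts only).
RAW = """\
MIME-Version: 1.0
Content-Type: Multipart/Related; boundary=tiger-lily;
   start="<doc.950209.1430@Acme.com>"; type="application/sgml"

--tiger-lily
Content-Type: Application/sgml
Content-SGML-Entity: doctype; public-id="-//Acme//DTD Book//EN"
Content-ID: <doc.950209.1430@Acme.com>

<!DOCTYPE book PUBLIC "-//Acme//DTD Book//EN" [
<!ENTITY chap1 PUBLIC "-//Acme//TEXT chapt1//EN">
]>
<book> &chap1; </book>
--tiger-lily
Content-Type: Text/sgml
Content-SGML-Entity: general; name=chap1; doctype=book;
   public-id="-//Acme//TEXT chapt1//EN"

<chapt><H1>This is chapter ONE ...</chapt>
--tiger-lily--
"""


def find_start_part(msg):
    """Return the body part whose Content-ID equals the start parameter.

    Assumption of this sketch: with no start parameter, processing
    begins with the first body part.
    """
    start = msg.get_param("start")      # quotes are removed by get_param
    parts = msg.get_payload()           # list of body parts for a multipart
    if start is None:
        return parts[0]
    for part in parts:
        if part.get("Content-ID", "").strip() == start:
            return part
    raise LookupError("no body part with Content-ID %s" % start)


msg = message_from_string(RAW)
root = find_start_part(msg)
```

       Here root is the Application/sgml part carrying the document
       entity; the remaining parts would then be matched to entity
       declarations through their Content-SGML-Entity headers.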

       3.3     Specifying the Document Parts

       4       The Content-SGML-* Headers

            sgml-header  := entity-header / notation-header

       4.1     Common syntax

       SGML headers may contain the following parameter values.
       Note that "notation-name" is not valid for a notation-
       header.

            parameter    := attribute "=" value

            attribute    := sgml-token

            value        := token / quoted-string   ; c.f. RFC 1521

            sgml-token   := "name" / "doctype" / "linktype" /
                            "public-id" / "system-id" / "notation-name" /
                            extension-token

            extension-token := ( "X-" / "x-" ) token
                            ; no intervening white space


       name        A string giving the name of the entity.  This would be
                   omitted when sending a collection of entities (such as a
                   set of DTDs) that is not part of a document.  A receiving
                   system still needs some of the information in the
                   Content-SGML-Entity: header, even if the entity isn't part
                   of a document.  For example, it might need to make an entry
                   in a catalog mapping the entity's public id to the
                   filename.  The name would also be omitted for unnamed
                   entities.

       doctype     A string specifying the document type name of the DTD subset
                   in which the entity was declared, if the entity was declared
                   in a DTD subset other than the base DTD subset.  This
                   parameter would not be present for entities with a decl-type
                   other than "general" or "parameter".  Although it would be
                   possible to handle multiple DTDs (and LPDs) by nested
                   multipart/related parts, it is more convenient to use one
                   multipart/related part for each SGML parsing context, and
                   for the Content-SGML-Entity: parameters to uniquely identify
                   the entity within that parsing context.

       linktype    A string specifying the link type name of the LPD subset in
                   which the entity was declared, if the entity was declared in
                   an LPD subset.  This parameter would not be present for
                   entities with a decl-type other than "general" or
                   "parameter".

       public-id   The public identifier in the entity's declaration; for an
                   entity not in a document, the public identifier of the
                   entity.

       notation-name
                   The notation name of an external entity.  Not valid in a
                   notation-header.  The value of this parameter will be the
                   value of the name parameter of a Content-SGML-Notation
                   header.

       extension-token
                   A parameter not defined in this document and agreed upon by
                   the parties using it, a group of consenting adults.

       4.2     The Content-SGML-Entity Header

            entity-header := "Content-SGML-Entity" ":" decl-type
                            *( ";"  parameter )

            decl-type     := "doctype" / "linktype" / "general" /
                            "parameter"

       Decl-type is a token specifying how the entity was declared:

       general   An entity declared in an entity declaration as a general
                 entity.

       parameter An entity declared in an entity declaration as a parameter
                 entity.

       doctype   An entity containing an external DTD subset, declared by a
                 doctype declaration; the name in this case would be the
                 document type name.

       linktype  An entity containing an external LPD subset, declared in a
                 linktype declaration; the name in this case would be the link
                 type name.
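
       The entity-header grammar above can be sketched as a small
       Python parser.  This is a minimal sketch under stated
       assumptions, not a reference implementation: the function name
       parse_entity_header is hypothetical, decl-type and attribute
       names are compared case-insensitively (an assumption borrowed
       from general MIME practice), and a trailing ";" such as the one
       that appears in the example of section 3.2 is tolerated even
       though the grammar does not generate it.

```python
import re

DECL_TYPES = {"doctype", "linktype", "general", "parameter"}
SGML_TOKENS = {"name", "doctype", "linktype",
               "public-id", "system-id", "notation-name"}

# One ";"-introduced parameter: attribute "=" ( quoted-string / token ).
PARAM_RE = re.compile(r'''
    \s*;\s*
    (?P<attr>[A-Za-z][A-Za-z0-9-]*)
    \s*=\s*
    (?: "(?P<quoted>(?:[^"\\]|\\.)*)" | (?P<token>[^\s;"]+) )
    ''', re.VERBOSE)


def parse_entity_header(value):
    """Split a Content-SGML-Entity value into (decl_type, params)."""
    # Unfold continuation lines before parsing.
    value = re.sub(r"\r?\n[ \t]+", " ", value).strip()
    decl_type, sep, rest = value.partition(";")
    decl_type = decl_type.strip().lower()
    if decl_type not in DECL_TYPES:
        raise ValueError("unknown decl-type: %r" % decl_type)
    rest = sep + rest
    params = {}
    pos = 0
    while pos < len(rest):
        m = PARAM_RE.match(rest, pos)
        if m is None:
            # Assumption: ignore a dangling ";" at the end of the header.
            if rest[pos:].strip(" ;") == "":
                break
            raise ValueError("malformed parameter at: %r" % rest[pos:])
        attr = m.group("attr").lower()
        if attr not in SGML_TOKENS and not attr.startswith("x-"):
            raise ValueError("unknown attribute: %r" % attr)
        if m.group("quoted") is not None:
            # Remove backslash escapes inside a quoted-string.
            params[attr] = re.sub(r"\\(.)", r"\1", m.group("quoted"))
        else:
            params[attr] = m.group("token")
        pos = m.end()
    return decl_type, params
```

       For instance, the header of the chap3 body part in section 3.2
       yields the decl-type "general" and the parameters name,
       doctype, and system-id.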

       4.3     The Content-SGML-Notation Header

       A notation-header will appear in each NDATA external entity; it
       provides the information from the corresponding notation declaration.
       These headers may be repeated, as several entity declarations may have
       the same notation name.

            notation-header := "Content-SGML-Notation" ":" parameter
                            *( ";"  parameter )
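
       The link between an entity's notation-name parameter and the
       name parameter of a Content-SGML-Notation header can be checked
       mechanically.  The following Python sketch assumes the headers
       have already been parsed into dictionaries of parameters; the
       function name undeclared_notations is hypothetical, and the
       exact (case-sensitive) comparison of names is an assumption of
       the sketch.

```python
def undeclared_notations(entity_params, notation_params):
    """Return notation names that entities use but no notation header names.

    entity_params:   list of dicts, one per Content-SGML-Entity header
    notation_params: list of dicts, one per Content-SGML-Notation header
    """
    declared = {p["name"] for p in notation_params if "name" in p}
    used = {p["notation-name"] for p in entity_params
            if "notation-name" in p}
    return sorted(used - declared)


# Parameters taken from the chap3 and fig1 body parts of section 3.2.
entities = [
    {"name": "chap3", "doctype": "book", "system-id": "chapt3.sgm"},
    {"name": "fig1", "doctype": "book", "system-id": "fig1.jxz",
     "notation-name": "jxz"},
]
notations = [
    {"name": "jxz", "doctype": "book", "system-id": "/usr/local/bin/jxz"},
]
```

       With the data above every notation-name is matched by a
       notation header; dropping the notation header would report
       "jxz" as undeclared.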

       5       Partial or Incomplete Documents

       6       SDIF?

       7       References


       [ISO-SGML]  ISO 8879:1986, Information processing -- Text and office
                   systems -- Standard Generalized Markup Language (SGML).


       [ISO-SDIF]  ISO 9069:1988, Information processing -- SGML support
                   facilities -- SGML Document Interchange Format (SDIF).


       [RFC-822]   Crocker, D., Standard for the Format of ARPA Internet Text
                   Messages, August 1982, University of Delaware, RFC 822.


       [RFC-HDRC]  Moore, K., Representation of Non-ASCII Text in Internet
                   Message Headers, June 1992, RFC 1342.


       [RFC-MIME]  Borenstein, N. and Freed, N., MIME (Multipurpose Internet
                   Mail Extensions): Mechanisms for Specifying and Describing
                   the Format of Internet Message Bodies, June 1992, RFC 1341.


       [US-ASCII]  Coded Character Set -- 7-Bit American Standard Code for
                   Information Interchange, ANSI X3.4-1986.


       8       Acknowledgements

       The editor has borrowed freely from the suggestions of others and in
       particular lifted text from James Clark and ideas from Roy Fielding.

       The editor also acknowledges Terry Allen, O'Reilly & Associates, Harald
       T. Alvestrand, UniNett, Nathaniel Borenstein, First Virtual Holdings
       Incorporated, Daniel W. Connolly, W3O, Steven DeRose, EBT, Andy Gelsey,
       CSC, Paul Grosso, ArborText, John Klensin, MCI, Einar Stefferud, Network
       Management Associates, Inc, and Erik Naggum, for their suggestions,
       explanations, and encouragement.  No errors or faults in this document
       can be ascribed to them; those are mine.

       UNIX is a registered trademark of UNIX System Laboratories, Inc.

       9       Author's Address

       Ed Levinson
       elevinson@accurate.com
       Accurate Information Systems, Inc.
       2 Industrial Way
       Eatontown, NJ  0772

       Appendix A.   SGML for IETFers

       This is a description of the elements of the Standard Generalized Markup
       Language (SGML) that are key to understanding the relationship between
       SGML and the Multipurpose Internet Mail Extensions (MIME).  For the
       purposes of this discussion, and without doing too much damage to the
       SGML specification, an SGML document contains text, markup, and
       references to non-text document elements (graphics).  For a complete and
       accurate description see ISO 8879, Information processing -- Text and
       office systems -- Standard Generalized Markup Language (SGML).

       An SGML document has the following structure and is processed by an
       application called an SGML parser (the parenthesized numbers refer to
       productions in ISO 8879).  Note that Internet-style ABNF is used for
       notation here; SGML uses a different style.

               sgml-doc        :=      sgml-decl prolog doc-inst       (2)
               sgml-sub-doc    :=      dtd doc-inst                    (3)

       Sgml-decl defines the various elements and parameters of SGML, for
       example, the characters that introduce and end markup tags ("<" and ">"
       respectively are used here), the maximum length of markup tags, etc.

       The prolog defines the document structure, usually through an SGML
       construct called the document type definition (DTD).  Most importantly
       for interchange considerations, the DTD contains references to external
       files, system commands, and text to be sent directly to a typesetter or
       printer.

       Doc-inst is the actual document instance or text; it also includes
       graphic elements, other text with or without markup, by reference to DTD
       elements.

       The remainder of this discussion focuses on two elements which a DTD
       references, entities and notations. They appear in the DTD and have the
       following format.

               entity   := "<!" "ENTITY" name e-text ">"               (101)
               e-text   := q-string | data | b-text | external         (105)
               data     := ( "CDATA" | "SDATA" | "PI" ) q-string       (106)
               external := ext-id
                               [ ( "SUBDOC" | ( "NDATA" type ) ) ]     (108)
               ext-id   := ( "SYSTEM" q-string )
                               | ( "PUBLIC" pub-id [q-string] )        (73)
               notation := "<!" "NOTATION" type ext-id ">"             (148)

       where name is a character string and the definition of b-text is left
       to ISO 8879; for convenience q-string has been substituted for the SGML
       term parameter literal.  Entities referred to via the SUBDOC keyword
       differ from SGML documents in that they only contain a DTD and a
       doc-inst.

       Using the above productions, the following simple example entities
       demonstrate the important issues.  Name and type are alphanumeric tokens
       and q-string is a series of characters enclosed in double (or single)
       quote marks.

               <!ENTITY name PUBLIC pname>                     (A)
               <!ENTITY name SYSTEM fname>                     (B)
               <!ENTITY name SYSTEM fname NDATA type>          (C)
               <!NOTATION type SYSTEM command>                 (D)
               <!ENTITY name PI q-string>                      (E)

       Form A refers to a well known or "public" name that the SGML parser is
       able to resolve; in the marked up text there will be an entity
       reference, &name;, that directs the parser to include the corresponding
       public file.  Similarly, form B corresponds to a locally known file.
       Form C allows the marked up text to refer to non-SGML data, an image for
       example, and the type parameter must match the type of a NOTATION
       element (form D).  The matching element's command parameter specifies
       the command which processes the file fname.  Finally form E, processing
       instructions, specifies a string of characters to be sent directly to
       the output device.
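
       To make the five forms concrete, the following Python sketch
       sorts a single declaration into form A through E.  It is a
       simplified illustration and not an SGML parser: it assumes the
       default delimiters, ignores parameter entities and comments,
       and the names TOKEN_RE and classify are hypothetical.

```python
import re

# Quoted literals (either quote style) or runs of other characters;
# the "<" and ">" delimiters themselves are skipped.
TOKEN_RE = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s<>]+')


def classify(decl):
    """Return "A".."E" for a declaration written in one of the five forms."""
    tokens = TOKEN_RE.findall(decl)
    if not tokens:
        raise ValueError("empty declaration")
    keyword = tokens[0].upper()
    if keyword == "!NOTATION":
        return "D"
    if keyword != "!ENTITY" or len(tokens) < 3:
        raise ValueError("not one of forms A-E: %r" % decl)
    kind = tokens[2].upper()
    if kind == "PI":
        return "E"
    # Unquoted words after the PUBLIC/SYSTEM keyword, e.g. NDATA and its type.
    unquoted = [t.upper() for t in tokens[3:] if t[0] not in "\"'"]
    if kind in ("PUBLIC", "SYSTEM") and "NDATA" in unquoted:
        return "C"
    if kind == "PUBLIC":
        return "A"
    if kind == "SYSTEM":
        return "B"
    raise ValueError("not one of forms A-E: %r" % decl)
```

       Applied to the declarations in the example of section 3.2,
       chap1 is form A, chap2 and chap3 are form B, fig1 is form C,
       and the jxz notation is form D.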

       These examples give rise to the following issues when the document is
       transferred from one environment to another.


       A    Is the public name known to the recipient?  The recipient SGML
            parser may not know of the public file and this will be discovered
            when it processes the document.


       B    What is the file name on the recipient system?  There must be some
            process which binds the sender's file names to the recipient's.


       C    See B and D.


       D    Direct use of the NOTATION form is a large security risk, an
            invitation to a Trojan Horse attack.  The recipient must be
            protected from a sender invoking an arbitrary command on the
            recipient system.


       E    Processing instructions permit the sender to manipulate the
            recipient output device.  This is the same risk that exists for
            PostScript documents and is not addressed.

       Issues A through D are addressed in this document.
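
       One way a receiving system might guard against issues B and D
       is sketched below in Python.  Everything here is an assumption
       made for illustration: the spool directory, the LOCAL_HANDLERS
       table, the viewer path in it, and the function names are all
       hypothetical.  The point of the sketch is only that the
       sender's file names and commands are treated as labels and are
       never used directly.

```python
import os.path

# Hypothetical local policy: notation name -> locally installed handler.
# The sender's system-id for the notation is deliberately not consulted.
LOCAL_HANDLERS = {
    "jpeg": "/usr/local/bin/view-jpeg",
}


def handler_for(notation_name):
    """Return the local handler for a notation, or None if none is allowed."""
    return LOCAL_HANDLERS.get(notation_name.lower())


def local_path_for(system_id, spool_dir):
    """Bind a sender's file name to a path inside spool_dir (issue B).

    Only the final path component is kept, so a sender cannot direct
    the recipient to write outside the spool directory.
    """
    base = os.path.basename(system_id.replace("\\", "/"))
    if base in ("", ".", ".."):
        raise ValueError("unusable file name: %r" % system_id)
    return os.path.join(spool_dir, base)
```

       Under this policy the jxz notation of section 3.2 has no local
       handler, so its command is not run, and the DTD's system
       identifier is reduced to book.dtd inside the spool directory.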











       Appendix B.  Content-Type registrations
       _________________________________

       The Application/SGML Content-Type


       (1)  MIME type name: Application


       (2)  MIME subtype name: SGML


       (3)  Required parameters: none


       (4)  Optional parameters: SGML-version, created-with, charset


       (5)  Encoding considerations: may be encoded


       (6)  Security considerations: none


       (7)  Specification:

            This subtype is used for text marked with the Standard Generalized
            Markup Language [ISO-SGML].

       _________________________________

       The Text/SGML Content-Type


       (1)  MIME type name: Text


       (2)  MIME subtype name: SGML


       (3)  Required parameters: none


       (4)  Optional parameters: SGML-version, created-with, charset


       (5)  Encoding considerations: may be encoded


       (6)  Security considerations: none


       (7)  Specification:

            This subtype is used for text marked with the Standard Generalized
            Markup Language [ISO-SGML].

       _________________________________