MIMESGML Working Group                                    D. Stinchfield
INTERNET-DRAFT                                                 EBT, Inc.
Expires May 31, 1996                                    December 1, 1995


          Using Catalogs and MIME to Exchange SGML Documents

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to the HTTP working group at . Discussions of the working group are archived at ftp://ftp.naggum.no/pub/SGML-internet.

Abstract

This draft proposes a standard for exchanging SGML documents over the World Wide Web using catalogs and MIME. This draft extends SGML Open's definition of catalogs [10] by adding to it new keywords and storage object identifier (SOI) types. The new keywords identify SGML document objects (such as document type declarations and document entities), non-SGML document objects (such as stylesheets), and management information (such as base URL, character encoding, and character repertoire). The new SOI types include URIs and MIME Content-IDs. This document also describes a new MIME content type called Application/SGML-Catalog which identifies a MIME body part as a catalog.

Revision History

   o Changed Application/Catalog to Application/SGML-Catalog, makes more sense.
   o Changed the syntax of Notation and added more to the description.
   o Added ENCODING and CHARREP keywords.
   o Added host name suffix to Content-ID values in examples.
   o Removed the keywords: CHARSET, BASESET, and CAPACITY. Will assume that the public id's used to define them in the SGML declaration will be unique within the catalog.
   o Changed BASEURL to BASE and added to the definition so that BASE can now be either an absolute URL or an absolute filename.
   o Added a better description of how system identifiers are to be handled.
   o TR9401:1995 is being referenced instead of TR9401:1994.
   o User defined keywords are no longer required to begin with "X-" or "x-", but it is strongly recommended.
   o Noted how parameter entities are to be treated.
   o Added appendix D, public identifiers for Notations.
   o Added a new keyword called LinkType.

1. Introduction
   1.1 Overview
   1.2 SGML document Components
2. Catalog Description
   2.1 Resolving System Identifiers
   2.2 Parameter Entity Names
   2.3 Catalog Keywords
      2.3.1 SGMLDECL
      2.3.2 DOCUMENT
      2.3.3 DOCTYPE
      2.3.4 PUBLIC
      2.3.5 ENTITY
      2.3.6 NOTATION
      2.3.7 LINKTYPE
      2.3.8 SEMANTICS
      2.3.9 BASE
      2.3.10 OVERRIDE
      2.3.11 ENCODING
      2.3.12 CHARREP
      2.3.13 User Defined Keywords
   2.4 Storage Object Identifiers
      2.4.1 URIs as SOIs
      2.4.2 The Content-ID SOI
3. Export Catalog Syntax
4. Application/SGML-Catalog
5. Examples
   5.1 Sending Only A Catalog
      5.1.1 MIME Message Content
   5.2 Sending a Catalog and the Document Entity
      5.2.1 MIME Message Content
   5.3 Sending a Catalog and All Document Components
      5.3.1 MIME Message Content
   5.4 Sending a Catalog for a Single Non-Document Entity
6. Security Considerations
7. Acknowledgments
8. References
9. Authors' Address
10. Appendix A: SGML declaration Used In The Examples
   10.1 SGML declaration
11. Appendix B: DTD Used In The Examples
   11.1 DTD
12. Appendix C: SGML document Used In The Examples
   12.1 SGML document entity
   12.2 Entity Named "Legal"
   12.3 Entity Named "MyEnding"
13. Appendix D: NOTATIONS
   13.1 Useful Notations
   13.2 ISO 9070 Public Identifiers for NOTATIONs

1. Introduction

This draft proposes a standard for exchanging SGML documents over the World Wide Web using catalogs and MIME. This draft extends SGML Open's definition of catalogs [10] by adding to it new keywords and storage object identifier (SOI) types. The new keywords identify SGML document objects (such as document type declarations and document entities), non-SGML document objects (such as stylesheets), and management information (such as base URL, character encoding, and character repertoire). The new SOI types include URIs and MIME Content-IDs. This document also describes a new MIME content type called Application/SGML-Catalog which identifies a MIME body part as a catalog.

SGML catalogs were introduced in the SGML Open document entitled "Entity Management" [10] (also referred to as TR9401). Catalog entries, as described in the SGML Open document, provide a mapping of PUBLIC external identifiers and entity names to system-dependent SOIs (these system-dependent SOIs are typically filenames; see section 2.4 for details). This system-dependent requirement is too restrictive and fails to meet the needs of the Internet community. Specifically, SGML Open's catalog definition does not provide keyword entries for all the types of external objects used in SGML, and it does not define how to map catalog entries to system-independent SOIs such as URLs. This document addresses both of these limitations by extending SGML Open's catalog definition to include keywords for SGML external identifiers and by defining system-independent SOIs.
Some key benefits of using catalogs and MIME to exchange SGML documents are:

   o a client needs only a catalog to begin processing and fetches the components referenced in the catalog as they are needed;
   o a client that understands catalogs has a way to fetch components of a document that it doesn't already have;
   o document components do not have to be modified in order to be referenced in a catalog;
   o components of a document can be distributed across many servers;
   o catalogs do not depend on MIME, so they can be used in other packaging schemes;
   o the impact on MIME is minimized;
   o catalogs are an implemented, proven technology;
   o a document's system identifiers can be referenced in a catalog and subsequently resolved by a client.

1.1 Overview

The new catalog keywords defined in this document are NOTATION, BASE, SEMANTICS, OVERRIDE, ENCODING, and CHARREP. NOTATION came from Charles Goldfarb's paper entitled "Entity Management in SGML" [11] and refers to the SGML NOTATION declaration (refer to [3], page 426 for a description of NOTATION). The others are described for the first time in this document: BASE is used to resolve relative URLs found in a catalog; SEMANTICS is used to reference semantic processing information, such as stylesheets; OVERRIDE indicates which TR9401 processing mode should be used to resolve external identifiers; ENCODING describes the encoding of the catalog entries; and CHARREP describes the character repertoire of the catalog entries (refer to [3], page 193 for a detailed description of character repertoire).

User defined keywords are allowed in catalogs for experimental purposes. It is strongly recommended that user defined keywords begin with "X-" or "x-" and be located at the end of the catalog; this makes it easy for clients to identify user defined keywords.

System-independent SOIs are defined to permit both URIs and MIME Content-IDs.
The usefulness of URIs is evident from the popularity of the World Wide Web; refer to [7], [8], [12], and [4] for detailed descriptions of URIs, URNs, and URLs. An SOI that is defined to be a MIME Content-ID identifies a document component that is contained in a MIME body part (refer to [13] for a description of MIME Content-IDs). A catalog that contains a MIME Content-ID SOI is typically part of a multipart message and usually refers to a body part contained in the same multipart message.

There are a number of ways to serve up an SGML document using catalogs and MIME. The server could deliver just a catalog; the server could deliver a catalog and a document entity; the server could deliver a catalog and all of the document's components; the server could deliver a catalog and a non-document entity. Detailed examples of all of these can be found in section 5. Any catalog served using MIME will have a Content-Type of Application/SGML-Catalog.

1.2 SGML document Components

SGML documents are typically made up of a number of components. Some of the components contain instructions to the SGML parser (such as the SGML declaration and the DTD), others contain marked up text (such as the SGML document entity), and still others contain non-SGML data (such as figures). These components can be identified in a number of different ways using what SGML calls external identifiers. Catalog entries, among other things, identify SGML document components. See ISO 8879:1986 [14] and Charles Goldfarb's "The SGML Handbook" [3] for a rigorous description of SGML.

2. Catalog Description

A catalog provides a mapping from external object identifiers (such as public identifiers and entity names) to SOIs. In TR9401 [10] an SGML system's entity manager typically treats SOIs as system-dependent filenames. This is too restrictive for SGML Systems that need to take advantage of the Internet.
Formally extending the meaning of SOIs to include Universal Resource Identifiers (URI, [7]) removes this system-dependent restriction. Catalogs that contain these types of SOIs are system-independent and so give flexibility to SGML systems.

To interchange an SGML document means to send a catalog and a set of zero or more components to a client (note that the interchange package could contain all of the document's components). The catalog contains references to document components (refer to section 2.3 for a detailed list of keywords).

2.1 Resolving System Identifiers

Entities that are declared with system identifiers in an SGML document can be referenced in a catalog. The logic that creates the catalog should set OVERRIDE to "YES" to indicate to the receiving systems that the catalog should be used to resolve external identifiers whether or not there is a system identifier defined for the external identifier in the document. If the URL in the catalog is relative then it is resolved with respect to the BASE catalog entry. If there's no BASE entry then the relative URL is resolved with respect to the URL of the catalog.

What happens if there is no catalog entry for the entity? If there is a system identifier for it in the document then the system identifier is used to resolve the reference. If the system identifier is relative then it is resolved with respect to the entity in which the entity declaration is specified.

2.2 Parameter Entity Names

Parameter entity names always begin with a percent sign (%) as defined in TR9401:

   "Note that, if the entity name is a parameter entity name (as opposed to a general entity name), an initial percent sign(%), is part of the name. (The percent sign- which is the reference concrete syntax replacement for the "PERO" character- shall be used in the catalog regardless of the concrete syntax of the current document.)"

2.3 Catalog Keywords

A catalog contains entries for SGML document components.
The order of the entries in the catalog is not important. All entries are optional. A catalog can contain multiple entries with the same keyword. The following keywords are defined in this document:

   SGMLDECL   - SGML declaration
   DOCUMENT   - SGML document entity
   DOCTYPE    - Document type declaration (DTD)
   PUBLIC     - public external identifier
   ENTITY     - entity name
   NOTATION   - notation name
   LINKTYPE   - link type name
   SEMANTICS  - name and type of the semantic information
   BASE       - base URL
   OVERRIDE   - defines which TR9401 processing mode to use
   CHARREP    - character repertoire
   ENCODING   - character encoding
   X- or x-   - user defined keyword prefix

These keywords are necessary to keep the name spaces of an SGML document separate. An SGML document may contain many components, each of which can be identified by a local name (such as "chap1") or by a global "public identifier" (such as ISBN, URN, etc.). Local identifiers may be re-used for different kinds of components. For example, an SGML document could have an included entity called "footnote" and a special notation called "footnote". In a catalog these could be represented by the following entries:

   ENTITY "footnote" "http://blah.com/blah.foot"
   NOTATION "footnote" "http://blah.com/notation/yuk.not"

2.3.1 SGMLDECL

The SGML declaration is part of the SGML document and is required by the SGML parser before it can begin parsing. The SGML declaration defines, among many other things, the SGML document's character set and the character strings that define the markup. For an excellent description of SGML declarations refer to Wayne Wohler's paper called "SGML declarations" [1]. Appendix A of this document contains an example of an SGML declaration.

If an SGML document is not explicitly associated with an SGML declaration then a default SGML declaration is assumed by the parser. The SGML declaration is identified in the catalog by the SGMLDECL keyword.

An SGML declaration is not always self-contained.
It can include references to public identifiers ([3], 378). The following parameters of an SGML declaration can be defined as public identifiers: BASESET ([3], 453:12); CAPACITY ([3], 456:2); and SYNTAX ([3], 458:2). This proposal assumes that the public identifiers used for these parameters will be unique within the catalog; more than likely BASESET, CAPACITY, and SYNTAX will be formal public identifiers, which guarantee universal uniqueness (for a description of formal public identifiers refer to [3], page 381).

The SGML declaration catalog keyword has the following syntax:

   sgmldecl = ("SGMLDECL", ps+, storage object identifier)

2.3.2 DOCUMENT

The DOCUMENT catalog keyword refers to the SGML document entity ([3], 142:1). The SGML document entity is the first entity of the SGML document ([3], 142:1). It typically contains a reference to a DTD, a document type declaration subset ([3], 404:6), and marked-up text. Appendix C of this document contains an example of an SGML document entity.

The syntax for DOCUMENT follows:

   document = ("DOCUMENT", ps+, storage object identifier)

2.3.3 DOCTYPE

The DOCTYPE catalog keyword refers to the Document type declaration (DTD) ([3], p402). The syntax for DOCTYPE is:

   doctype = ("DOCTYPE", ps+, document type name, ps+, storage object identifier)

The value of the "document type name" is defined in the SGML document entity, for example the first line in the Document entity used in the examples looks like this:

A catalog entry for it would look something like this:

   DOCTYPE "MEMO" "../../dtds/memo.dtd"

Note how the name, "MEMO", is defined in the document and referenced in the catalog. The entity manager resolves a reference to the DTD by looking its name up in the catalog. Appendix B of this document contains an example of a DTD.

It is not uncommon for the document's DTD to have both a public identifier and a system identifier:

   ...

For these cases the server decides whether or not to include both in the catalog.
If it decides to include both then the catalog entries for them would look something like this:

   DOCTYPE "MEMO" "http://www.bill.com/usr/wcs/dtds/memo.dtd"
   PUBLIC "-//EBT//DTD Released Memo//EN"
          "http://www.yoman.edu/pub/dtds/memo.dtd"

The client decides which definition to use first.

2.3.4 PUBLIC

PUBLIC catalog keywords refer to public identifiers that are defined in the DTD or in the Document entity's "document type declaration subset". The syntax for the PUBLIC keyword is:

   public = ("PUBLIC", ps+, public identifier, ps+, storage object identifier)

Here's an example of a parameter entity declaration that contains a public identifier:

a catalog entry for the above public identifier might look like this:

   PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
          "http://www.wcs.com/usr/wcs/isonum.ent"

2.3.5 ENTITY

The ENTITY catalog keyword refers to document entities that are defined or referenced in the document but have no public identifiers. The syntax for the ENTITY keyword is:

   entity = ("ENTITY", ps+, entity name, ps+, storage object identifier)

The following are examples of entity declarations for system identifiers:

and

Entries in a catalog for these would look something like this:

   ENTITY "Legal" "http://www.bill.com/company/legal.sgm"
   ENTITY "MyEnding" "http://www.bill.com/ending.sgml"

2.3.6 NOTATION

The NOTATION catalog keyword refers to data content notations defined or referenced in the document. The syntax for NOTATION is (note that the storage object identifier is optional for NOTATION):

   notation = ("NOTATION", ps+, notation name [ ,ps+, storage object identifier ] )

The following example illustrates how NOTATION could be used for Java scripts:

   input to JuggleBalls script, for example: specify number of items and juggling style.
A catalog entry for the NOTATION declaration described above would look like:

   NOTATION "JuggleBalls" "http://www.bill.com/juggleballs.java"

Processing of notations is system dependent; there is no way for a server to guarantee that a client can process a specific notation. The NOTATION keyword in the catalog may only give a hint, and possibly a pointer to a notation processor. Note that it can be dangerous for the client to resolve a reference to a notation processor: loading and running a notation processor (remote script) from a remote, and potentially unsecured, site is a security risk. However, in a secure environment exchanging scripts may be perfectly safe.

Appendix D describes a collection of public identifiers for NOTATIONS as described by ISO 9070. These are not yet approved by ISO, but hopefully they will be soon. Unfortunately ISO 9070 does not define public identifiers for some of the more popular NOTATIONS found on the Web such as Postscript, TeX, and Tcl. Formal Public Identifiers for these can also be found in Appendix D.

2.3.7 LINKTYPE

The syntax for LINKTYPE is:

   linktype = ("LINKTYPE", ps+, link type name, ps+, storage object identifier)

For a detailed description of what an SGML link type is, refer to [14] and [3].

2.3.8 SEMANTICS

There may be semantic information, such as stylesheets, associated with a document. Semantic information is not required for parsing the document and can be ignored by the client. However, it is often required that a client be able to access appropriate semantic specifications. The syntax for the SEMANTICS keyword is:

   semantics = ("SEMANTICS", ps+, semantic name, ps+, semantic type, ps+, storage object identifier)

Here's an example of an entry in a catalog for semantic information:

   SEMANTICS "large-print" "DSSSL" "http://www.bill.com/style/large.sty"

2.3.9 BASE

Relative URLs [4] are allowed in SOIs. Relative URLs can be resolved using the BASE keyword catalog entry.
If there's no BASE in the catalog then the URL for the catalog is used for relative URL resolution. The syntax for the BASE keyword is:

   base = ("BASE", ps+, absoluteURL)

Here's an example of an entry in a catalog for BASE:

   BASE "http://www.bill.com/docs/memo/mine/dummy"

2.3.10 OVERRIDE

The OVERRIDE keyword defines which TR9401 processing mode to use when the SGML system is resolving external identifiers with explicit system identifiers. When OVERRIDE is "YES" then the receiving system should use the catalog to resolve external identifiers whether or not there is a system identifier defined for it in the document. If the value for OVERRIDE is "NO" then system identifiers found in the document should be used first to resolve external identifiers. The syntax for the OVERRIDE keyword is:

   override = ("OVERRIDE", ps+, ("YES" | "NO"))

2.3.11 ENCODING

The ENCODING keyword is specified as:

   encoding = ("ENCODING", ps+, alphanumeric+)

The ENCODING keyword indicates the encoding of the catalog entries that follow it. There can be more than one ENCODING entry in a catalog. When an ENCODING entry is found it supersedes the value of any preceding ENCODING entry. For example, the following catalog describes catalog entries that have different encodings:

   ENCODING "SHIFT-JIS"
   DOCUMENT "http://www.goeast.com/anaxi.sgm"
   ENCODING "ISO-10646-UTF7"
   DOCTYPE "MEMO" "http://www.gowest.com/dtds/memo.dtd"
   ENCODING "SHIFT-JIS"
   ENTITY "MyEnding" "http://www.goeast.com/ending.sgml"
   ENTITY "Legal" "http://www.goeast.com/company/legal.sgm"

Here the document entity and the entities MyEnding and Legal are encoded in SHIFT-JIS while the DTD is encoded in ISO-10646-UTF7.

2.3.12 CHARREP

The CHARREP keyword specifies the character repertoire(s) for the catalog entries that follow it.
The syntax of the CHARREP keyword is:

   charrep = ("CHARREP", +(ps+, alphanumeric+) )

Like ENCODING, the CHARREP keyword stays in effect until a new CHARREP keyword is detected (the values in the new CHARREP replace the values of the previous CHARREP entry). Note that there can be more than one value in a CHARREP entry. For example:

   CHARREP "JIS 0208" "JIS 0201"

2.3.13 User Defined Keywords

A user can create new catalog keywords by beginning the keyword with either an "X-" or an "x-". Users may do this to test experimental keywords. User defined keywords are only allowed at the end of the catalog and must begin with either the "X-" or the "x-" prefix.

2.4 Storage Object Identifiers

As described in TR9401 [10], an SOI "is expected to be a string that is assumed to make sense to the operating system involved, i.e., it should name a file accessible from the current file system" [10]. TR9401 anticipated the extension of SOIs to define "a different or extended meaning that will require the recognition and special processing of certain characters in the SOI." [10]. Two such SOI extensions are defined in this section. The first defines SOIs in terms of URIs and the second defines them in terms of a MIME Content-ID. The latter type of SOI is used to identify the body part of a Multipart/Related message. Content-ID referencing is used when a single Multipart/Related message contains the document's catalog and one or more of the document's components (see 5.2 and 5.3 for examples).

The syntax for an SOI is:

   storage object identifier = uri object identifier
                             | content id object identifier
                             | TR9401 storage object identifier

The term "TR9401 storage object identifier" refers to TR9401's definition of an SOI and is included here for backwards compatibility.

2.4.1 URIs as SOIs

URIs are used to describe the names and locations of objects.
Uniform Resource Locators (URL) and Uniform Resource Names (URN) are examples of URIs (refer to [8] and [12] for rigorous descriptions of URLs and URNs). Generally speaking URLs have the following structure:

   scheme:scheme-specific-part

A scheme is associated with a protocol such as http or ftp (refer to [8] for a list of schemes supported by a URL). The scheme-specific-part contains the information required by the scheme to locate the object.

"The purpose or function of a URN is to provide a globally unique, persistent identifier used for recognition, for access to characteristics of the resource or for access to the resource itself." [12] There is no Internet standard defined for URNs yet, but one is anticipated soon.

2.4.2 The Content-ID SOI

In addition to the catalog a server can send some or all of the document's components in a single MIME message. Content-ID SOIs are used to map entries in the catalog to the body part of the MIME message that contains the corresponding component. See the examples in 5.2 and 5.3.

The syntax for Content-ID based SOIs is:

   content id object identifier = "Content-ID" ":" msg-id
                                        ; as defined in RFC 1521 [13]
   msg-id     = "<" addr-spec ">"       ; as defined in RFC 822 [15]
                                        ; Unique message id
   addr-spec  = local-part "@" domain   ; as defined in RFC 822 [15]
                                        ; global address
   local-part = word *("." word)        ; as defined in RFC 822 [15]
                                        ; uninterpreted
                                        ; case-preserved
   domain     = sub-domain *("." sub-domain)
                                        ; as defined in RFC 822 [15]
   sub-domain = domain-ref / domain-literal
                                        ; as defined in RFC 822 [15]
   domain-ref = atom                    ; as defined in RFC 822 [15]
                                        ; symbolic reference

3. Export Catalog Syntax

   export catalog = ( catalog entry, ps+ )+, ( user defined, ps+ )*

   catalog entry = sgmldecl | document | doctype | public | entity
                 | notation | semantics | linktype | base | override
                 | user defined | charrep | encoding

   sgmldecl  = ("SGMLDECL", ps+, storage object identifier)
   document  = ("DOCUMENT", ps+, storage object identifier)
   doctype   = ("DOCTYPE", ps+, document type name, ps+, storage object identifier)
   public    = ("PUBLIC", ps+, public identifier, ps+, storage object identifier)
   entity    = ("ENTITY", ps+, entity name, ps+, storage object identifier)
   notation  = ("NOTATION", ps+, notation name [, ps+, storage object identifier] )
   linktype  = ("LINKTYPE", ps+, link type name, ps+, storage object identifier)
   semantics = ("SEMANTICS", ps+, semantic name, ps+, semantic type, ps+, storage object identifier)
   base      = ("BASE", ps+, absoluteURL)
   override  = ("OVERRIDE", ps+, ("YES" | "NO"))
   charrep   = ("CHARREP", +(ps+, alphanumeric+))
   encoding  = ("ENCODING", ps+, alphanumeric+)
   user defined = (("X-" | "x-"), alphanumeric+)

   storage object identifier = uri object identifier
                             | content id object identifier
                             | TR9401 storage object identifier

   uri object identifier = as defined in RFCs 1808 [4], 1630 [7], 1738 [8]

   content id object identifier = "Content-ID" ":" msg-id
                                        ; as defined in RFC 1521 [13]
   msg-id     = "<" addr-spec ">"       ; as defined in RFC 822 [15]
                                        ; Unique message id
   addr-spec  = local-part "@" domain   ; as defined in RFC 822 [15]
                                        ; global address
   local-part = word *("." word)        ; as defined in RFC 822 [15]
                                        ; uninterpreted
                                        ; case-preserved
   domain     = sub-domain *("." sub-domain)
                                        ; as defined in RFC 822 [15]
   sub-domain = domain-ref / domain-literal
                                        ; as defined in RFC 822 [15]
   domain-ref = atom                    ; as defined in RFC 822 [15]
                                        ; symbolic reference

   TR9401 storage object identifier = "storage object identifier" as defined in TR9401 [10]

   semantic name      = alphanumeric+
   absoluteURL        = see "absoluteURL" in RFC 1808 [4]
   document type name = alphanumeric+

   ; From TR9401 [10]
   ps      = s | comment
   LIT     = '"'    ; the double quote
   LITA    = "'"    ; the single quote
   comment = COM, system character*, COM
   COM     = "--"
   entity name = extended name character+
               | (LIT, extended name character+, LIT)
               | (LITA, extended name character+, LITA)

The following notes are taken from TR9401 [10]:

(1.) public identifier and s are defined in 8879 (and RS, RE, SPACE and SEPCHAR are as in the reference concrete syntax of 8879);

(2.) extended name character means (a) in the case of an undelimited string, any character except the "null" character, the LIT character, the LITA character, and those characters allowed in s, and (b) in the case of a delimited literal, any character except the "null" character and the delimiting character for that literal (i.e., LIT or LITA);

(3.) system character means (a) in the case of an undelimited string, any character except the "null" character, the LIT character, the LITA character, and those characters in s; (b) in the case of delimited literal, any character except the "null" character and the delimiting character for the literal (i.e., LIT or LITA); (c) in the case of a comment, any character except the "null" character and a sequence of characters that would be interpreted as the terminating COM delimiter; (d) in the case of an undelimited string that comprises the keyword and the second and subsequent argument of other information, the string must not be recognizable as the PUBLIC or ENTITY keywords.

4. Application/SGML-Catalog

A new MIME Content-Type called Application/SGML-Catalog identifies a MIME body part as a catalog.

   MIME type name:            Application
   MIME subtype name:         SGML-Catalog
   Required parameters:       none
   Optional parameters:       charset
   Encoding considerations:   may be encoded
   Security considerations:   see section 6
   Published specification:
   Person and email address to contact for further information: D. Stinchfield

Use the Application/SGML-Catalog media-type for catalogs. The structure and use of catalogs is defined in this specification.

5. Examples

The SGML document used in all the examples is composed of the following:

   o an SGML declaration, defined in Appendix A;
   o a Document type declaration (DTD), defined in Appendix B;
   o an SGML document entity, defined in Appendix C;
   o two SGML entities, defined in Appendix C;
   o a figure entity, not defined in this draft.

In all examples the components of this SGML document are spread across multiple servers, except for the example entitled "Sending a Catalog and All Document Components", where all of the document's components are contained in a single MIME message.

Each example defines its own unique catalog. The catalog varies from example to example depending on the number of document components sent along with it. Remember, by definition zero or more components are sent along with the catalog. The sender decides how many components to include in the MIME message. A document component that's not included in the MIME message can be resolved by the client in one of two ways:

   1. the client has the component cached;
   2. the client requests the component using the SOI defined for it in the catalog.
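
The two resolution paths above, combined with the relative URL rule from section 2.3.9, can be illustrated with a minimal Python sketch. It is not part of the proposal: the function names, the dictionary used as a cache, and the caller-supplied `fetch` callable are all assumptions made for illustration, and only URL-based SOIs are considered.

```python
from urllib.parse import urljoin


def absolute_soi(soi, base=None, catalog_url=None):
    """Resolve a possibly relative URL SOI.

    The BASE entry is used when the catalog has one; otherwise the URL of
    the catalog itself is the reference, as section 2.3.9 describes.
    """
    reference = base if base is not None else catalog_url
    if reference is None:
        return soi
    return urljoin(reference, soi)


def get_component(soi, cache, fetch, base=None, catalog_url=None):
    """Return a component from the cache, fetching it only on a miss.

    `cache` is a plain dict and `fetch` is any callable that takes a URL
    and returns the component; both are hypothetical stand-ins for a real
    entity manager.
    """
    url = absolute_soi(soi, base, catalog_url)
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]


if __name__ == "__main__":
    base = "http://www.bill.com/docs/memo/mine/dummy"
    requested = []

    def fake_fetch(url):
        # Stand-in for a network request.
        requested.append(url)
        return b"component bytes"

    cache = {}
    get_component("../../dtds/memo.dtd", cache, fake_fetch, base=base)
    get_component("../../dtds/memo.dtd", cache, fake_fetch, base=base)
    print(requested)  # the second call is served from the cache
```

Keeping the cache keyed by the absolute URL means that two catalogs which name the same component through different relative SOIs still share one cached copy.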
The definitions for the following external identifiers are not included in this document:

   formal public identifiers
      ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0
      ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN

   system identifier
      "../style/all.sty" - DSSSL style sheet

Examples that contain a Multipart/Related MIME Content Type default the compound object's "root" to the first body part of the message [5], which is always of type Application/SGML-Catalog.

5.1 Sending Only A Catalog

In this example only the catalog is sent to the client. If the client's SGML System is capable of handling URIs that are defined as SOIs then the newly received catalog can be passed, without modification, to the client's SGML System. If the client's SGML System cannot handle this type of SOI then the client must do some pre-processing before passing the catalog on. The pre-processing logic should do something like the following:

   1. fetch all of the components referenced in the catalog,
   2. store the components locally, and
   3. update the catalog (change all URI based SOIs).

A server may have a number of reasons why it would want or need to send only a catalog:

   o The server only stores catalogs; it does not store any document components;
   o The client may have requested only the catalog. Perhaps the client wants to compare the contents of this catalog with the contents of a different catalog. Or maybe the client already has most, if not all, of the document's components cached;
   o The server may want to keep network traffic down by increasing the likelihood that the client will get a cache hit on catalog entries.
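
The three pre-processing steps listed earlier in this section (fetch, store locally, update the catalog) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: it handles only the keywords in its two tables, ignores comments and single-quoted literals, takes the `fetch` callable and the storage directory from the caller, and drops the BASE entry once every SOI has been made local. All function and variable names are hypothetical.

```python
import os
import re
from urllib.parse import urljoin, urlsplit

# Argument counts for the keywords this sketch handles. For SOI_KEYWORDS
# the last argument is the storage object identifier.
SOI_KEYWORDS = {"SGMLDECL": 1, "DOCUMENT": 1, "DOCTYPE": 2, "PUBLIC": 2,
                "ENTITY": 2, "LINKTYPE": 2, "SEMANTICS": 3}
OTHER_KEYWORDS = {"BASE": 1, "OVERRIDE": 1}
TOKEN = re.compile(r'"[^"]*"|\S+')  # a double-quoted literal or a bare word


def parse_entries(text):
    """Split catalog text into (keyword, [arguments]) pairs."""
    tokens = TOKEN.findall(text)
    entries, i = [], 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        count = SOI_KEYWORDS.get(keyword, OTHER_KEYWORDS.get(keyword))
        if count is None:
            raise ValueError("keyword not handled by this sketch: " + tokens[i])
        args = [t.strip('"') for t in tokens[i + 1:i + 1 + count]]
        if len(args) != count:
            raise ValueError("incomplete entry for " + keyword)
        entries.append((keyword, args))
        i += 1 + count
    return entries


def localize_catalog(text, catalog_url, fetch, store_dir):
    """Fetch every http/ftp SOI, store it, and return the updated catalog."""
    entries = parse_entries(text)
    # Entry order is not significant, so look for BASE before resolving.
    base = next((a[0] for k, a in entries if k == "BASE"), catalog_url)
    saved = {}  # absolute URL -> local file name
    lines = []
    for keyword, args in entries:
        if keyword == "BASE":
            continue  # no relative URLs remain after localization
        if keyword in SOI_KEYWORDS:
            url = urljoin(base, args[-1])
            if urlsplit(url).scheme in ("http", "ftp"):
                if url not in saved:
                    name = "%d-%s" % (len(saved),
                                      os.path.basename(urlsplit(url).path))
                    saved[url] = os.path.join(store_dir, name)
                    with open(saved[url], "wb") as out:
                        out.write(fetch(url))          # steps 1 and 2
                args = args[:-1] + [saved[url]]        # step 3
        lines.append(" ".join([keyword] + ['"%s"' % a for a in args]))
    return "\n".join(lines) + "\n"
```

A component referenced by more than one entry, such as the ISOnum entity set in the example catalogs, is fetched and stored only once because the `saved` table is keyed by the absolute URL.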
5.1.1 MIME Message Content

   MIME-Version: 1.0
   Content-Type: Application/SGML-Catalog; charset=us-ascii

   SGMLDECL "http://www.ebt.com/decl/ebtsgml.dcl"
   OVERRIDE "YES"
   PUBLIC "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
          "http://www.iso.ch/charset/6461983.cha"
   PUBLIC "ISO Registration Number 100//CHARSET ECMA-94 Right-hand Part of Latin Alphabet Nr.1//ESC 2/13 4/1"
          "http://www.iso.ch/charset/ecma94.cha"
   PUBLIC "-//EBT//CAPACITY CoolCaps 1.0//"
          "http://www.ebt.com/decl/coolcaps.cap"
   PUBLIC "-//EBT//SYNTAX SinSyn 0.1//"
          "http://www.ebt.com/decl/syntax/sinsyn.syn"
   BASE "http://www.bill.com/docs/memo/mine/dummy"
   DOCUMENT "anaxi.sgm"
   DOCTYPE "MEMO" "../../dtds/memo.dtd"
   PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
          "http://www.wcs.com/usr/wcs/isonum.ent"
   ENTITY "%ISOnum" "http://www.wcs.com/usr/wcs/isonum.ent"
   ENTITY "MyEnding" "ending.sgml"
   ENTITY "Legal" "../company/legal.sgm"
   SEMANTICS "large-print" "DSSSL" "../style/all.sty"

5.2 Sending a Catalog and the Document Entity

This example describes how to send a catalog and a document entity component using a Multipart/Related message [5]. This example is the likely scenario for Web-based Browsers where simultaneous rendering and resolving of external identifiers are necessary. The document entity will likely contain enough text for the Browser to render meaningful text to the user, but it won't include the many entities that the text may link to. These external identifiers, like figures, can be resolved (fetched, that is) by the entity manager while the application is rendering the text or (as for hyperlinked information) on user demand.
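
As an illustration of how a server might assemble such a package, the following Python sketch builds a Multipart/Related message whose first body part is the catalog and whose second body part is the document entity. It uses the standard `email` package; the helper names, the generated boundary, and the Content-ID value shown in the usage note are assumptions for illustration and are not taken from this draft.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart


def sgml_part(subtype, text, content_id=None):
    """Build one Application/<subtype> body part holding us-ascii text."""
    part = MIMENonMultipart("Application", subtype, charset="us-ascii")
    part.set_payload(text)
    if content_id is not None:
        part["Content-ID"] = content_id
    return part


def build_package(catalog_entries, document_text, document_cid):
    """Return a Multipart/Related message: catalog first, then the document.

    `catalog_entries` holds the catalog entries other than DOCUMENT. The
    DOCUMENT entry is appended here so that its SOI names the Content-ID
    of the body part that carries the document entity.
    """
    catalog_text = catalog_entries + 'DOCUMENT "Content-ID:%s"\n' % document_cid
    package = MIMEMultipart("related", type="Application/SGML-Catalog")
    package.attach(sgml_part("SGML-Catalog", catalog_text))  # root part
    package.attach(sgml_part("SGML", document_text, document_cid))
    return package
```

A caller could then write `build_package('OVERRIDE "YES"\n', document_text, "<doc.1@example.invalid>").as_string()` to obtain the serialized message, where the Content-ID is a made-up value that follows the msg-id syntax of section 2.4.2.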
5.2.1 MIME Message Content

   MIME-Version: 1.0
   Content-Type: Multipart/Related; boundary=let-go-of-my-leg; type="Application/SGML-Catalog"

   --let-go-of-my-leg
   Content-Type: Application/SGML-Catalog; charset=us-ascii

   SGMLDECL "http://www.ebt.com/decl/ebtsgml.dcl"
   OVERRIDE "YES"
   PUBLIC "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
          "http://www.iso.ch/charset/6461983.cha"
   PUBLIC "ISO Registration Number 100//CHARSET ECMA-94 Right-hand Part of Latin Alphabet Nr.1//ESC 2/13 4/1"
          "http://www.iso.ch/charset/ecma94.cha"
   PUBLIC "-//EBT//CAPACITY CoolCaps 1.0//" "http://www.ebt.com/decl/coolcaps.cap"
   PUBLIC "-//EBT//SYNTAX SinSyn 0.1//" "http://www.ebt.com/decl/syntax/sinsyn.syn"
   BASE "http://www.bill.com/docs/memo/mine/dummy"
   DOCUMENT "Content-ID:"
   DOCTYPE "MEMO" "../../dtds/memo.dtd"
   PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
          "http://www.wcs.com/usr/wcs/isonum.ent"
   ENTITY "%ISOnum" "http://www.wcs.com/usr/wcs/isonum.ent"
   ENTITY "MyEnding" "ending.sgml"
   ENTITY "Legal" "../company/legal.sgm"
   SEMANTICS "large-print" "DSSSL" "../style/all.sty"
   --let-go-of-my-leg
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include document entity from Appendix C
   --let-go-of-my-leg--

5.3 Sending a Catalog and All Document Components

Like the previous example, this one uses a Multipart/Related message, but here the catalog is sent together with all of the document's components. A server might do this when a client asks for all of the document components to be sent with the catalog.
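A client that receives such a message has to match each Content-ID SOI in the catalog to the body part that carries the same Content-ID. The following is a rough sketch using Python's standard email package. The function names, the boundary, and the value doc@example.com are hypothetical, and it assumes that the SOI is the literal prefix "Content-ID:" followed by the header value; the SOI syntax in section 2.4.2 remains the authority.

```python
from email import message_from_string


def parts_by_content_id(raw_message):
    """Map each body part's Content-ID header value to its payload text."""
    message = message_from_string(raw_message)
    parts = {}
    for part in message.walk():
        content_id = part.get("Content-ID")
        if content_id and not part.is_multipart():
            parts[content_id.strip()] = part.get_payload()
    return parts


def resolve_content_id_soi(soi, parts):
    """Return the body part for a 'Content-ID:...' SOI, or None."""
    prefix = "Content-ID:"  # assumed SOI prefix, see the lead-in
    if not soi.startswith(prefix):
        return None
    return parts.get(soi[len(prefix):].strip())


# Hypothetical two-part message: a catalog followed by the document entity.
RAW = """\
MIME-Version: 1.0
Content-Type: Multipart/Related; boundary=example-boundary; type="Application/SGML-Catalog"

--example-boundary
Content-Type: Application/SGML-Catalog; charset=us-ascii

DOCUMENT "Content-ID:<doc@example.com>"
--example-boundary
Content-Type: Application/SGML; charset=us-ascii
Content-ID: <doc@example.com>

document entity goes here
--example-boundary--
"""

parts = parts_by_content_id(RAW)
print(resolve_content_id_soi("Content-ID:<doc@example.com>", parts))
```

SOIs that do not start with the assumed prefix return None, so the caller can fall back to URI or local-file resolution.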
5.3.1 MIME Message Content

   MIME-Version: 1.0
   Content-Type: Multipart/Related; boundary=go-speed-racer; type="Application/SGML-Catalog"

   --go-speed-racer
   Content-Type: Application/SGML-Catalog; charset=us-ascii

   SGMLDECL "Content-ID:"
   OVERRIDE "YES"
   PUBLIC "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
          "Content-ID:"
   PUBLIC "ISO Registration Number 100//CHARSET ECMA-94 Right-hand Part of Latin Alphabet Nr.1//ESC 2/13 4/1"
          "Content-ID:"
   PUBLIC "-//EBT//CAPACITY CoolCaps 1.0//" "Content-ID:"
   PUBLIC "-//EBT//SYNTAX SinSyn 0.1//" "Content-ID:"
   DOCUMENT "Content-ID:"
   DOCTYPE "MEMO" "Content-ID:"
   PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
          "Content-ID:"
   ENTITY "%ISOnum" "Content-ID:"
   ENTITY "MyEnding" "Content-ID:"
   ENTITY "Legal" "Content-ID:"
   SEMANTICS "large-print" "DSSSL" "Content-ID:"
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID: ""

   description of SGML declaration from Appendix A is included here
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID: ""

   ISO 646 character set definition included here
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID: ""

   description of Capacity from Appendix A is included here
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID: ""

   description of Syntax from Appendix A is included here
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID: ""

   Contents of ISO Registration Number 100//CHARSET ECMA-94 Right-hand Part of Latin Alphabet Nr.1//ESC 2/13 4/1 included here
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include Document entity as described in Appendix C
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include DTD as described in Appendix B
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   ISO 8879-1986 Entity set included here
   --go-speed-racer
   Content-Type: Application/SGML;
      charset=us-ascii
   Content-ID:

   include entity set defined for %ISOnum
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include entity MyEnding as described in Appendix C
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include entity Legal as described in Appendix C
   --go-speed-racer
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   included here is a bunch of DSSSL-Lite
   --go-speed-racer--

5.4 Sending a Catalog for a Single Non-Document Entity

This example describes what a server may send in response to a request for a non-document entity. All of the previous examples assume that the original request was for the document entity. SGML documents can be deeply nested and can reference a large number of external identifiers. Likewise, the complete catalog for a document can get very large (a "complete catalog" contains all of the external identifiers referenced in all of the document's entities). There is no reason why the complete catalog has to be sent with the document entity. All that's required are enough entries in the catalog for the client system to resolve the references declared in the entity being transferred. For example, Appendix C.2 defines an entity called "Legal" which includes a reference to an entity called "MyEnding". A request for "Legal" would result in a Multipart/Related message that looks like this:

   MIME-Version: 1.0
   Content-Type: Multipart/Related; boundary=let-go-of-my-leg; type="Application/SGML-Catalog"

   --let-go-of-my-leg
   Content-Type: Application/SGML-Catalog; charset=us-ascii

   BASE "http://www.bill.com/docs/memo/mine/dummy"
   OVERRIDE "YES"
   ENTITY "Legal" "Content-ID:"
   ENTITY "MyEnding" "ending.sgml"
   --let-go-of-my-leg
   Content-Type: Application/SGML; charset=us-ascii
   Content-ID:

   include entity "Legal" from Appendix C.2
   --let-go-of-my-leg--

6. Security Considerations

SGML documents, like other compound documents, may contain entities whose media-types present security concerns, e.g. Application/PostScript. Further, SGML may contain explicit processing instructions for a presentation or composition system; use of such instructions presents concerns similar to those of Application/PostScript. The use of active media-types with Notation declarations can provide an opportunity for the sender to execute a script or other code on the recipient's machine.

7. Acknowledgments

Thanks go to Andre Alguero, Steve DeRose, Chris Maden, Gavin Nicol, and Bill Smith here at EBT for helping me with the content and structure of this document. Thanks to Ed Levinson for the many discussions and debates that helped me to clarify (I hope) many of the ideas contained in this document. Thanks also go out to Wayne Wohler of IBM for his help on SGML declarations, a most confusing topic. Thanks to James Clark for his help on general SGML issues. Thanks to Martin Bryan for his help on Notations.

8. References

   [1] Wayne Wohler, "SGML declarations", http://www.sil.org/sgml/wlw11.html

   [2] Eric van Herwijnen, "Practical SGML", Second Edition, Kluwer Academic Publishers, 1994, ISBN 0-7923-9434-8

   [3] Charles F. Goldfarb, "The SGML Handbook", Oxford University Press, 1994, ISBN 0-19-853737-9

   [4] R. Fielding, "Relative Uniform Resource Locators", RFC 1808

   [5] E. Levinson, "The MIME Multipart/Related Content-Type", RFC ????

   [6] Daniel W. Connolly, HTML 2.0 SGML declaration found at http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html.decl

   [7] T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630

   [8] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738

   [10] Paul Grosso, "Entity Management", SGML Open Technical Resolution 9401:1995 (Amendment 1 to TR9401)

   [11] Charles F. Goldfarb, "Entity Management in SGML", 11/30/93

   [12] K. Sollins and L. Masinter, "Functional Requirements for Uniform Resource Names", RFC 1737

   [13] N. Borenstein, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521

   [14] "ISO 8879:1986 Information processing - Text and office systems - Standard Generalized Markup Language (SGML)", Geneva, 15 October 1986

   [15] D. H. Crocker, "Standard for the Format of ARPA Internet Text Messages", RFC 822

9. Authors' Address

   Don Stinchfield
   Electronic Book Technologies, Inc.
   One Richmond Square
   Providence, RI 02906
   (401) 421-9550 x280
   des@ebt.com

10. Appendix A: SGML declaration Used In The Examples

This Appendix contains the definitions for the SGML declaration, for the CAPACITY parameter, and for the SYNTAX parameter. The SGML declaration is a modified version of the one used for HTML 2.0 [6] - I changed the CAPACITY and SYNTAX declarations so that they reference public identifiers. The following external identifiers are referenced in the SGML declaration:

   o BASESET "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"

   o BASESET "ISO Registration Number 100//CHARSET ECMA-94 Right-hand Part of Latin Alphabet Nr.1//ESC 2/13 4/1"

   o CAPACITY PUBLIC "-//EBT//CAPACITY CoolCaps 1.0//"

   o SYNTAX PUBLIC "-//EBT//SYNTAX SinSyn 0.1//"

10.1 SGML declaration

11. Appendix B: DTD Used In The Examples

The DTD listed below is a modified version of the one found on page 33 of Eric van Herwijnen's book "Practical SGML" [2]. The following external identifier is used in the DTD:

The above definition is for a parameter entity and it contains both a public identifier and a system identifier. The examples have both in the catalog.

11.1 DTD

%ISOnum;

12. Appendix C: SGML document Used In The Examples

The SGML document defined in this appendix is broken up into 3 parts: an SGML document entity and two SGML entities.
The SGML document entity contains references to external identifiers in the DOCTYPE and ENTITY declarations:

   o This one contains both a public identifier and a system identifier:

   o This ENTITY declaration has a system identifier and a system identifier parameter:

   o This one specifies a system identifier without specifying a system identifier parameter (this is provided for in the SGML Standard for implementors that want to resolve system identifiers from the entity name alone [3, p378]):

12.1 SGML document entity

] > Anaximander Cool Papa Shad &Legal;

Yo Anax, you've got a bizarre name!

&MyEnding;

12.2 Entity Named "Legal"

If you or anyone you know tries to read this email then you're in really big trouble!

You know this is the end of the document when you see &MyEnding;

12.3 Entity Named "MyEnding"

Regards, Don

13. Appendix D: NOTATIONS

13.1 Useful Notations

For TeX I have taken the ISBN from Donald Knuth's book "The TeXbook" to create the Formal Public Identifier:

DVI???

For PostScript I have taken the ISBN for the PostScript reference manual to create a Formal Public Identifier:

For Tcl I have taken the ISBN from John K. Ousterhout's book "Tcl and the Tk Toolkit" to create a Formal Public Identifier:

A.

INTERNET-DRAFT Using Exportable Catalogs and MIME 11/22/95