Abstract: "This paper traces the history of the Text Encoding Initiative, through the Vassar Conference and Poughkeepsie Principles to the publication, in May 1994, of the Guidelines for Electronic Text Encoding and Interchange. The authors explain the types of questions that were raised, the attempts made to resolve them, the TEI project's aims, the general organization of the TEI committees, and they discuss the project's future."
Abstract: "Parameter entities were once thought to be the domain of only DTD designers. Parameter entities, and their references, can also be placed in the internal DTD subset of document instances. By doing so, authors can indirectly include shared entity declarations or collections of entity declarations. Such indirection can enable groups of authors to share and reuse entities that change frequently. Whereas parameter entities enable entity sharing and reuse, HyTime content location addressing can provide granular reuse of elements within file entities. When combined, parameter entities and content location addressing can enable sharing and reuse of SGML components in both local and far-flung environments."
"The scenario [discussed] is not fictitious; the problems are real and the requirements and objectives are quite common for Company X, as they are for many organizations, large and small. Of course, this paper was written in the referenced DTD and uses all of the features discussed; the SGML markup for this document is available from publish@ibm.net. The creative and judicial use of the features described in this paper provide a reasonable degree of reuse and data management across an organization of virtually any size, without requiring the use of an SGML-enabled data manager. However, a capable SGML-enabled data manager, combined with one or more of these features, can provide an organization with a formidable, extensible, and highly automated reuse environment."
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).
The volume apparently has not yet been published (November 1995).
Summary: "Beyond the need to query and retrieve based on tags which exist in a TEI document, a means to manipulate and query classes of objects is also desirable. The TEI DTD uses SGML entity definitions to create "classes" of elements and attributes, in particular, for groups of elements with common structural properties (e.g., all elements that can appear between paragraphs), groups of attributes which apply to certain classes of elements (e.g., attributes for pointer elements), etc. In addition to grouping together elements and attributes with common structural properties, the definition of such classes recognizes common semantic properties among elements and attributes. However, the SGML entity definition mechanism provides only for string substitution within the DTD itself, thereby enabling easy reference to these classes in later element definitions; the common semantic properties that are implicit in the classification scheme are lost for the purposes of retrieval and document manipulation. Obviously, a means to refer to and manipulate classes of elements and attributes in a query and retrieval system would provide substantial additional power for the user."
"We are experimenting with the representation of a DTD and associated documents (i.e., documents conformant to the DTD) in a knowledge representation (KR) system, in order to provide more sophisticated query and retrieval from TEI documents than current systems provide. We are using CLASSIC, a frame-based representation system developed at AT&T Bell Laboratories . Like many KR systems, CLASSIC enables the definition of structured concepts/frames, their organization into taxonomies, the creation and manipulation of individual instances of such concepts, and inference such as inheritance, relation transitivity, inverses, etc. In addition, CLASSIC provides for the key inferences of subsumption and classification. By representing a document as an individual instance of a hierarchy of concepts derived from the DTD, and by allowing the creation of additional user-defined concepts and relations, sophisticated query and retrieval operations can be performed. This paper briefly describes the CLASSIC system, the representation of a DTD and a document conforming to that DTD in CLASSIC, and provides an overview of the kind of query and retrieval that can be performed.
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/ide.html; [local archive copy]. Also on the Vassar server: http://www.cs.vassar.edu/~ide/papers/tei10.html. See the main database entry for additional information about the conference, or the Brown University web site.
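The idea of retrieving by element class rather than by individual tag can be illustrated with a minimal sketch. It does not reproduce CLASSIC or its inference; it only assumes that class membership, of the kind a DTD expresses through entity definitions, has been kept as data instead of being lost to string substitution. The class names, element names, and sample document below are hypothetical, and XML parsing stands in for SGML parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical element classes, of the kind a DTD groups through
# entity definitions. Class and element names are illustrative only.
ELEMENT_CLASSES = {
    "inter": {"list", "quote", "note"},  # may appear between paragraphs
    "pointer": {"ptr", "ref"},           # pointer-like elements
}


def elements_in_class(root, class_name):
    """Return every element in the tree whose tag belongs to the class."""
    members = ELEMENT_CLASSES[class_name]
    return [el for el in root.iter() if el.tag in members]


# A small stand-in document (assumed, not taken from any TEI text).
doc = ET.fromstring(
    "<text><p>See <ref>chapter 2</ref>.</p>"
    "<list><item>one</item></list><note>aside</note></text>"
)

pointer_tags = [el.tag for el in elements_in_class(doc, "pointer")]
inter_tags = [el.tag for el in elements_in_class(doc, "inter")]
```

A query such as `elements_in_class(doc, "inter")` addresses the class as a whole, which is the capability the summary says is lost when classes exist only as substituted strings in the DTD.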
Abstract: "This article describes the major problems in devising a TEI encoding format for dictionaries, which, because of their high degree of structuring and compression of information, are among the most complex text types treated in the TEI. The major problems for this task were: (1) the tension between generality of the description, in order to be widely applicable across dictionaries, and descriptive power, that is, the ability to describe with precision the particular structure of any given dictionary; and (2) the need to accommodate different views and uses of the encoded dictionary, for example, as printed object and as a database of information."
"Abstract: MULTEXT (Multilingual Text Tools and Corpora) is the largest project funded in the Commission of European Communities Linguistic Research and Engineering Program. The project will contribute to the development of generally usable software tools to manipulate and analyse text corpora and to create multi-lingual text corpora with structural and linguistic markup. It will attempt to establish conventions for the encoding of such corpora, building on and contributing to the preliminary recommendations of the relevant international and European standardization initiatives. MULTEXT will also work towards establishing a set of guidelines for text software development, which will be widely published in order to enable future development by others. All tools and data developed within the project will be made freely and publicly available."
The goals of MULTEXT are the creation of "reusable software for multi-lingual linguistic corpus annotation and exploitation; software standard for tool design; TEI-based markup standard for corpus encoding; multi-lingual corpus (English, Dutch, German, French, Italian, Spanish), including a small speech corpus, partially parallel, portions marked up and validated for part of speech and alignment." As for markup: "The TEI Guidelines provide the basis for markup at levels 0 (the TEI header), 1 and 2 as well as many elements of level 3. In collaboration with Eagles, MULTEXT is extending the TEI scheme in order to specify a TEI-conformant Corpus Encoding Style (CES) that is optimally suited to NLP research and can therefore serve as a widely accepted TEI-based style for European corpus work. Application of the CES to CEE languages, which may require minor modifications to accommodate CEE language-specific information and structures, will provide a test of both the TEI Guidelines and MULTEXT and Eagles' extensions to it."
The paper is available on the Internet at ftp://ftp.aist-nara.ac.jp/pub/nlp/conferences/SNLR/papers/14.ps.gz in conjunction with the online proceedings; see also the mirror copy in Postscript format and in PDF format. MULTEXT work sponsored by the Commission of European Communities Linguistic Research and Engineering Project 62-050. For more on the project, see the main entry. For other SNLR conference papers, see the online TOC: http://cactus.aist-nara.ac.jp/lab/events/SNLR/snlr.html.
"The Text Encoding Initiative (TEI) Guidelines for Electronic Text Encoding and Interchange are the result of over six years' work by dozens of scholars from all over the world. As such, they represent a pioneer effort in an area where only occasional and isolated attempts were made before. They will certainly serve as the primary basis for encoding texts in electronic form for the foreseeable future. The work of participants in the TEI not only involved consideration of problems of text encoding that are likely to be with us for decades to come, but also required the development of a methodology - from scratch - for approaching these problems. These pioneering efforts, while likely to be refined and extended, must not be lost: they provide the intellectual basis upon which text encoding practices will build in the future. This collection is therefore documents the course of these efforts. `The TEI Guidelines are extraordinary. Even if they were never adopted they would stand as a significant contribution to scholarship for their detailed analysis of the information sets of a huge range of complex text types.' (From the Preface by Charles F. Goldfarb, inventor of the Standard Generalized Markup Language)."
The contents of this volume are also published as a special triple-issue of Computers and the Humanities (CHUM volume 29, numbers 1-3, 1995). The volume bibliography on SGML/TEI (pages 233-242), however, is included only in this book version. Articles in the first CHUM issue: Charles F. Goldfarb, Preface; Nancy Ide and Michael Sperberg-McQueen, The Text Encoding Initiative: Its History, Goals, and Future Development; C. M. Sperberg-McQueen and Lou Burnard, The Design of the TEI Encoding Scheme; Lou Burnard, What is SGML and How Does It Help; Harry Gaylord, Character Representation; Richard Giordano, The TEI Header and the Documentation of Electronic Texts; Dominic Dunlop, Practical Considerations in the Use of TEI Headers in Large Corpora. Articles in the second issue: David Chisholm and David Robey, Encoding Verse Texts; John Lavagnino and Elli Mylonas, The Show Must Go On: Problems of Tagging Performance Texts; Robin Cover and Peter Robinson, Encoding Textual Criticism; Daniel Greenstein and Lou Burnard, Speaking With One Voice: Encoding Standards and the Prospects for an Integrated Approach to Computing in History; Stig Johansson, The Encoding of Spoken Texts; Alan Melby, E-TIF: An Electronic Terminology Interchange Format; Nancy Ide and Jean Véronis, Encoding Dictionaries. Articles in the third issue: Steven J. DeRose and David Durand, The TEI Hypertext Guidelines; David Barnard, Lou Burnard, Jean-Pierre Gaspart, Lynne A. Price, C.M. Sperberg-McQueen, and Giovanni Battista Varile, Hierarchical Encoding of Text: Technical Problems and SGML Solutions; D. Terence Langendoen and Gary Simons, Rationale for the TEI Recommendations for Feature-Structure Markup.
See a volume description for further details, and the order blank from Kluwer.
For other journal special issues and monographs dedicated to the Text Encoding Initiative, see the relevant subentry for TEI.
See a similar article below.
The growing availability of dictionaries in electronic form calls for a model sophisticated enough to represent the richness of entries and enable complex information retrieval. Electronic dictionaries are a special kind of object, intermediary between a text and a database. Textual models are not powerful enough to handle complex information retrieval, and conventional database models are not flexible enough to handle the richness of their information. In this paper, we outline a scheme for representing electronic dictionaries which departs from previously proposed models. In particular, it allows for a full representation of sense nesting and defines an inheritance mechanism which enables the elimination of redundant information. The model provides flexibility which seems able to handle the varying structures of different monolingual dictionaries.
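A minimal sketch of what sense nesting with inheritance can look like as a data structure follows. It is not the authors' model; it only assumes that each sense may hold subsenses and that a feature missing on a sense is looked up on its ancestors, so shared information is stored once. The class name `Sense`, the feature names, and the sample entry are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sense:
    """One node in a nested sense hierarchy (field names are illustrative)."""
    features: Dict[str, str] = field(default_factory=dict)
    subsenses: List["Sense"] = field(default_factory=list)
    # Excluded from repr/compare to avoid recursing through parent links.
    parent: Optional["Sense"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Sense") -> "Sense":
        """Attach a nested subsense and return it."""
        child.parent = self
        self.subsenses.append(child)
        return child

    def get(self, name: str) -> Optional[str]:
        """Look a feature up locally, then on each enclosing sense in turn."""
        node = self
        while node is not None:
            if name in node.features:
                return node.features[name]
            node = node.parent
        return None


# Hypothetical entry: part of speech is stated once at the top level.
entry = Sense({"headword": "bank", "pos": "noun"})
sense_1 = entry.add(Sense({"def": "land alongside a river"}))
sense_1a = sense_1.add(Sense({"def": "a raised slope", "usage": "technical"}))
sense_2 = entry.add(Sense({"pos": "verb", "def": "to tilt in a turn"}))

inherited_pos = sense_1a.get("pos")   # found on the entry, two levels up
overridden_pos = sense_2.get("pos")   # local value overrides the inherited one
```

Storing `pos` only on the entry and overriding it where a subsense differs is one way to remove the redundancy the abstract mentions; other inheritance rules are equally possible.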
Abstract: "SGML, which is used for document interchange among various environments, is a meta language to describe documents. Before marking up a document, we need to prepare a DTD that defines a document structure.
In general, a DTD applicable to diverse document classes is incompatible with a DTD focusing on the semantic features of documents. If the number of DTDs grows, the costs of developing application programs for the DTDs would also skyrocket.
To apply a DTD focusing on the semantic features to diverse document classes, we developed a system which, from a base generic DTD, derives a different DTD for each document class. Our system also has a function that translates derived DTD instances to base DTD instances. This function frees us from the burden of developing application programs separately for each of the derived DTDs."
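The translation step described above, from instances of a derived DTD back to instances of the base DTD, can be sketched as a simple renaming pass. This is an illustration under assumptions, not the system described in the paper: the mapping table, the element names, and the use of XML parsing in place of an SGML parser are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names in one derived DTD to the
# element names of the base generic DTD.
DERIVED_TO_BASE = {"memo": "doc", "subject": "title", "body": "section"}


def to_base_instance(root):
    """Rename derived elements to their base equivalents, in place."""
    for el in root.iter():
        # Elements without a mapping are assumed to be shared with the base DTD.
        el.tag = DERIVED_TO_BASE.get(el.tag, el.tag)
    return root


derived = ET.fromstring(
    "<memo><subject>Budget</subject><body><p>Text</p></body></memo>"
)
base = to_base_instance(derived)
base_xml = ET.tostring(base, encoding="unicode")
```

With such a pass in front of it, an application written once against the base element names can process documents from every derived class, which is the saving the abstract claims.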
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
The article summarizes important findings in a recent market research report published by InterConsult, Inc. The 1994 market study on SGML (Standard Generalized Markup Language) "asserts that SGML expenditures now represent 21% of the overall publishing software market, and predicts that the percentage will rise to 30% by 1998, as worldwide revenues for SGML software and services continues to grow more than 30% annually." According to the report, revenues from SGML services for 1993 were "$77 million higher than what was predicted in 1992..." "The study predicts market revenues for the next four years in nine market segments: integration services, conversion services, electronic delivery, parsing, composing, graphics, database and document management, autotagging and conversion software and authoring. Some of these will still be growing well in four years; other segments will peak as SGML becomes more of a mainstream technology..." Contact: InterConsult, 366 Massachusetts Ave., Arlington, MA 02174; Tel: (617) 646-9600, FAX (617) 646-9615.
No personal author is given. The volume was announced as available for free: call 1-800-955-5323; or contact via surface mail: Interleaf, Inc., Prospect Place, 9 Hillside Avenue, Waltham, MA 02154. A copy of the work is also available via HTTP (HTML format): connect with a WWW client to Interleaf, or in case of link failure, use this mirror copy.
Revision and addition of part 2 (alphabetic 3-character codes) is underway: see ISO 639-2 below. For ISO 639:1988, see provisionally (a) the primary data from the 1988 standard as given here from Keld, or (b) a different compilation of the ISO 639:1988 language codes, or (c) the comparable MARC 3-character language codes, from about 1991, and (d) now, the update to USMARC Code List for Languages from November 15, 1996; [mirror copy].
ISO 639:1988 is a technical revision of ISO 639:1967, prepared by Technical Committee ISO/TC 37. The two-character language codes of ISO 639 are relevant to SGML encoding in two respects. First, the SGML standard (ISO 8879) itself specifies that declaration of 'public text language' should be given using the language code(s) from ISO 639; see ISO 8879-1986(E) page 36, section 10.2.2.3. Second, the WSD (Writing System Declaration) implemented in the Text Encoding Initiative uses the two-character language code of ISO 639 (as amended) as a 'language.code' attribute of the 'nat.language' declaration, specifying the language in which the WSD is written.
ISO 639 contains much other information about the use of language symbols, registration of new symbols, etc. The language codes of ISO 639 are said to be "devised primarily for use in terminology, lexicography and linguistics, but they may be used for any application requiring the expression of languages in coded form." The registration authority for ISO 639 is given as Infoterm, Österreichisches Normungsinstitut (ON), Postfach 130, A-1021 Vienna, AUSTRIA.
The two-character language codes of ISO 639 are recognized as being inadequate for use as SGML language attributes when tagging text, viz, for use as global 'lang' attributes attached to any element to identify the language of the text element or a language shift. In principle, there should be nothing wrong with tagging language using SGML elements rather than attributes, if the encoder has principled reasons for not using attributes (e.g., indexing engines which read simple tags but not SGML attributes). But the two-character codes of ISO 639 are neither sufficiently mnemonic nor complete for the world's languages: whereas ISO 639 supplies codes for only about 136 languages, the Ethnologue published by the Summer Institute of Linguistics identifies over 6100 languages (see
Abstract: "This part of ISO 639 provides 3-character alphabetic symbols for the (re)presentation of names of languages. The symbols were devised primarily for libraries, information services, and publishers to use to indicate language in the exchange of information, especially in computerized systems. These symbols have been widely used in the library community, however, they may be used for any application requiring the expression of language in coded form, including use by terminologists and lexicographers. The list is considered to be an open list. This part of ISO 639 also includes guidance on the creation of language symbols and on their use in some of these applications. Languages designed exclusively for machine use, such as computer programming languages, are not included in this code list." There are about 404 language names in the list. See, for comparison: the bibliography entry for the ANSI/NISO standard, or NISO 3-character language codes (Z39.53-1994) [unofficial], [mirror copy]. ISO 639-2 codes are supposed to be based upon (?) the ANSI/NISO set.
ISO CD 639/2:12/16/91 culminates more than three years of intense collaboration between the representatives of ISO TC 37/SC2 (Layout of Vocabularies) and ISO TC46/SC4 (Computer Applications in Information and Documentation). It preserves the principal features of ISO 639-1 (the existing alpha-2 list) while articulating a code that meets the needs of librarians, managers of bibliographic services, and information specialists. The document is out for DIS ballot until April 15, 1992; it is anticipated that executive action will be taken on the DIS following the meeting of ISO TC/46 in London, May 18-22, 1992. Since the list of 3-character language codes is considered to be an open list, the ISO Council has designated a registration authority for 639 part 2. Proposals for allocating new language symbols should be directed to this authority. It is the Library of Congress, c/o Collection Services, Washington, DC 20540. See the list of language codes from a 1992 draft version.
Under: Technical committee / subcommittee: TC 46. Online lists: FTP from the RIPE server: ftp://info.ripe.net/iso3166-countrycodes, [mirror copy], or: Codes for Representation of Names of Countries (ISO 3166-1993 (E)), [mirror copy].
See also: ISO/DIS 3166-1 Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes (Revision of ISO 3166:1993); and: ISO/DIS 3166-2 Codes for the representation of names of countries and their subdivisions -- Part 2: Country subdivision code.
ISO/IEC 8632-1[-4] 1992(E). Second edition. Part 1. Functional specification; Part 2. Character encoding; Part 3. Binary encoding; Part 4. Clear text encoding. This standard supersedes the earlier standard: CGM:1986 (ANSI X3.122-1986). For other information on CGM, see the main database entry for Computer Graphics Metafile.
With Amendment A1 (1988), ISO 8879 constitutes the core specification for SGML. A subset of SGML became a US FIPS (Federal Information Processing Standard) in 1988. The British Standards Institution adopted SGML as a national standard (BS 6868) in 1987, and in 1989 SGML was adopted by the CEN/CENELEC Standards Committees as a European standard, #28879. Australia has dual numbered versions of ISO 8879 SGML and ISO 9069 SDIF (AS 3514 - SGML 1987; AS 3649 - 1990 SDIF). The full text of this ISO standard with Amendment A is incorporated into the text of Charles Goldfarb's commentary (
This amendment is incorporated into the text of Charles Goldfarb's SGML commentary (
Also available as The British Standard Guide to SGML Document Interchange Format (SDIF), BS 7138 1989 (ISO 9069: 1988; see in "Snippets,"
The "public text" envisioned in this standard as applied to SGML might be DTDs (Document Type Definitions), or declaration subsets of DTDs, public entity sets, etc. Names include an owner name and an object identifier. Equivalent encodings for the names in ASN.1 and SGML may be supplied for interchange purposes. Note: "The intention of the amendment that has resulted in a 2nd edition is to extend 9070 beyond the simple boundaries of SGML only. It is now used by 9541 (and 10036) for the definition of 'structured names'. A New Work Item Proposal is being submitted to change the title and scope of 9070 to show its extended usefulness." (note from Paul Ellison, December 1991) [needs update]
[May 1996]: See also the main entry for ISO 9070 with information on the relevant WWW site.
A major revision of this TR underway (as of May 1990) will result in a new TR with 16 parts: (1) SGML Tutorial; (2) Basic Techniques; (3) Advanced Techniques; (4) Using Short References for Identifying Markup; (5) Using non-Latin Alphabets; (6) Referencing and Synchronisation; (7) Mathematics and Chemistry; (8) Tables; (9) Using SGML for Computer-to-Computer Interchange; (10) Designing Applications for Database Interfacing; (11) Application at ISO CS for International Standards and Technical Reports; (12) Public Entity Sets for General and Publishing Symbols; (13) Public Entity Sets for Mathematics and Science; (14) Public Entity Sets for Latin Based Alphabets; (15) Public Entity Sets for non-Latin Based Alphabets; (16) Public Entity Sets for Ideograms (adapted from Ludo Van Vooren, "SGML Standards Committee Update: Activities of ISO SC 18 WG8,"
See further information on this standard in the Related Standards page. A future version is to include an "ISO chemical character set, ISOchem"; see a note by Martin Bryan (September 1995).
See also: SGML Public Entity Sets, Proposals. [relative to: http://www.ornl.gov/sgml/wg8/9573ent/ENTITIES.HTM]. Sample collections of entities and glyphs (proposed) for potential inclusion into ISO 9573. For: Ugaritic, Old Persian, Glagolitic, Croatian, Buginese, Cherokee, and Gothic Uncials. Developed by Anders Berglund and others.
The document supplies technical guidance for the development of context-sensitive SGML editors. See "Guidelines for Syntax-Directed Editing Systems,"
Voting on the current DIS began 1994-08-10 [and was to end mid-December 1994 or early 1995]. A posting to CTS in early 1995 by James Clark confirmed that negative votes had not been received, and that the vote was therefore expected to pass.
SUMMARY: "This International Standard defines the Document Style Semantics and Specification Language (DSSSL) used to specify formatting and other transformations of SGML-encoded documents. The initial focus of DSSSL is on formatting for both paper and electronic media, and on the conversion of SGML documents encoded according to different DTDs.
This International Standard has been structured to permit future sections to be added to this International Standard to cover the other areas of document processing and data management.
The main objective of the DSSSL Standard is to provide a specification language for expressing formatting and other document processing specifications in a formal and rigorous manner so that these specifications may be processed by a broad range of formatters, either natively or using a translation mechanism.
The DSSSL specification language will include tree transformation specifications and formatting specifications and other semantics to allow users to specify the types of formatting to be applied to various objects during composition and layout and pagination.
For formatting, a DSSSL-driven implementation can create a style sheet language that can be mapped into the DSSSL typographic characteristics and other composition and layout semantics.
In addition to the basic formatting semantics, DSSSL includes a language for writing a general transformation specification that provides the capability to transform documents from one SGML application into another.
DSSSL is designed to allow for specifications that apply to a class of documents. These specifications are applicable to all possible document instances in an SGML application as well as to a particular document instance.
The DSSSL specification language is declarative; it is not intended to be a complete programming language, although it contains constructs normally associated with such languages and provides a well-defined interface to a user-selected programming language, if such a capability is required. DSSSL specifications can be unambiguously parsed and interpreted among heterogeneous systems. In addition, DSSSL specifications can be used by existing formatting systems through the use of "front-end" DSSSL processors and translators. DSSSL has no bias toward batch or WYSIWYG formatting systems and does not prescribe any predefined formatting algorithms.
The standardization of formatting semantics is provided in DSSSL through a set of basic structures known as flow objects and the associated set of formatting characteristics that are applied to these objects. DSSSL provides mechanisms for defining and extending the semantic constructs so that a DSSSL application designer can construct a DSSSL application in a manner that best reflects his application environment." [transcription from the Introduction (DIS 1994-08-10)]
For a summary, see: (1)
See now [June 1995] further information in a separate SPDL entry within this database, including pointers to availability of the 1995 draft standard via the Internet (e.g., from the WG8 FTP server and from the SGML Repository).
Description from the 1991 CD version: SMDL "defines a language for the representation of music information, either alone, or in conjunction with text, graphics, or other information needed for publishing or business purposes." Multimedia time sequence information is also supported. SMDL is a HyTime application conforming to ISO/IEC DIS 10744 Hypermedia/Time-based Structuring Language (HyTime), and an SGML application conforming to Standard Generalized Markup Language (ISO 8879:1986). An earlier version was published by ANSI (American National Standards Institute), as ANSI X3V1.8M Journal of Development. ANSI Project X3.542-D. Standard Music Description Language (SMDL). X3V1.8M/SD-8. 60 pages. Sixth Draft. April 15, 1990. See a description of SMDL in an overview article: Steven R. Newcomb, "Standards. Standard Music Description Language Complies with Hypermedia Standard,"
See now [July 1995] further information in a separate SMDL entry within this database, including pointers to availability of the 1995 draft standard (DIS) via the Internet. Or see an overview taken from the DIS.
"HyTime is a standard neutral markup language for representing hypertext, multimedia, hypermedia, and time- and space-based documents in terms of their logical structure. Its purpose is to make hyperdocuments interoperable and maintainable over the long term. HyTime can be used to represent documents containing any combination of digital notations. HyTime is parsable as Standard Generalized Markup Language (ISO 8879:1986). HyTime provides standardized means of expressing (1) intra- and extra-document locations, and arbitrary links between them, (2) the scheduling of multimedia objects in 'finite coordinate spaces,' and (3) rendering instructions for arbitrarily projecting such objects onto other finite coordinate spaces, and other constructs." [taken from an abstract in
For further information on HyTime, see (1) the WWW SGML Page HyTime main entry, (2) the book by Steve DeRose and David Durand, (3) the book by Eliot Kimber, and (4) the
See also Technical Corrigendum 1 to ISO/IEC 10744 [by Charles F. Goldfarb], Draft for ballot: March 27, 1995. The relevant documents are available from the SGML Repository or via this server as three text files: httc1.txt (24K), hi1anarc.txt (46K), and hi1anfsi.txt (22K)
The standard was prepared by Technical Committee ISO/TC 46, Information and documentation, Subcommittee SC4, Computer applications in information and documentation. The title "ISO..." appeared on the print copy distributed in mid-1994 by NISO/EPSIG, despite errors: it was apparently a premature printing. This "ISO" standard supersedes the 1988 (EPSIG/AAP) standard authorized by ANSI/NISO; see the bibliographic reference. The standard included three public DTDs (books, articles, serials) in "final" form and a provisional DTD for mathematics. The ISO 12083 DTDs [though not now in final form (November 1994)] are available on the Exeter SGML Project server and elsewhere; try: Exeter ftp://info.ex.ac.uk/ISO-12083/ or else ftp://actd.saic.com/pub/SGML/ISO-12083/. Although several requests have been made on CTS for release of electronic copies of the DTDs into public space, it remains unclear whether ISO will authorize this form of distribution for the DTDs.
See the EPSIG description "About the Standard"; [mirror copy]
SMSL "Extends HyTime by providing SGML meta-DTD architectural forms for describing the object classes, virtual functions, messages, aggregates and class/data membership used in a multimedia presentation's script. Also contains definitions for a starter-set of functions used by scripting languages." [from: Index of OII Standards Report.]
The SMSL Committee Draft ISO/IEC 13240 is available in Postscript format; [mirror copy, December 22, 1996]. See the main SMSL entry for other details.
Voting was 1993-08-12 thru 1994-02-12. [Entry needs update. Make links to Conformance Testing (Initiative) on main page.]
Abstract: "This presentation explains the goals for BNA's new publishing system, why BNA chose SGML as an integral part of that system, and provides an overview of how BNA implemented the system. Topics covered include undertaking business process re-engineering, adopting SGML, converting legacy data, and lessons learned during the process. BNA (The Bureau of National Affairs, Inc.) and its subsidiaries provide labor, legal, economic, and regulatory information to business, professional, government, and academic users."
"It really all boils down to the data and the fact that the data is the company's most valuable resource (second only to the people who create it). We used the term 'data repository' to refer to BNA's entire collection of documents and other data, including primary source laws, regulations, opinions, internally created news stories, legal headnotes, and reference materials. BNA has acknowledged that we must manage documents as a corporate asset and we must have the ability to search, retrieve, and update documents throughout the publishing life cycle. SGML was chosen as a way to identify and protect the data. BNA started over 50 years ago using typesetting instructions for Linotype operators. In the 70s we used two digit 'locator codes' to identify typesetting instructions. In 1980 we switched to proprietary (Atex coding) to produce our notification and daily publications. In 1985, with the purchase of a Datalogics system to produce our looseleaf publications, we began using unparsed SGML-like coding. Oh, if we could only recover from the blunder of using unparsed data!"
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Sgrep is a Unix tool for searching the contents of text files. Sgrep implements an algebra of unrestricted text fragments called regions. The algebra allows the retrieval of document components, represented as regions, based on conditions on their relative containment and ordering. This simple yet powerful model is suitable for querying structured document formats like electronic mail, RTF, LaTeX, HTML, or SGML documents. We describe the sgrep query language and give examples of its use. Especially, we explain how sgrep can be used for querying and assembling SGML documents."
Available online in Postscript format: ftp://ftp.cs.helsinki.fi/pub/Reports/by_Project/DocMan/Using_sgrep_for_querying_structured_text_files.ps.gz; [mirror copy]. See also the software main entry: 'sgrep' grep-like searching of structured documents.
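[Illustration: the containment conditions described in the abstract can be sketched in a few lines of Python. This is only a minimal sketch of the idea, not sgrep's query syntax or its implementation. The function names `find_regions`, `contained_in`, and `containing`, and the sample document, are assumptions made for the example. Regions are modelled as (start, end) character offsets.]

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find_regions(text: str, start_tag: str, end_tag: str) -> List[Region]:
    """Return non-nested regions running from start_tag through end_tag."""
    regions = []
    pos = 0
    while True:
        s = text.find(start_tag, pos)
        if s == -1:
            break
        e = text.find(end_tag, s + len(start_tag))
        if e == -1:
            break
        regions.append((s, e + len(end_tag)))
        pos = e + len(end_tag)
    return regions


def contained_in(inner: List[Region], outer: List[Region]) -> List[Region]:
    """Keep the inner regions that lie within at least one outer region."""
    return [r for r in inner
            if any(o[0] <= r[0] and r[1] <= o[1] for o in outer)]


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Keep the outer regions that enclose at least one inner region."""
    return [o for o in outer
            if any(o[0] <= r[0] and r[1] <= o[1] for r in inner)]


# Hypothetical SGML-like sample document.
doc = ("<sec><title>One</title><p>alpha</p></sec>"
       "<p>beta</p>"
       "<sec><p>gamma</p></sec>")

secs = find_regions(doc, "<sec>", "</sec>")
paras = find_regions(doc, "<p>", "</p>")
titles = find_regions(doc, "<title>", "</title>")

# Paragraphs inside a section, and sections that contain a title.
paras_in_secs = [doc[s:e] for s, e in contained_in(paras, secs)]
secs_with_titles = [doc[s:e] for s, e in containing(secs, titles)]
```

[Here `paras_in_secs` holds the two paragraphs that sit inside a section, and `secs_with_titles` holds only the first section. Ordering conditions could be expressed the same way by comparing region offsets.]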
Abstract: "We present a powerful document transformation language called TranSID, which is targeted at structured (SGML) documents. The language is based on a powerful model where the entire input document tree may be referenced during the transformation process. The evaluation is performed in a bottom-up manner. A language evaluator has been implemented which runs in Unix environments."
Note also the longer work by Greger Lindén: Structured Document Transformations, PhD Thesis, Report A-1997-2, Department of Computer Science, University of Helsinki, June 1997. 122 pages. Available online in Postscript format, via FTP.
The document is available online in Postscript format: via FTP; [local archive copy].
The paper was also published in the Proceedings of The Fifth Symposium on Programming Languages and Software Tools, Jyväskylä, Finland, June 7-8, 1997, ed. Jukka Paakki, pages 72-83, Technical Report C-1997-37, University of Helsinki, Department of Computer Science, June 1997.
"Abstract: Patent and Trademark Office (PTO) Commissioner Bruce A. Lehman reported to a House subcommittee that the development of an electronic filing system is 'critical' to the Office's efforts to reduce patent filing time. The PTO, which recently unveiled the Automated Patent System, hopes to reduce patent processing time to 12 months, down from the high of about 3 years in the mid-1980s. The electronic filing system is necessary to reach this goal while supporting a workload that grows 6 percent annually. The PTO will choose between two off-the-shelf applications based on SGML, one developed by InContext, the other by Microstar Software. The candidates will be tested at small companies starting in August 1996, and Lehman hopes electronic filing to be available within three years."
Abstract: "The Railroad Industry Forum (RIF) is a team of the National Association of Purchasing Managers who were tasked to develop a standard for the exchange of electronic parts catalog data within the North American railroad industry. The RIF members are comprised of major railroads and railroad manufacturers. Mary McCarthy and Betty Harvey, Electronic Commerce Connection, Inc. developed the EPCES DTD. EPCES - Electronic Parts Catalog Exchange Standard, is a standard that was developed by the RIF for interchange and presentation of illustrated parts catalogs. The presentation of EPCES information has been designed to facilitate point and click capability. LinkOne is an electronic parts catalog and service manual delivery system. It has been developed to enable electronic viewing of parts and service information for manufactured equipment and processes. LinkOne provides point and click functionality between graphics and textual information. ISOGEN International Corporation has developed an EPCES filter for LinkOne to support importing and/or exporting parts catalog information from the manufacturers or railroads in SGML compliant to the EPCES standard."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
For more information on RIF, see the dedicated database entry Railroad Industry Forum: Electronic Parts Catalog Exchange Standard (EPCES), or the description provided by Betty Harvey via the Electronic Commerce Connection web server.
Letter to the editor ("Inbox" department), suggesting that some of the problems identified in Byte's article "Work Flow Without Fear" can be addressed through SGML: "SGML can be used to define the interface format of the documents through the work flow."
The article is a tour of East Asia focused upon SGML issues. Central issues include the character repertoire, character encodings, and native-language document markup. The sidebar article by Gavin Nicol ("Postcard from Tokyo on HTML") discusses ERCS and ISO-2022-IPEUC in relation to Asian language support in SGML/HTML.
See the provisional volume description in a separate document. See also provisionally the Amazon.com entry: "Synopsis: SGML experts are in short supply and in high demand. This book will help jump start SGML users by providing 'cookbook recipes' for the most common SGML document type definitions (DTDs). The CD-ROM contains hundreds of sample DTDs that users can cut and paste from to create their own DTD." [amazon.com]
Abstract: "The paper presents experiences based on the study of a pilot project integrating an SGML-based document processing system at the University of Oslo, Norway. The experiences are examined from three perspectives in order to discuss them in relation to different aspects of the system; the use situation, the organizational benefits and challenges, and the technological requirements. Improving the system based on experiences within one perspective may lead to conflicts to consider when improving the system based on experiences found within other perspectives. The paper states and discusses some of the conflicts in SGML-based document systems. The paper concludes with challenges in development and use of SGML-based document systems, and states some issues for further research."
The document is available online in HTML format or PDF; [local archive copy].
"Abstract: An electronic dictionary system (EDS) is developed with object-oriented database techniques based on ObjectStore. The EDS is composed of two parts: the Database Building Program (DBP), and the Database Querying Program (DQP). DBP reads in a dictionary encoded in SGML tags, and builds a database composed of a collection of trees which holds dictionary entries, and several lists which contain items of various lexical categories. With text exchangeability introduced by the SGML, DBP is able to accommodate dictionaries of different languages with different structures, after easy modification of a configuration file. The tree model, the Category Lists, and an optimization procedure enables DQP to quickly accomplish complicated queries, including context requirements, via simple SQL-like syntax and straightforward search methods. Results show that compared with relational database, DQP enjoys much higher speed and flexibility. With EDS this paper demonstrates how to apply OODBMS's to systems that handle text information with strong yet varied intrinsic hierarchies."
Abstract: "There is a great deal of variation in the encoding of spoken texts in electronic form, both with respect to the types of features represented and the way particular features are rendered. This paper surveys problems in the electronic representation of speech and presents the solutions proposed by the Text Encoding Initiative. The special tags needed for the encoding of spoken texts are discussed, including a mechanism for temporal alignment. Further work is needed on phonological aspects, parallel representation, and on the development of software which connects the systematic underlying representation with a workable format for input and display."
The manual describes the TEI/SGML encoding scheme used to mark up text samples used in the parallel text project. Available on the Internet in HTML format: http://www.hd.uib.no/doc.html: ENPC Documentation [mirror copy].
Available on the Internet in Postscript format: ftp://ftp.hd.uib.no/pub/corpora/enpc.poznan.ps [mirror copy].
The document contains a print version of the (TEI/SGML) DTD used in the parallel text corpus, and examples. Available on the Internet in Postscript format: ftp://ftp.hd.uib.no/pub/corpora/enpc.lund.ps [mirror copy]. For further details, see the main entry for the English-Norwegian Parallel Corpus.
Based upon a paper from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zürich, May 19-23, 1993.
Republished in version 2.0 as "Electronic Texts and their Use for Literary Research". See Electronic Texts and Computer Research by Eric Johnson.
The article is available online via Eric Johnson's WWW server: 'Electronic Texts' [mirror copy here, text only].
"Scholars in the humanities today are routinely doing textual and linguistic research that a generation ago would have been impossible or would have required the dedication of a lifetime. Such research is now feasible because humanists use computers and because texts of major writers are available in electronic form.
The Oxford Electronic Text Library edition of The Complete Works of Jane Austen (OETL Austen) is exactly the kind of electronic text that modern scholars need. It is an accurate rendering of R. W. Chapman's Oxford Illustrated Jane Austen, the standard scholarly edition of Austen, and it contains a wealth of useful information encoded in Standard Generalized Markup Language (SGML). The OETL Austen is distributed in both MS-DOS and Macintosh formats, and a site license is available. It will be used in a multitude of ways by students of Austen for years to come." [from the Introduction]
Johnson favorably reviews the OETL Austen, which uses SGML to structure the electronic text. A copy of the document is available online in HTML format. [Pages under construction: try simply "http://www.dsu.edu/~johnsone/" if the previous link fails.] Full information for ordering the electronic text edition is given in the review. See also a summary of the review by Mary Mallery.
Abstract: A logical history of document editing mechanisms is presented. The design space for document style mechanisms is analyzed. Six primary design issues and the subsidiary issues they raise are discussed. Some major style issues that are seen as the subject of future research are identified.
The author writes a positive review, delineating improvements in the second edition.
Abstract: "As Internet tools become more sophisticated, many scientists are abandoning conventional methods of communication, such as the journal and the scientific conference, in favour of electronic means. The obvious benefit of Internet-based communication is the ability to share and discuss data, analysis techniques and conclusions without leaving the laboratory. More importantly, however, the Internet is also inspiring the creation of completely new ways of communication that may have a profound effect on how science is done. The paper discusses the Chemical Markup Language (CML) which facilitates the exchange of chemical information on the Internet. The CML project aims to ensure that chemical software and databases are compatible for use with CML, by means of collaboration with their creators." [CML is an experimental application of XML, Extensible Markup Language.]
This article examines the problem of document representation in computer systems for printing, editing or interchange among heterogeneous systems. After a discussion of the various possibilities for defining document representation formalisms, it considers a number of standard representations typical of their class: page description languages, SGML, Interscript, ODA. Several other articles in the volume are of direct or marginal relevance to SGML as a metalanguage for document-structuring.
Abstract: "This paper starts by tracing the architecture of document preparation systems. Two basic types of representations appear: at the page level or at logical level. The paper then focuses on logical level representation and tries to survey three existing formalisms: SGML, Interscript, and ODA."
"Abstract: Two technologies have come together to make online technical publishing begin to work. The first and foremost of these technologies is the Internet. Without this massive network of computers and communication equipment, putting a digital library icon on a lab workstation and on an office desktop would have been both problem-plagued and expensive. The second of these facilitating technologies is Standard Generalized Markup Language (ISO 8879: SGML). SGML is, by one definition, a meta-language with which one can capture the structure and semantics of a class of documents. It is internationally recognized as a standard for document representation. Although SGML products have been available for years, the past two years have seen a real growth in interest and use of this technology. AIP has adopted ISO 12083 as the basis for its SGML documents. As a standard, ISO 12083 is overseen by an international working group but not owned by any one organization."
Available from UMI: University Microfilms International, Inc., Number 8804059.
Summary: Several improvements are suggested to the syntax of SGML, the recent international standard for the description of electronic document types. These improvements ease processing by existing tools, remove ambiguity cleanly, and increase human usability. They also indicate some guidelines that should be followed in the design and specification of computer-software standards. By following accepted computer-science conventions for the description of languages the design of a standard may be improved, and the subsequent implementation task simplified.
Draft version 18-October-1988, "accepted for publication in
See also the response of Ron Hayter, "Comments on 'On Improving SGML'," Technical Bulletin 4. Software Exoterica Corporation [OmniMark], 1988. Ron Hayter argues that Kaelbling's "improvements" to SGML are based upon a misunderstanding of the intent of the standard. Kaelbling's original draft known to Hayter was apparently 16-March-1988; Kaelbling's revised draft of 18-October-1988 responds to Hayter's comments.
Abstract: "Several improvements are suggested to the syntax of SGML, the recent international standard for the description of electronic document types. These improvements ease processing by existing tools, remove ambiguity cleanly, and increase human usability. They also indicate some guidelines that should be followed in the design and specification of computer-software standards. By following accepted computer-science conventions for the description of languages the design of a standard may be improved, and the subsequent implementation task simplified."
Received 16-March-1988, Revised 18-May-1990. Another version of the paper is found in OSU-CIRSC-7/88-TR22. Author affiliation: Siemens AG, ZFE IS EA 11; Corporate Applied Computer Sciences; Otto-Hahn-Ring 6; 8000 Munich 83, FRG.
Abstract: "The advantage of structured markup in SGML (Standard Generalized Markup Language) has recently become clear. This technology is being used to automatically convert documents into accessible forms for blind people. In Germany one of the first sets of documents available in SGML is the scientific journal article headers from the "Springer Verlag Journal Preview Service". This article gives a description of the "Journal Header Reader" application. We developed this application to make scientific documents in several formats accessible to blind people. The following chapter gives an overview of the SGML facilities used in our project." [from the document introduction]
Available on the Internet in HTML format: A Journal Header Reader program for the blind, [mirror copy, November 1995].
Abstract: "Structured reporting systems allow health-care workers to record observations using predetermined data elements and formats. The author developed the Data-entry and Reporting Markup Language (DRML) to provide a generalized representational language for describing concepts to be included in structured reporting applications. DRML is based on the Standard Generalized Markup Language (SGML), an internationally accepted standard for document interchange. The use of DRML is demonstrated with the SPIDER system, which uses public-domain internet technology for structured data entry and reporting. SPIDER uses DRML documents to create structured data-entry forms, outline-format textual reports, and datasets for analysis of aggregate results. Applications of DRML include its use in radiology results reporting and a health status questionnaire. DRML allows system designers to create a wide variety of clinical reporting applications and survey instruments, and helps overcome some of the limitations seen in earlier structured reporting systems."
See the main database entry for SPIDER - Structured Platform-Independent Data Entry and Reporting, or the web site for SPIDER. An online document (Postscript) is available which describes DRML: http://www.mcw.edu/midas/papers/AMIA96-DRML.ps; local archive copy.
Abstract: "Structured reporting systems allow health care providers to record observations using predetermined data elements and formats. We present a generalized language, based on the Standard Generalized Markup Language (SGML), for platform-independent structured reporting. DRML (Data-entry and Report Markup Language) specifies hierarchically organized concepts to be included in data-entry forms and reports. DRML documents serve as the knowledge base for SPIDER, a reporting system that uses the World Wide Web as its data-entry medium. SPIDER generates platform-independent documents that incorporate familiar data-entry objects such as text windows, checkboxes, and radio buttons. From the data entered on these forms, SPIDER uses its knowledge base to generate outline-format textual reports, and creates datasets for analysis of aggregate results. DRML allows knowledge engineers to design a wide variety of clinical reports and survey instruments."
See the main database entry for SPIDER - Structured Platform-Independent Data Entry and Reporting, or the web site for SPIDER - Structured Platform-Independent Data Entry and Reporting. An online version of the document is available in Postscript format: http://www.mcw.edu/midas/papers/AMIA96-DRML.ps; local archive copy.
Abstract: "Structured reporting systems allow physicians to record findings by using predefined vocabularies and data-entry formats. The data-entry and reporting markup language (DRML) is used to define structured reporting applications for the SPIDER (structured platform-independent data entry and reporting) system. World Wide Web technology can be used to implement systems for structured entry and retrieval of medical data. The SPIDER system and its DRML report-definition language provide simple, platform-independent tools for structured reporting that conform to internationally recognized standards. The article guides readers through the use of DRML and SPIDER, and allows readers to interactively create structured reporting applications."
"DRML is a generalized report-specification language that simplifies the creation and maintenance of structured reporting applications. The specification of DRML as an SGML document type definition provides standardization that allows DRML documents to be used and exchanged across various computing platforms. Systems for publishing and on-screen editing of SGML documents are available commercially [. . .] . Such programs allow interactive, on-screen editing of DRML documents. Software is also available for validating the syntax of SGML documents [...] By including the DRML document type definition within a document (either explicitly or by reference), such software can be used to check the syntax of a DRML report definition. World Wide Web technology can be used to implement systems for structured entry and retrieval of medical data. The SPIDER system and its DRML report-definition language provide simple, platform-independent tools for structured reporting that conform to internationally recognized standards. This article has demonstrated their use for interactively creating structured reporting applications." [from the conclusion]
The document is available online in HTML format; [see also target URL; registration may be requested].
[Received January 22, 1997; revision requested February 26; revision received and accepted March 3; posted March 10. Supported in part by The Whitaker Foundation (Biomedical Engineering Research Grant to C.E.K.) and the National Library of Medicine (USPHS grant G08 LM05705). Presented in part as infoRAD exhibit 9111WKS at the 82nd Scientific Assembly and Annual Meeting of the Radiological Society of North America, Chicago, December 1.]
Abstract: "The content-reuse system of The Wall Street Journal Interactive Edition makes extensive use of SGML and XML to reorganize and reformat the content presented in the main wsj.com website. This paper discusses how the structures that define an Interactive Journal edition and its component articles are queried, processed, and converted by automatically triggered content-processors, allowing us to quickly fill requests by potential publishing partners to feature our branded content in their contexts."
[Conclusion:] '. . . All of our content-reuse processes owe their flexibility and ease of implementation to our use of SGML and XML. Articles created in SGML have been translated and served out in all sorts of flavors of HTML and other plain text formats. Edition structures and configuration files specified in XML are processed and tailored by custom software that allows our editors to specify what constitutes a mini-edition. And when our automatically generated content falls short of serving their audiences completely, an editor can step in and finish the job. . . . Our editors and designers are charged with constantly improving how our news can be accessed, navigated through, presented, and used. And our business-development staff is constantly seeking new ways to raise the visibility of our brand, which often means spreading excerpts from our trove of content out to places and platforms that our primary web site would not otherwise reach. Having our news, and the processes that direct where that news belongs, in an extensible format has proved to be the key to fulfilling their requirements.'
The document is available online in PDF format - "News you can reuse." [local archive copy] For other articles in this issue of MLTP, see the annotated Table of Contents.
Revision: Received 7 July 1998, Revised 12 August 1998.
Abstract: "Using SGML within our Web publishing system not only allows us to create better-looking and more complicated HTML than editors could otherwise have authored using a native formatting language, but it also allows our editors and designers to massage the look of the edition as often as desired, and to produce spin-off products without additional editorial effort. To be presented will be an architectural overview describing how our publishing system offers editors a tremendous menu of publish-time choices."
"At The Wall Street Journal Interactive Edition, we have been using SGML to mark up news articles since our launch in April, 1996. The elements and attributes we use in our authoring system attempt to answer the question 'What is this content, and what makes it different?' as opposed to 'How do we want this to look in a Web browser?' Even though we may want a byline to wind up looking bold, we mark it up with a <byline> tag, not a <b> tag. Only later in the publishing process do we translate our documents into HTML and its variants. This paper will outline the benefits of this approach, and then describe in some detail how we create our SGML, and how we format it."
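[Illustration: the separation described above, in which content is marked up by what it is and only translated to presentational HTML at publish time, can be sketched as follows. This is a minimal sketch and not the Journal's actual system. Only the <byline> element comes from the quoted text. The other element names, the `HTML_RULES` table, the `to_html` function, and the sample article are assumptions for the example, and well-formed XML stands in for the SGML source.]

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical publish-time rules: content-oriented element -> HTML wrapper.
# Changing the look of every byline means editing one entry in this table.
HTML_RULES = {
    "article": ('<div class="article">', "</div>"),
    "headline": ("<h1>", "</h1>"),
    "byline": ("<p><b>", "</b></p>"),
    "para": ("<p>", "</p>"),
}


def to_html(elem: ET.Element) -> str:
    """Recursively translate a content-marked element into HTML text."""
    open_tag, close_tag = HTML_RULES.get(elem.tag, ("", ""))
    parts = [open_tag, escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text that follows a child element still belongs to this element.
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)


# Hypothetical source article, marked up by meaning rather than appearance.
source = (
    "<article>"
    "<headline>Markets Rally</headline>"
    "<byline>By A. Reporter</byline>"
    "<para>Stocks rose &amp; bonds fell.</para>"
    "</article>"
)

html_out = to_html(ET.fromstring(source))
```

[With these rules the byline is rendered in bold, yet the source never contains a <b> tag. A second rules table could produce a different HTML variant from the same source without touching the article.]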
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
Federal Aviation Administration guidelines prescribe compliance with SGML for specified information deliverables, and USAir Group Inc.'s maintenance division selects software products that conform to the SGML standard. The article describes how the workflow accounts for non-SGML data in the USAir information system as well.
Abstract: "By Fall 1986 the Oxford English Dictionary will have been completely entered into machine-readable form as a first step toward creating an integrated version of the Dictionary and its Supplement. The ability to update and revise the OED requires the addition of a considerable amount of structure to the keyboarded text. Various software approaches to transducing the text of the OED in order to add this structure were evaluated, and eventually INR and lsim were chosen. The use of INR, a program for computing finite automata, necessitated that the structure of the OED be described as a regular language. The methods used to describe the OED, resolve ambiguities and deal with space limitations are detailed. These methods are not limited to the OED, but may be applied to any text in which one wishes to augment the structural information."
The document was also submitted as a master's thesis (Master of Mathematics in Computer Science) to the University of Waterloo. See further on researches related to the production of NOED2 in the main entry for NOED.
Abstract: "An object-based methodology for knowledge representation and its Standard Generalized Markup Language (SGML) implementation is presented. The methodology includes class, perspective, domain and event constructs for representing knowledge within an object paradigm. The perspective construct allows for the representation of knowledge from multiple and varying viewpoints. The event construct allows actual use of knowledge to be represented. The SGML implementation of the methodology facilitates usability, structured, yet flexible knowledge design, and sharing and re-use of knowledge class libraries."
The article is available online in Postscript format; [local archive copy]
Kennedy summarizes the main features of Alschuler's book, and highlights its unique contributions among the available published books on SGML. For additional information, see the bibliographic entry for Alschuler's ABCD...SGML.
The author provides a survey of DTD development approaches, describing the advantages and disadvantages of each in various contexts.
The author analyzes the tendency to let production demands override the long-term maintainability of SGML implementations and SGML-encoded data. The symptoms typically appear as attempts to make the data fit the DTD and related constraints, rather than improving the DTD and application design.
Summary: "Near & Far Designer 3.0 was specially designed to help make the transition from SGML to XML as smooth and straightforward as possible. Near & Far Designer 3.0 can evaluate any valid SGML DTD and interactively convert all mappings that are one-for-one. It will also highlight any remaining discrepancies, evaluate end user resolutions, and complete the transformation from an SGML DTD to an XML DTD - taking all guess work out of this task. Near & Far Designer 3.0 was designed to enable organizations to make the transition from SGML to XML in a cost and resource effective manner. Following the transition from SGML to XML, the graphical interface of Near & Far Designer 3.0 makes the ongoing creation of XML DTDs an easy task in the future. Designer now offers the document analyst a choice to create either new SGML DTDs or to create XML DTDs directly."
Available online: "Converting SAE J2008 to an XML DTD Using Near & Far Designer 3.0."
The article describes the decisions leading up to the closure of Datalogics, scheduled for Spring 1996. A list of notable past and recent employees of Datalogics is printed in the article. A new company, Datalogics Inc., will assume responsibility for supporting the core products. The new Datalogics will be partially owned by Adobe, together with Steve Brown and Jim McNeill (CEO). A new users' group (DLSIG) is being formed to work with the new company.
A report on the EPSIG meeting of May 12, 1997, and on the April 1-2, 1997 meeting in New York. Most of the current work on ISO 12083 relates to mathematics. Among other recent decisions: "It was determined that an ad hoc mathematics group be formed and meet to make recommendations before the formal ISO 12083 meeting in December [1997] in Washington DC. Dianne Kennedy will coordinate that work. DLI list was provided to begin work via email: dli-math@ncsa.uiuc.edu. A second Ad Hoc Committee should be formed to review 12083 and make recommendations. This committee will be responsible for collecting publishers' requirements and documenting how publishers are using ISO 12083 today and how people are currently changing the standard models." [Extracted; see the complete text of the article in the ISUG Newsletter.]
Part 1 of a multipart series of articles on DSSSL.
"One of the newest ISO standards positioned to impact the publishing world is ISO/IEC 10179. In 1988, ISO/IEC JTC1 SC18/WG8, the working group which developed SGML, HyTime, and some of the other SGML-related ISO standards, began writing this new standard. The working group had representatives from the United States, France, Japan, Germany, Ireland, Norway, the United Kingdom, and other countries as well. The new standard, also known as Document Style Semantics and Specification Language (DSSSL), became international standard in April 1996. So, what exactly is DSSSL? How does it fit with SGML and HyTime? And why do we need DSSSL anyway?" [from the Introduction]
See a related version of the article online; [mirror copy]. For more information on DSSSL (Document Style Semantics and Specification Language), see the main entry in the SGML/XML Web Page, and the dedicated section on DSSSL Software Tools.
Part 2 of a multipart tutorial article on DSSSL - Document Style Semantics and Specification Language, ISO 10179. See the first of the serialized articles in the February 1997 issue of <TAG>.
The article provides an update on SAE J2008, a family of standards pertaining to the automotive and truck industry, particularly for emission-related (clean air) information. The April 11, 1997 meeting actions and issues are highlighted in the report.
"SAE J2008 is a family of standards developed by the membership of the Society of Automotive Engineers in response to the mandate of the Clean Air Act to partition and provide easy access to emission-related automotive service information. At the heart of this SGML standard is a relational Data Model for Automotive Service Information rather than any particular document model. The SGML definition set forth within J2008 provides a hierarchical representation of the Data Model. In addition, this standard provides models for common text constructs such as tables, paragraph, lists, and procedures which are found within automotive service information." [Extracted; see the complete text of the article in the ISUG Newsletter.]
Update on the Draft SAE J2008 Standard. The California Air Resources Board (CARB) is proposing that automobiles made (sold?) in California conform to SAE J2008 for 2002 vehicles; the trucking industry is making progress on a new standard, T2008, based upon J2008. For additional information on SAE J2008 and T2008, see the main entry for automotive and truck industry use of SGML.
The author summarizes the April 1997 tutorial for journal publishers sponsored by GCA, taught by her and Murray Maloney. The focus was upon the emerging XML standard. Kennedy reports a growing interest in SGML/XML among journal publishers, including those who are using ISO 12083 as a basis for enterprise DTDs. The article also addresses W3C math in XML (Extensible Markup Language) documents.
The author discusses the variety of services now being delivered by SGML consultants, using her own experiences and those of other consultants as examples (The Sagebrush Group, Mulberry Technologies, L. A. Burman Associates, and Information Architects).
Summary of the major annual Seybold conference from the perspective of SGML interests. Keynote speeches were delivered by Marc Andreessen of Netscape and Brad Chase of Microsoft. The latter talked about the integrated desktop of Explorer 4.0, "which heralds the importance of integration of the Web with traditional desktop products, whether mainstream or SGML-based."
New products: (1) XyVision SGML Conductor, a compound document management solution integrating PDM and FrameMaker+SGML; (2) Folio 4.0, which is strongly aligned with Microsoft, and sports features aimed at protecting copyrighted information that is delivered electronically ["rights management functionality"]; (3) Corel Ventura 7.0 - a publishing package completely re-written to support 32-bit processing, and having SGML support in the form of the DTD Designer, SGML Layout, and SGML Editor tools; (4) Near and Far Author 2.0, which includes integration with Microsoft Word 7.
According to the author, the conference evidenced the emergent concept of "Mainstream SGML" - somewhat in opposition to "Industrial Strength SGML." The latter is "a strategy to bring SGML to mainstream business applications at office-software price levels." Microstar and several partner companies have developed a logo to symbolize the development and marketing focus. Other industry partners are skeptical of the merits of this programme, or are doubtful of the net effects: "will the result be simply hierarchical HTML with certain content extensions?"
The article is a report and evaluation of the annual conference of the Society for Scholarly Publishing, held in Minneapolis on May 30-31, 1996. The author describes three major DTDs used in journal publishing and some of the challenges presented by cultural and economic issues. In most environments, it is found to be necessary to modify "industry standard" DTDs to meet the requirements of the stakeholders. The ISO 12083 DTD and its predecessor (AAP) are in use, as well as a DTD developed by Elsevier Science Journals. InContext and Folio are implementing a turnkey journal production system (SGML Journal Publisher) using ISO 12083 as the basis for DTD design.
An invited review of the book The SGML Implementation Guide: A Blueprint for SGML Migration, by Brian Travis and Dale Waldt. An HTML version of the review is available on the SGML Resource Center WWW Page [mirror, partial links].
`Tales from the Front' is a new column in <TAG> beginning with issue 8/11. In the current article, Kennedy describes situations in which either of the two database technologies would be preferable, and suggests that OODBs now have a niche place in the database market, especially within the context of the SGML market.
Review of a book on document management solutions, written by an advisory member of the ISO committee responsible for SGML. Though SGML is not a central topic in the book, the author discusses SGML as playing an important role in document engineering.
The article addresses fundamentals of document structure, and the role it plays in information management using SGML.
Abstract: "Implementing SGML can be an enormous task. To be successful, an implementor must have a good technical background in SGML and must have a clear understanding of data flow and SGML system functionality. Gaining an understanding of the key components of an SGML system is critical. This afternoon's presentations are designed to provide the SGML newcomer with an overview of the major classes of SGML tools and a brief review of the products commercially available today. Presenters for this session are independent SGML consultants who specialize in the design and implementation of SGML-based information systems."
Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Summary: "In XML terms, XML-Data is an XML tag set which enables us to precisely describe text structures, relational schema and much more. At the core of XML-Data is a DTD for DTDs. To that, elements have been added to describe schema, either relational or object-oriented. The idea is that with XML-Data we can describe any schema. Then, when XML-coded data is delivered via the Web along with an XML-Data Schema, the receiving system will be able to understand what it is getting. It will not only understand the hierarchy of data, but can also understand other relationships. If the data is relational, a client can understand which data elements are keys and which are foreign keys. Or in an object world, the client will clearly understand which elements are in the same 'class', something our standard XML-coded, well-formed data or even XML DTDs do not communicate today."
Online: XML-Data: A Schema Language for Structured Data. See also the main database entry for XML-Data.
Abstract: "Implementing SGML can be a daunting task. To be successful, an implementor must have a good technical background in SGML and must have a clear understanding of data flow and SGML system functionality. Gaining an understanding of the key components of an SGML system is critical. This afternoon's presentations are designed to provide the SGML newcomer with an overview of the major classes of SGML tools and a brief review of the products commercially available today."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "This presentation uses a carnival Funhouse as a metaphor for implementing SGML for the first time. The speaker will describe three main areas in the funhouse and the hazards presented in each and some tips for surviving the experience: document analysis, DTD writing, and data markup, including legacy conversion and training users to mark up data."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Abstract: "Information access for people with disabilities is creating numerous opportunities and challenges within the SGML (Standard Generalized Markup Language) community. Additionally, as a result of the increasing paradigm shift by the publishing industry toward Internet and WWW-based document delivery systems, the importance of producing accessible information using SGML mechanisms has increased immeasurably.
The primary focus of this paper involves the production of electronic documents. However, the key principles involved in the design, production, and delivery of information apply regardless of the document medium.
In this showcase the presenters will: identify major problems in information and software design that deny access, demonstrate successful products that can be used by people with disabilities to access publications, and point to resources that assist developers in creating accessible products in the future. The goals of the showcase are to educate participants about accessible electronic text delivery systems, and direct participants toward resources which help them create or choose accessible products."
Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
This issue of Baskerville makes available a number of papers presented at a joint meeting of the UK TEX Users' Group and BCS Electronic Publishing Specialist Group (January 19, 1995) [mirror copy]. See the link to Baskerville, or email: baskerville@tex.ac.uk. Issue 5/2 of Baskerville has other articles on SGML: "Portable Documents: Why use SGML?" (David Barron); "Formatting SGML Documents" (Jonathan Fine); "HTML & TeX: Making them sweat" (Peter Flynn); "The Inside Story of Life at Wiley with SGML, LaTeX and Acrobat" (Geeti Granger); "SGML and LaTeX" (Horst Szillat). See the special bibliography page for other articles on SGML and (LA)TEX.
Abstract: Electronic publishing is under close scrutiny by publishers, who are faced with increasing pressure to publish faster, reduce costs and increase circulation. Before moving forward, publishers need to determine whether the time is right, and then to decide how to implement an electronic version of their print journals or a totally new electronic-only journal. Decisions must be made on SGML vs. scanned pages, and CD-ROM vs. online. Most importantly, publishers need to determine how their electronic products can offer superior value to scholars and researchers, because the journals will fail if they are perceived to be less valuable than their print counterparts. As telecommunications access speeds increase and online storage costs decrease, the distribution of journals, complete with high-quality photographs, tables and equations, through online systems becomes increasingly viable. The electronic medium can be exploited to add links to relevant bibliographic databases as well as to other relevant journals. Comprehensive information can be made instantly available to users through one easy-to-use interface.
Abstract: "This paper discusses the challenges of capturing the state of distributed systems across time, space, and communities, and looks to XML as an effective solution. First, when recording a data structure for future reuse, XML format storage is self-descriptive enough to extract its schema and verify its validity. Second, when transferring data structures between different machines, XML's link model in conjunction with Web transport protocols reduces the burden of marshaling entire data sets. Third, when sharing collaborative data structures between disparate communities, it is easier to compose new systems and convert data definitions to the degree that XML documents are adopted for the World Wide Web. Just as previous generations of distributed system architectures emphasized relational databases or object-request brokers, the Web generation has good reason to adopt XML as its common archiving tool, because XML's sheer generic power has value in knowledge representation across time, space, and communities."
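The abstract's first point, that stored XML is self-descriptive enough for its structure to be recovered later, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' method: the sample document, its element names, and the function name `infer_structure` are all hypothetical, and the result is a rough structural summary rather than a full schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical archived data; the element and attribute names are invented.
SAMPLE = """<archive>
  <record id="r1"><title>First</title><date>1997-08-04</date></record>
  <record id="r2"><title>Second</title></record>
</archive>"""


def infer_structure(xml_text):
    """Map each element name to the child and attribute names observed for it."""
    root = ET.fromstring(xml_text)
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():  # iter() visits the root and every descendant
        attributes[elem.tag].update(elem.attrib)
        children[elem.tag].update(child.tag for child in elem)
    return {
        tag: {"children": sorted(children[tag]),
              "attributes": sorted(attributes[tag])}
        for tag in children
    }


if __name__ == "__main__":
    for tag, info in infer_structure(SAMPLE).items():
        print(tag, info)
```

A summary of this kind records only what one document happens to contain; it cannot show that an absent child (such as the missing date in the second record) is optional by design rather than by accident.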
A version of this document is available online in HTML format: http://www.cs.caltech.edu/~adam/papers/xml/xml-for-archiving.html; [local archive copy].
Abstract: "HTML allows the structural markup of Web documents, distinguishing the elements of a page with tags and declaring the physical relationships among the various document elements. This organizes the display of information and allows humans to read and use it. To give machines this capability, however, requires semantic markup, identifying what each particular element means on its own (for example, 'this is a home street address' or 'this is an e-mail address'). Semantic markup would change what is now simply displayed content to machine-readable, structured content."
"The eXtensible Markup Language (XML) specification makes it dramatically easier to develop and deploy domain- and mission-specific Web pages. In this article, we describe the evolution of the Web's data representation from display formats to structural markup to semantic markup.
"The shift from structural HTML markup to semantic XML markup is a critical phase in the struggle to transform the Web from a universal information space into a knowledge network."
A related version of the article was made available as "X Marks the Spot: eXtensible Markup Language opens the door to a motherlode of automated Web applications", [archive copy, August 4, 1997.]
The published abstract from IEEE (Institute of Electrical and Electronics Engineers, Inc.) is available in HTML format: http://www.computer.org/internet/ic1997/w4078abs.htm; the full text of the article is available from IEEE in PDF format, [local archive copy].
Authors discuss [esp. pages 60-61] the development and use of the 'MLEXd' SGML DTD within the MULTILEX project's efforts to standardize access to lexical data. [Abstract needed]
Abstract: "The SGML and XML standards use a variation of regular expressions called content models for modeling the markup structures of document elements. SGML content models may include so called and groups, which are excluded from XML. An and group, which is a sequence of subexpressions separated by an &-operator, denotes the sequential catenation of its subexpressions in any possible order. If one wants to shift from SGML to XML in document production, one has to translate SGML content models to corresponding XML content models.
"The allowed content models in both SGML and XML are restricted by a requirement of determinism, which means that a parser recognizing document element contents has to be able to decide without lookahead, which content model token to match with the current input token, while processing the document from left to right. It is known that not all SGML content models can be expressed as an equivalent XML content model. It is also known that transforming an SGML content model into an equivalent XML content model may cause an exponential growth in the length of the content model. We discuss methods of eliminating and groups and analyze the circumstances where they can be applied. We derive a tight bound of e · n! on the number of symbols in the result of eliminating an and group of n symbols, where e = 2.71828... is the base of natural logarithms. We present the analysis in a pedagogical manner, emphasizing mathematical methods which are typical to the analysis of algorithms. We also show that minimal deterministic automata for recognizing an and group of n distinct element names contain 2^n states and n · 2^(n-1) transitions, excluding the failure state and transitions leading to it."
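As a rough illustration of the and-group elimination discussed in this abstract (a sketch only, not necessarily the authors' algorithm), the Python below rewrites an and group over n distinct element names as a nested choice of sequences, factored on the first name so that the branch can be chosen from the current token alone. All function names are hypothetical. For this particular construction the number of name occurrences satisfies T(n) = n · (1 + T(n-1)), which stays below e · n!; the sketch also builds the automaton whose states are the sets of names already seen, giving the 2^n states and n · 2^(n-1) transitions quoted above.

```python
import math
import re
from itertools import combinations


def eliminate_and_group(names):
    """Rewrite the and group (n1 & n2 & ...) using only ',' and '|'.

    Every alternative starts with a different name, so no lookahead
    is needed to pick a branch.
    """
    names = list(names)
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first}, {eliminate_and_group(rest)})")
    return "(" + " | ".join(branches) + ")"


def symbol_count(model):
    """Count element-name occurrences in a content model string."""
    return len(re.findall(r"[A-Za-z][\w.-]*", model))


def and_group_automaton(names):
    """States are the sets of names already seen; one transition per unseen name."""
    names = list(names)
    states = [frozenset(c) for k in range(len(names) + 1)
              for c in combinations(names, k)]
    transitions = [(s, x, s | {x}) for s in states for x in names if x not in s]
    return states, transitions


if __name__ == "__main__":
    model = eliminate_and_group(["a", "b", "c"])
    print(model)
    print(symbol_count(model), "symbols; e * 3! =",
          round(math.e * math.factorial(3), 2))
    states, transitions = and_group_automaton(["a", "b", "c"])
    print(len(states), "states,", len(transitions), "transitions")
```

For a single name the function returns the bare name, which would still need enclosing parentheses to serve as a complete content model in a declaration.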
See the online abstract. The full text is available in Postscript format, [local archive copy]
Available in Postscript format via the Internet: ftp://ftp.cs.helsinki.fi/pub/Reports/by_Author/Kilpel%E4inen_Pekka/Tree_Matching_Problems_with_Applications_to_Structured_Text_Databases.ps.gz; [mirror copy].
Abstract: "The Standard Generalized Markup Language (SGML) allows users to define document type definitions (DTDs), which are essentially extended context free grammars in a notation that is similar to extended Backus-Naur form. The right hand side of a production is called a content model and its semantics can be modified by exceptions. We give precise definitions of the semantics of exceptions and prove that they do not increase the expressive power of SGML. For each DTD with exceptions we can construct a structurally equivalent extended context free grammar. On the other hand, exceptions are a powerful shorthand notation-eliminating them may cause exponential growth in the size of a DTD."
Paper presented at PODP '96 Workshop on the Principles of Document Processing, Palo Alto, September 23, 1996. To be published by Springer-Verlag in the Conference Proceedings. See the note of Derick Wood [October, 1996]: "Pekka Kilpelainen, Helen Cameron, and Chris Cleverley and I are currently examining the issues of exceptions and their expressive power, the decidability of structural equivalence of DTDs, and how tag minimization can be defined in a general way." [from "FOUNDATIONS OF MARKUP, HTML, AND SGML"]. Other papers [need bibliog. work] include: (1) P. Kilpelainen and D. Wood, Exceptions in SGML document grammars, (1996), 30 pages. Also appeared as Technical Report HKUST-CS95-??; (2) P. Kilpelainen and D. Wood, SGML and Exceptions, (1996), 13 pages. Also appeared as Technical Report HKUST-CS95-??; (3) H. A. Cameron and D. Wood, Structural equivalence of regular extended context-free grammars and SGML DTDs, in preparation, 1996.
Abstract: "This document is intended to provide a brief tutorial introduction to the HyQ language. It is assumed that you have a working knowledge of SGML and have a copy of the HyTime standard, ISO 10744 [
This tutorial is available in compressed Postscript format from the Exeter SGML Project FTP server as Kimber-on-HyQ-1.1.ps.Z (note binary mode FTP transfer required), or in compressed text (ASCII) format FTP to SGML Project. Alternately, it is available in plain text (ASCII) format from the SGML Repository.
Abstract: "[The presentation] describes several demonstrations of using various tools with the HyTime architecture to do useful and unique tasks. Demonstrations include the creation, management, and presentation of editorial notes, the use of HyTime to create 'virtual' and 'compound' documents. Demonstrates the power of the HyTime architecture both as a set of useful facilities and as a standard that enables interchange and interoperation."
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
"A basic premise of an open hypertext system is that it must be possible to create hyperlinks unilaterally among data objects to which write access is not available. In other words, it must always be possible to create hyperlinks that are independent of the data objects they connect. From this it follows that hyperlinks should always be conceived of and managed as first-class independent objects, at least for the purpose of defining general data models and management schemes. . . The database of hyperlinks must have the characteristics of a traditional database. It must be in a neutral data format that can be accessed with a minimum of cost. It must be general enough to support unanticipated uses. It must provide sufficient expressive power to enable the description of relationships as richly and precisely as authors desire." [extracted]
Available on the Internet in HTML format: http://space.njit.edu:5080/HTFII/Kimber.html [mirror copy, partial links].
Summary: HyTime is an ISO standard (ISO/IEC 10744) that is an extension to SGML. It is intended to support electronic documents which use hyperlinking and multi-media elements. In this book, Kimber focuses on the most practical aspects of the HyTime standard, explaining how to use HyTime to move information from the traditional print-based medium to hypermedia. [publisher's pre-publication description]
The book "Provides an introduction to the HyTime standard, ISO/IEC 10744. Intended primarily for people who have some experience with SGML, especially people doing technical publishing and documentation. A knowledge of SGML syntax is not required but will help make the details and examples easier to understand. The book does include an introduction to SGML syntax and terminology." [author's summary]
Another summary: "This beginner-level conceptual overview of HyTime explains how HyTime is used both in traditional information processing applications and in multi-media/hypermedia applications. It sorts out the basic concepts from the confusing details in the HyTime standard, and shows readers how the standard can be applied relatively simply and easily to existing SGML and hypermedia applications. . .[the book] (1) discusses the basic problem that HyTime (and by extension SGML) tries to solve, explaining in general terms how HyTime solves that problem, and introducing the necessary syntax; (2) explains the role that SGML-encoded data plays in information management processing; (3) discusses how HyTime addressing methods are used to locate all types of data; (4) considers how to define and use HyTime property sets to access data of any type; (5) explains how to implement HyTime functions using the facilities of existing SGML-based products and systems; (6) includes samples of HyTime markup, descriptive illustrations, and problems; (7) shows how to incorporate HyTime concepts and architectural forms into an application and includes an application specification for a small HyTime application; (8) [has] an accompanying diskette [that] contains sample code and a public domain SGML parser." [from the PTR server, unstable URL]
See "Dr. Macro's Books for Review" (access to review drafts of books under development - sign up to review the book in advance). Provisionally, see also Eliot Kimber's HyQ tutorial.
More/recent information on the book is/was available via the Prentice Hall WWW server's search facility: http://www.prenhall.com/, or [mirror copy of the book abstract, from February 1996]. Possible (unstable) URL for the volume Table Of Contents.
[Note February 03, 1998] See the announcement from Eliot Kimber (ISOGEN International Corporation) for an updated review draft of his forthcoming book Practical Hypermedia: An Introduction to HyTime. The draft incorporates 1) a "new and improved HTML version with useful navigation aids, working cross references, and hyperlinks to the standard itself; 2) an update of the first five chapters to reflect the final text of ISO/IEC 10744:1997, through Hyperlinking; 3) an updated summary of changes for HyTime Second edition (Appendix B in the volume), which you can also find at http://www.hytime.org/papers/hytime-2ed-soc.html." [adapted] HyTime users will recognize the significance of this important reference work, and the value of the online draft version, for which the author now solicits critical review and feedback.
Summary: "External general text entities are not generally re-usable because: (1) IDs and Entity names not guaranteed unique; (2) Fragments cannot be validated in isolation; (3) No SGML-defined structural constraints on general text entities . . .Subdocument entities eliminate the re-use problems inherent in general text entities because they are themselves complete documents."
Available online: "Re-Usable SGML: A Plea for SUBDOC" (W. Eliot Kimber); [SGML version]; [mirror copy]
Abstract: "This paper discusses the issues of SGML re-use and shows why they can only be solved generally through the use of subdocuments. The paper explores the following general issues:
- General text entities are not re-usable
- How to enable interoperation of documents with possibly different document types?
- How to effect the cross-document addressing needed when a single document is composed of many subdocuments?
The SGML standard only defines two object types that can have independent existence: documents and subdocuments. Thus it is clear that only documents and subdocuments can be reliably re-used. In particular, external general text entities are not useful candidates for general re-use. My plea then is for tools to add the functions necessary to support the use of subdocuments for the re-use of semantic fragments. For most applications, such as browsers, this means treating the content of subdocument entities as though it had occurred in a general text entity for the purpose of processing (not parsing). For parsers, it means providing a mechanism to either parse multiple documents in parallel or to suspend the parsing of the parent document while the subdocument is parsed and then integrating the parsing result of the subdocument with the data resulting from the parsing of the parent document. For editors, it means allowing the declaration and editing of subdocument entities. Editors, in particular, may also need to provide ways to define constraints on what document types or architectures are to be allowed for subdocuments in specific application environments (families of DTDs).
I think that these conventions provide a clear and simple way to make the use of subdocuments in general less problematic and more fruitful. The full promise of SGML cannot be realized until the problem of fragment re-use is solved and I am firmly convinced that subdocuments are the key to that solution."
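One reason general text entities resist re-use is that their content is included textually, so every fragment shares the parent document's single ID namespace, whereas a subdocument is a complete document with a namespace of its own. The Python below is a minimal sketch of that difference and nothing more: it uses XML syntax and the standard library XML parser as stand-ins for SGML and an SGML parser (both assumptions), and the fragments, element names, and the function `duplicate_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragments written by two authors who never coordinated IDs.
FRAGMENT_A = '<section id="intro"><para id="p1">Overview.</para></section>'
FRAGMENT_B = '<section id="intro"><para id="p2">Background.</para></section>'


def duplicate_ids(*fragments):
    """Return IDs used more than once when the fragments share one ID namespace."""
    counts = Counter(
        elem.get("id")
        for text in fragments
        for elem in ET.fromstring(text).iter()
        if elem.get("id") is not None
    )
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Included as text entities: both fragments land in the parent's namespace.
    print("text entities:", duplicate_ids(FRAGMENT_A, FRAGMENT_B))
    # Treated as subdocuments: each fragment is checked on its own.
    print("subdocuments:", [duplicate_ids(f) for f in (FRAGMENT_A, FRAGMENT_B)])
```

Checking each fragment separately also mirrors the abstract's second concern: a subdocument can be validated in isolation, while a text-entity fragment has no meaning until it is expanded inside some parent.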
See the online version of the paper: "Re-Usable SGML: Why I Demand SUBDOC", SGML '96 presentation by W. Eliot Kimber of ISOGEN International Corp.; [mirror copy]. An SGML version is also accessible via the ISOGEN server, as well as a package containing HyBrowse styles and instructions for using HyBrowse.
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
"As SGML moves into the main stream as a preferred information representation method, enterprises are faced with the problem of managing their SGML data. . .Attempts to apply information management techniques borrowed from relational databases and program source code management have largely failed. Relational databases are inappropriate because they are intended for data that breaks down well into small, discrete units that organize into tables, which documents largely do not do. Program code management systems fail because documents are generally not record-oriented, complicating the problems of change tracking and management that largely depend on the record-oriented nature of most programming languages. . .SGML holds the potential to solve some of these problems. . ." [from the Introduction]
Available online: "SGML Document Management" (W. Eliot Kimber); [SGML version]; [mirror copy]
Abstract: [for the Closing Keynote address] "With developments like the World Wide Web, intranets, and increased focus on standardization by major software vendors, SGML and its related standards are being revised and enhanced to reflect new technologies and new requirements. This presentation looks at recent events--including the publication of the DSSSL standard, the HyTime Technical Corrigendum, and the XML specification--and projects the trends they represent into the future of SGML. The major trends are more functionality at a lower cost of entry, providing greater overall value."
The printed version of the presentation is available online in SGML format: see W. Eliot Kimber's Closing Keynote Address: "Tastes Great - Less Filling: SGML For the 21st Century."; see also the index page at ISOGEN for the slides and other formats. URL: slides and paper, .ZIP archive; [local archive copy].
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "The paper discusses why the SGML LINK feature is important, the role it plays in the standardization and interchange of document processing information. Also discusses the specific ways in which LINK can be used effectively. Proposes a simple convention for the use of LINK. Discusses the relationship between LINK and other style and transformation specification mechanisms, such as DSSSL."
Available online in HTML format: [mirror copy, partial links].
Abstract: "Defines a method of using the constructs defined by the HyTime standard (ISO/IEC 10744:1992) to both structure scholarly writing by capturing the abstract relationships within it and to affect its presentation in ways that express those relationships through the use of dynamic multimedia presentations. The design assumes that the data to be accessed comes from an essentially unbounded set of networked resources, rather than from a self-contained database. By using HyTime, the design separates the logical structuring and abstract functional definition of the system from any specifics of implementation, including details of data location and access, with the specific goal of enabling interchange of both structured source data and presentation specifications among disparate systems, or implementations of the same basic system, while also enabling the use of the data by other SGML or HyTime applications for other unanticipated uses."
See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.
Abstract: The author presents an interactive document editor based on an expressive abstract document model for paper and electronic documents. The model introduces the notions of abstract and concrete objects, hierarchical composition of ordered and unordered objects, sharing of components, and reference links. It has been used to specify a wide variety of document objects, and is the basis for a document processing system that allows its users to edit the logical structure of a document using specific structure editing commands. This system introduces two new ideas. The first involves computational objects; each object can be programmed to generate its own unique view of the document, and each of these views can be displayed in a separate window on the screen. The second involves multiple windows to display the document structure. The windows are arranged hierarchically as sets and sequences, depending on the composite structure of the document. This system is used for both editing and viewing documents.
Abstract: "This paper describes how an executable interval temporal logic may be used as a formalism for specifying and manipulating temporal constraints among objects within multimedia documents. The paper presents a taxonomy of such constraints, based in part upon the functionality of existing systems such as the HyTime standard, Firefly and MHEG. It then shows, largely by a series of examples, how each of the elements of this taxonomy can be accommodated in this formalism. It also suggests how this formalism could assist the author in modelling and testing such sets of temporal constraints, and hence serve as an aid in prototyping such documents."
Abstract: "This paper discusses the need for models for multimedia documents and describes a particular formal model. The model makes use of an executable Interval Temporal Logic as its basis. The paper describes how temporal constraints among media items may be specified for subsequent manipulation and for use in prototyping. In particular, it uses the powerful notion of interval projection, both as a device for specifying variable display rates for media items and also for providing a scripting mechanism. The paper also outlines how this model may be used as the basis of an authoring tool for such documents."
The article focuses upon document modelling and authoring systems designed to take advantage of formal models. The author's interest extends beyond the notion of attribute grammar, which serves as the core formalism in the authoring of structured documents in SGML and HyTime; he seeks to develop a model which formally addresses manipulation, including temporal aspects of authoring.
For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
Abstract: "Thompson Legal Publishing has re-engineered aging SGML-based systems to meet current needs. Tools were chosen from solid companies that did not expose the SGML to users, did not restrict the use of SGML in any way, that have the capacity to emulate structure and that have APIs. Users now work in an environment that does not force them to place thirty elements/attributes in the data to enter one judicial case citation. Instead, a couple of clicks of the mouse, and in goes the case cite. Our savings in output processing have been enormous; a process that used to cost $18.00 per page now costs $0.95 per page. The system's simplicity from the user's point of view will be demonstrated, and the complexity of the data created and the resulting flexible output will be shown."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "As the Web grows wildly, so do archives of electronic documents. Unfortunately, while novice computer users have a right to electronic information, they are often ill-equipped to master the intricacies of boolean search, SQL, and natural language interfaces. Furthermore, novice users are not versed in how to navigate Web topologies effectively, like determining possible work-arounds for '404: Web page not found' errors. With spatial-oriented user interfaces built directly from SGML / HyTime document databases, we can utilize users' a priori knowledge of information spaces. Users everywhere can enjoy navigating the epitome of well-tended hypermedia databases: digital libraries."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
See the "SGML '97 Talk Slides" provided by the author, and other information on the Digital Library 3D Interface Project.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "The Virginia Tech Graduate School requires a specific form for the submission of Electronic Theses and Dissertations (ETDs) to maintain the consistency of these complex documents. The formal statement of these guidelines serves graduate students submitting ETDs, the faculty with whom they work, and scholars who study the submitted ETDs. We defined a Document Type Definition (DTD) in the Standard Generalized Markup Language (SGML) for the representation of ETDs, a logical choice for encoding complex electronic documents. To build the DTD, we analyzed constructs in existing theses and dissertations and studied the rules for their submission. Here we present definitions, annotations, and rationale for each document construct, and we explain the connection of the document constructs into an integrated DTD."
Available online: Document Type Definition for Electronic Theses and Dissertations, by Neill A. Kipp; [mirror copy]
The author speculates on industry trends and the future role of SGML in the coming years.
The author builds upon an earlier essay in <TAG> in which he showed how one might conceive of an HTML document as "an n-ary hyperlink." Here, he explains how the "everything is a link" paradigm can facilitate the design of interactive information systems. The NDLTD (Networked Digital Library of Theses and Dissertations) project uses the notion of "form-as-a-link."
The author overviews DSSSL's main features and then suggests how these features assist information providers with the goal of delivering SGML-encoded data into international markets.
For more information on DSSSL (Document Style Semantics and Specification Language), see the main entry in the SGML/XML Web Page.
The author discusses the notion of "link" in the context of HTML and XML documents, and in terms of HyTime link concepts.
Kipp provides a detailed conference report for the MetaStructures 1998 Conference (August 17 - 19, 1998) held at Le Centre Sheraton Hotel, Montréal, Québec, Canada. The conference was hosted by GCA, and chaired by Steve Newcomb (TechnoTeacher) and Carla Corkern (ISOGEN International Corp). Papers from some of the (more than twenty) presentations will be made available online from http://www.hytime.org/. "In summary," Kipp writes, "MetaStructures '98 had more attendees discussing deeper issues than last year. It had more live design and product demos than any year before. Even so, the ideas of interoperable metastructures are still on the 'bleeding edge' of technology. Therefore, if you feel that the SGML/XML trade shows are secret plots to numb your mind, then next year's MetaStructures will provide the intellectual stimulation you absolutely need."
The author gives a detailed summary of the presentations made at the second HyTime conference, August 16-17, 1995, in Vancouver, British Columbia.
The author supplies a thorough review of the SGML '96 Conference. In product news there is (a) OmniMark V3, enabling Internet transaction servers; (b) AIS Balise Double-Byte Edition, with native support for Unicode, JIS or other large character sets; (c) Stilo Structured Document Editor, with XML support; (d) ArborText's ADEPT Release 7, which will bring the Windows version into sync with the UNIX version 6 platform, add more Asian support, and offer Visual ACL. Other highlights covered in Kipp's report: XML, SGML revision ("SGML '97"), DSSSL, "semantic" DTDs for mathematics, HyTime tools.
Abstract: "SGML is the logical choice for encoding electronic documents, and Virginia Tech encourages (and will later require) students to submit Electronic Theses and Dissertations (ETDs) in SGML. Our DTD must work with translators as well as be usable for students preparing SGML directly. A usability test for tagging ETDs according to our DTD involved teaching SGML-novice graduate students to code using our DTD, observing them tagging their own documents, and having them narrate their thoughts during the process. Our results show that subjects require high-quality system documentation (replete with examples of correct usage), that learning to author the simplest hypermedia in SGML is inherently nonintuitive, and that our line-edited, batch-processed ETD formatting system is easy to use.
This work was funded in part by the Southeastern Universities Research Association (SURA) 1996 project, 'Development and Beta Testing of the Monticello Electronic Library Thesis and Dissertation Program'."
More detailed information on the Electronic Theses and Dissertations project may be found in the SGML/XML Web Page; or see http://etd.vt.edu/etd/. See especially the brief project description [mirror copy], and a related write-up in the September 1996 issue of D-Lib Magazine [mirror copy, December 1996].
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
The author provides an in-depth review of the 1997 International Conference on the Application of HyTime (IHC '97, August 19 - 20, 1997, Quebec). The article contains a sidebar on HL7 (Health Level 7) and the proposed Kona architecture, which used HyTime constructs.
Note that the author's own paper for IHC '97 "HyTime Engine Peer-Peer Protocol. HEP Cats Jam Java in the Digital Library" is available as part of the online conference record/proceedings; [archive copy].
The author discusses the relationship of XLL [Extensible Linking Language] to other standards, reviewing the major linking facilities proposed in the draft specification. For the context of the presentation (linking within digital libraries), see also the "SGML '97 Talk Slides" provided by the author, and other information on the Digital Library 3D Interface Project.
[Extract:] "The rationales for Virginia Tech's ETD project include: (1) Preparing graduate students for their professional careers by training them in the use of digital libraries and introducing them to electronic publishing; (2) Promoting collaboration between graduate research programs at separate universities by making graduate scholarship visible and accessible via a network archive; and, (3) More efficient use of the university's library and administrative resources. The channels Virginia Tech has established to guide the finished thesis or dissertation from the student's personal computer to the offices of the graduate school and to the library's on-line archive will be reviewed, stressing that an important component of the Virginia Tech project has been to develop potential models (as opposed to absolute standards) for ETD production elsewhere. Also to be discussed are document formats for the completed ETD (PDF and SGML), multimedia applications, and the archiving of the ETD with UMI. Finally, a statistical analysis of existing Virginia Tech ETDs will be presented. . ."
Abstract available online in HTML format: "Electronic Theses and Dissertations in the Humanities", by Matthew G. Kirschenbaum, Ed Fox; [archive copy]. See the earlier presentation: "Electronic Publishing and Doctoral Dissertations in the Humanities", by Matthew G. Kirschenbaum, 1996 Convention of the Modern Language Association, Washington DC. [archive copy]. See more information on the Electronic Thesis and Dissertation Project and related efforts in the dedicated database entry, and the August-September 1997 discussion, or Electronic Theses and Dissertations in the Humanities: Directory of Resources, by Matthew G. Kirschenbaum.
Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server.
The author describes the components of EIDOS -- "Electronic Information Delivery Online System." EIDOS is part of OCLC's endeavor to deliver resources to libraries. Some 6000 libraries can access the EIDOS database, which was set up using SGML, following the AAP DTD. The article also overviews the Twayne project in which 150 volumes are being encoded in SGML (AAP) for the G. K. Hall Twayne Series.
The article is based upon a paper presented at Markup '88, Ottawa, 24-26 May 1988.
Abstract: This paper introduces the key principles and main features of the VODAK Model Language VML. This includes the standard concepts in object-oriented data modelling needed in the subsequent discussion, like objects, classes, types, inheritance and methods. Then the application of VML for modelling hypermedia documents is discussed. Advanced features in order to tailor data models towards particular application scenarios, that will be needed in order to provide adequate models for hypermedia documents, are introduced. Among these are metaclasses, parametrized object types, semantic relationships and dynamic method delegation. Modelling primitives are designed to model typed hierarchical and hypertext document structures. Particular attention is paid to provide operations that maintain consistency by observing the compositional constraints of document types.
Available in Postscript format as P-93-15.ps.Z from the GMD-IPSI FTP server.
Abstract: "Successful applications of digital libraries require structured access to sources of information. This paper presents an approach to extract the logical structure of text documents. The extracted structure is explicated by means of SGML (Standard Generalized Markup Language). Consequently, the extraction is achieved on the basis of grammars that extend SGML with recognition rules. From these grammars parsing automata are generated. These automata are used to partition a flat text document into its elements, to discard formatting information, and to insert SGML markups. To handle the complex document structures and fallback rules needed for error-tolerant parsing, a parsing strategy has been developed that ranks and prunes ambiguous parsing paths."
The document is available online in Postscript format: ftp://ftp.darmstadt.gmd.de/pub/dimsys/reports/P-97-05.ps.Z; [local archive copy]. A version of this paper was published in International Journal on Digital Libraries Volume 1, Number 4 (1997).
Drafted September 17, 1986, Revised March 13, 1987. See other documents from the INFOODS project (by Klensin and Romberg) which explain this early example of database use of SGML.
Author's annotations from a personal note: "This document provides an example (although we slightly disguised it to make it more accessible to nutritionists) of the application of fairly deep semantics to SGML. It ... all fits a simplified model of the syntax. There are two meta-rules on the grammar ... (i) With one exception, in the outermost tag, we solved the 'attributes vs nested tags' problem with a firm 'no attributes' rule. (ii) Beyond a specific level (defined semantically and determinable lexically by nesting depth), there is a firm rule that any GI that requires an end-tag is spelled with a trailing slash and any GI that does not permit an end-tag is spelled without one. As I think I mentioned earlier, we don't allow minimization or anything else that is permitted but not required at the lexical level. There are elements that need not appear at all (almost all of them), but that is another issue." John C. Klensin is Director, INFOODS (International Network of Food Data Systems) Secretariat, and Chairman of the Standards Committee for ACM. Address: Massachusetts Institute of Technology; Room N52-457; 77 Massachusetts Avenue; Cambridge, MA 02139; 617-253-8004; FAX 617-491-6266; TELEX 921473 MITCAM. See other INFOODS publications under the name Roselyn Romberg.
UN document identifiers: United Nations sales no. E.89.III.A.8. WHTR-14/UNUP-734.
The International Food Data Systems Project (INFOODS) is part of the United Nations University's Food and Nutrition Programme, and uses SGML in the organization of information collected on nutrient composition of foods worldwide. The handbook provides guidelines on the organization and content of food composition tables and databases; it also specifies procedures for the accurate international interchange of such SGML-structured data.
This book will be of interest to researchers designing SGML database applications. In addition to details of implementation specific to food data interchange, considerable thought has been given to problems of meta data (including space, tab and line breaks), use of data formulas, non-ISO 646 character sets, and sub-structuring of textual objects that are needed in many databases (e.g., postal addresses, email addresses). Some interesting work-arounds are also implemented, and will be of theoretical interest to researchers applying the SGML standard to databases.
This handbook supersedes a number of other technical and working papers which nevertheless may be of historical interest: (1) John C. Klensin, Intermediate Structural Tags. Working Paper INFOODS/IS N34. Cambridge, MA. December 7, 1987. 3 pages; (2) John C. Klensin, A New Structural Tag Category -- Derived Measures. Working Paper INFOODS/IS N32. Cambridge, MA. 87.11.16. 2 pages; (3) John C. Klensin, Syntax and Semantics. INFOODS Data Interchange Scheme. Working Paper INFOODS/IS/N 6. Cambridge, MA 85.12.05. 17 pages; (4) Roselyn M. Romberg, Additional Discussion of the Interchange Scheme. Summary of Conclusions from Review of 'INFOODS/IS N6'. Working Paper INFOODS/IS N17. Cambridge, MA. 78.06.17, revised 87.07.10. 5 pages; (5) Roselyn M. Romberg and John C. Klensin, Initial Root (Structural) Tag List. Working Paper INFOODS/IS N15. Cambridge, MA. 87.03.17, revised 87.06.19, final 87.07.16. 6 pages.
John Klensin [1992] is Chairman of the Standards Committee for the ACM (Association for Computing Machinery), thus serving on the board that oversees all Information Technology and Information Sciences standards activities in the US. He has also chaired one of the X3 Technical Committees in the language area for many years, and maintained formal liaison to X3J6 for a long time while what is now SGML was under development. Contact [current 1992]: John C. Klensin; Director, INFOODS Secretariat; Massachusetts Institute of Technology; Room N52--457; 77 Massachusetts Avenue, Cambridge, MA 02139; Tel: 1 617 253-8004; FAX: 1 617 491-6266; Telex: 6502688345; MCI Cable: MITCAM.
Volume summary: Pt. I. Introduction and Overview. 1. Introduction to the Interchange System. 2. Technical Overview. 3. Introduction to the Reference Material -- Pt. II. The Reference Sections. 4. The Header Elements. 5. The Food Element and Subelements. 6. Data Values and Data Description -- Pt. III. Processing Data and Interchange Files. 7. Registering Elements. 8. Conversion of Data to Interchange Format. 9. Conversion of Data from Interchange Format -- Appendix A: Registered International Food Record Identifiers -- Appendix B: Element Registration Form.
UN Identifiers: WHTR-16/UNUP-774. United Nations sales no. E.91.III.A6
"Abstract: With many types of scientific data, the amount of descriptive and qualifying information associated with the data values is quite variable and potentially large compared with the number of actual data values. This problem has been found to be particularly acute when dealing with data about the nutrient composition of foods, and a system (based on textual markup rather than, for example, the relational model) has been developed to deal with it. This paper discusses the types of metadata encountered and the problems associated with dealing with them, and then describes this alternative approach. The approach described has been installed in several locations around the world, and is in preliminary use as a tool for interchanging data among different databases as well as local database management."
Abstract: Several models have been developed for designing and searching full text document databases. Standards, both proposed and official, have been developed to respond to questions relating to the structure, linkages, searching and transmission capabilities of full text databases. These standards include the open document architecture (ODA), the Standard Generalized Markup Language (SGML) and its derivatives, and the information retrieval service definition and protocol specification for library applications (ANSI/NISO Z39.50-1992). An overview of these standards is presented, along with their application to full text databases. The paper concludes with a challenge for information professionals to participate in the development and refinement of such encoding standards.
"Abstract: The authors are interested in the development of distributed multimedia information systems (MMIS) which use the HyTime international standard as the data model and interchange format. They have developed and implemented a prototype system in which interactive multimedia presentations can be stored and retrieved. Sample document instances are externally encoded in HyTime and stored in the database using the HyTime data model. The architecture and operation of the system are presented. Issues related to using a HyTime engine for general multimedia presentation and interchange are discussed."
Abstract: "After the decision to use SGML (or XML), the intense activities of Document Analysis, Information Modeling, and DTD Creation must follow. Many committed SGML/XML enthusiasts might announce a 'success' for their SGML/XML effort upon clearing these hurdles, and achieving the profound status of 'ownership of a valid DTD'. Many other casual SGML/XML adopters may 'simply download' a public domain DTD, and presume their success on the same basis, possession of a valid DTD. Any celebration of success is premature, however, until DTDs are subjected to, and pass, rigorous testing, which appraises their applicability to the production purposes for which they were intended."
[Conclusion:] "As with many of the activities necessary to deploy productive SGML and XML, testing is not glamorous, fun, or easy, but it is productive, rewarding, and economically sound. It has been frequently said of SGML/XML that '...standards based systems satisfy users who possess any technology adoption bias; strategists like the technology impact of standards on the organization, pragmatists like the sound economic metrics of standards, and conservatives like the safety and security that standards afford...' As these different markets demand different types of psychic, economic, and reassurance returns on their respective investments of trust, money, and discomfort, they also speak to their demand for meaningful testing by qualified SGML/XML implementors, before those systems are proffered for full-scale production!"
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "This presentation describes how the information repository of a publishing house was integrated into the environment of the company. The attempt was made to combine the entity relationship approach of an SQL database and the document-driven approaches of SGML. This led to more than one SQL database with an identical microdocument architecture to store the information elements. This presentation closes with a view to the future plans of integrated composing of products with the microdocuments of the database."
"Introducing SGML to a conservative publishing house is a long way to go. In the case of C. H. Beck, the leading company for legal publications in Germany, the efforts were driven by the demands of a continuously growing market for electronic publications, on line as well as CD-ROM.
"Since information is the main business of a publishing company, to create an effective information repository was the first step to go. The efforts were driven into two different directions.
"On one hand the information, the sources and the publication process were structured in classic entity relationship models. The analysis brought three different information models (legislative documents, court decisions and intellectually authored texts) implicating three different databases. Two of three databases represent an entity relationship model of the information. The third database (storing the authored texts like books) is document driven and mirrors the structure of the source publication. To enable the best flexibility and an easy handling of the data, in each case the documents were broken apart into micro documents of almost the same class.
"On the other hand the source documents and the resulting publications were examined in order to create a DTD. The resulting DTD is divided into several modules that represent overall document structures (books, journals, sections etc.) and modules to indicate detailed information (tables, highlighting etc.). The overall DTD is intended as an abstract model in order to derive various different process specific DTDs. Thus the detailed element model corresponds with the micro documents of the information repository. The global document structures are created by the export function of the databases.
"In the future there will be a combining project management system, which will enable the product manager to create publications containing micro documents of all three databases and an overall structure."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
"Abstract: Technical publications typically are complex, bulky, and have a long life. Their compilation, updating, and editing for different target groups can only be achieved with an efficient organisation. The shortage of time and budget requires automation and modularisation of the production process. To meet these requirements in the framework of a compound document processing architecture, SGML (ISO 8879), a system independent description language of the document structure, is a first important milestone. Others, e.g. DSSSL or SPDL, are currently under development."
See the main entry for this special issue of TEXT Technology dedicated to the TEI, edited by Lou Burnard.
Abstract: "Documents possess a natural structure which can be visualized when they are printed on paper. Traditionally, text processing systems do not support the structured creation of documents. Instead, the formatting specified by the author defines implicitly the structure. The system itself is not aware of this structure. However, if the structure could be defined in advance, the system could use it to direct the author to create documents with the correct structure. Further, the structure could then be used to produce different formats for the same content, to modify it and to search documents.
"Electronic document processing consists of the creation, updating, formatting, storage, retrieval and dissemination of documents. Systems currently used for various document management tasks have been built using different principles and different ad hoc methods. Due to their various approaches, these systems require the author to use different representations, different operations, and different processing philosophies. This can cause many problems later when authors need to convert from one electronic representation to another version with the same content. What an author really needs is a uniform document processing system where all these activities would exist. In addition, this system should be convenient and easy to use.
"This thesis investigates how a syntax-directed translation method, previously applied to programming languages, may be used as the basis of a processing system for structured documents. In this case, the grammar serves as a tool to define the structure of a document, what kinds of representations the user wishes, which documents the user wants to retrieve from a set of documents, and what form the new structure should take. The document processing is considered as transformations from one representation to another. The system uses grammars to generate tools, automatically or semi-automatically, used during various processing phases. Parse trees for grammars form the interface between tools. The approach has been tested by building a prototype of an integrated syntax-directed document processing system.
"The thesis shows that structured processing is a feasible tool for the manipulation and management of electronic documents, and that a syntax-directed approach can be used for the subtasks of document management. The approach forms a consistent and application independent basis to produce and disseminate different representations from the same content using either the same or modified structures. The results were developed with a relatively small set of documents. However, this approach may be used also for the large and expanding number of documents available in information sources on worldwide computer networks." [Universal Decimal Classification: 519.68; 519.7; 681.3.068]
Published also as: Kuopio University Publications C. Natural and Environmental Sciences 53, 1996. Public defense of the doctoral thesis was on November 22nd, 1996. See http://www.cs.uku.fi/~kuikka/thesis.html (bibliographic data), or the thesis online in Postscript format. [Thesis, archive copy.]
"Abstract: Many documents have a definable structure. Some document formatting systems, like the LaTeX formatter, use a structural notation. In recent years the general mark-up language SGML has gained popularity. In this paper we study the transformation of one structure to another. For example, technical journals have their structure definitions, and an article originally written for one journal must be restructured before it can be submitted to another journal. We assume that structure definitions are grammatical, and study the transformations that can be automated or at least semi-automated.
"We took a collection of computer science journals and compared their structure definitions. We classified differences as simple, local and global. As transformation techniques we studied syntax directed translation schemata and tree transducers. Our conclusion was that simple and local transformations can be automated or semi-automated, depending on whether additional information is needed, while global transformations are difficult to automate. Transformations were tested on our prototype syntax-directed document processing system. The system has one module for editing a document under one structure definition, and another module for changing a document from one structure definition to another."
[Paper received December 4, 1995; revised July 22, 1996.]
Earlier report, available online: Eila Kuikka and Martti Penttonen, Transformation of Structured Documents. University of Waterloo, Computer Science Department, Technical Report CS-95-46, 73 pages; URL: ftp://cs-archive.uwaterloo.ca/cs-archive/CS-95-46/. See also: Eila Kuikka, Processing of Structured Documents Using a Syntax-Directed Approach. Ph.D. Thesis. University of Kuopio, Department of Computer Science and Applied Mathematics, Finland. Kuopio University Publications C. Natural and Environmental Sciences 53, 1996. 76 pages + 4 Appendices. URL: http://www.cs.uku.fi/~kuikka/Thesis/thesis.ps.gz.
"Abstract: Structure definitions of documents have been used successfully for inputting and formatting in text processing systems. This report considers transformations between different representations of structured documents and studies possibilities to extend the use of structure definitions to document transformations and to discover algorithmic methods for carrying out transformations. Documents are presented as parse trees for context-free grammars and transformations are made from parse tree to parse tree. First, the report describes differences of manuscript styles demanded by various scientific journals and presents a declarative classification for structure differences between two parse trees. Second, a set of tree transformation methods is described and their suitability for transformations between documents having a structure difference in each defined class is analyzed. For each class several methods may or must be used and only certain kinds of differences can be managed automatically. Finally, instead of designing a system where a method accommodates all kinds of differences or where different methods are used in various transformations, the report presents a model for a document transformation system that presents a possibility of using various methods according to differences in document representations. The system is divided into two modules. In the first one transformations are made automatically and they do not change the hierarchical structure of a document. In the second one transformations are made semiautomatically or nonautomatically and the hierarchical structure changes. Differences between the existing and the required representation of a document are analyzed and methods selected according to the classified differences."
Available online: ftp://cs-archive.uwaterloo.ca/cs-archive/CS-95-46/. [archive copy]
Abstract: "This paper describes the filtering approach for searching documents whose structure is defined by a grammar. The method is based on the theoretical model for defining filters to specify information interest of a user. It is employed to find documents in SYNDOC, a syntax-directed text processing system. The method is suitable, for example, for SGML and ODA documents. The user selects a grammar and indexes only documents for the selected grammar. A filter generated in a syntax-directed way using the grammar describes conditions for indexed documents integrating structure and content constraints. The user compares a filter with indexed documents, and either edits, browses or prints original documents using the selected output form. Indexed documents, filters and retrieved documents can be stored for further purposes."
Keywords: structured document, filtering, context-free grammar, parse tree.
For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
Abstract: "Filtering is used to select a subset, corresponding to the information interests of a user, from a set of information items. The information interests are described in a filter which is created to control the selection. In our earlier work we have described a theoretical framework for specifying filters to express content-based and structure-oriented constraints on structured text. In the filters, the information interests of the user are expressed by constraints and annotations on two-dimensional templates. The templates are created from the grammar associated with the structured text.
"This report describes a prototype for the filtering method in a syntax-directed document processing system called SYNDOC. In SYNDOC, a filter is applied to documents associated with a common grammar. The application of a filter means finding the documents that match the filter. From user's point of view, filtering a subset of a given document collection consists of the following six steps. First, a filter for a given grammar is defined; second, a directory containing documents associated with the grammar is chosen; third, indexing is applied to the documents (unless indexed documents were chosen); fourth, the filter is applied to the indexed document of the chosen directory; fifth, the form of the output is defined; and sixth, the filtered documents are displayed in the specified form. In the current phase of the implementation, the matching test is applied to one document at the time, and in case of matching, the document is displayed using the default output form."
The report is available in Postscript format by ftp from the University of Joensuu: ftp://ftp.cs.joensuu.fi:/pub/Reports/A-1995-4.ps [mirror copy]. See: other publications of Eila Kuikka on "Filtering of Structured Documents" [mirror copy], or the WWW Home Page. The report is also available via postal mail: Department of Computer Science, University of Joensuu, P.O. Box 111, FIN-80101 Joensuu, Finland. See also: E. Kuikka and A. Salminen, "Two-dimensional filters for structured texts," to be published in Information Processing and Management.
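The matching step described in the abstract above (a filter combining structure and content constraints, applied to one document at a time) can be illustrated with a minimal sketch. It is not SYNDOC's implementation: the representation of a filter as a list of (path, required text) pairs, the function names, the sample documents, and the use of Python's XML parser in place of a grammar-driven index are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def matches(document_text, constraints):
    """Return True when every (path, required_text) constraint is satisfied.

    A constraint with required_text=None is purely structural: the path
    only has to exist. Otherwise some element at that path must contain
    the required text (a content constraint).
    """
    root = ET.fromstring(document_text)
    for path, required_text in constraints:
        hits = root.findall(path)
        if not hits:
            return False
        if required_text is not None:
            if not any(required_text in "".join(hit.itertext()) for hit in hits):
                return False
    return True

def apply_filter(documents, constraints):
    """Test each document in turn and return the names of those that match."""
    return sorted(name for name, text in documents.items()
                  if matches(text, constraints))

# Hypothetical collection sharing one document structure.
DOCUMENTS = {
    "a.sgm": "<article><front><title>Filtering structured text</title></front>"
             "<body><sec>parse trees</sec></body></article>",
    "b.sgm": "<article><front><title>Formatting reports</title></front>"
             "<body><sec>filtering</sec></body></article>",
    "c.sgm": "<article><front><title>Filtering notes</title></front>"
             "<body/></article>",
}

# Filter: the title must mention "Filter" and the body must have a section.
FILTER = [("front/title", "Filter"), ("body/sec", None)]
```

Calling `apply_filter(DOCUMENTS, FILTER)` keeps only the documents that satisfy both the content constraint on the title and the structural constraint on the body.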
February 09, 1998. Announcement from Eila Kuikka for the public availability of a revised report Survey of Software for Structured Text. This report, available in HTML (hypertext) and Postscript format, surveys some 207 software tools that claim to support the processing of structured documents. This publication updates the 1994 survey which reviewed 89 software packages (see immediately below). Most of these software tools are SGML/XML compliant or aware. Description, contact information, references, and prices are listed for each software package. The database entries are accessible via alphabetical (name) listing, by software 'type' (in eighteen categories), and by price. This revised and expanded 1998 edition of the Survey is authored by Eila Kuikka (Department of Computer Science and Applied Mathematics, University of Kuopio, Finland) and Erja Nikunen (Nokia Telecommunications, Finland). In HTML format: http://www.cs.uku.fi/~kuikka/systems.html, and published also as a technical report of the Department of Computer Science and Applied Mathematics, University of Kuopio, Finland. [local archive copy]
See the preceding bibliography entry for the 1998 update.
The authors have prepared an overview ["Systems for structured text"], a summary and a 3-part software-systems listing from the longer report. The abstract for the list: "This list [3 parts in 3 disk files: A-E, F-M, N-Z] [was] a part of a report published in Finnish as a technical report of the Department of Computer Science and Applied Mathematics, University of Kuopio, Finland. The aim of the report was to give a brief overview of electronic text and its processing by computers. The main part of the report is a section that contains a short description and typical features of 89 systems. This English summary contains only that part of the report and our aim is not to update this list later."
The overview page describes the extracted portions of the report (now updated). The three main documents are still available here, for historical value, as mirror copies [copied April 12, 1995]: the summary; part1 A-E , part2 F-M, and part3 N-Z.
Abstract: "Timelines represent a familiar means for representing the relationship among historical events. When incorporated into the context of electronic documents, the timeline provides the basis for implementing an interface into an event space, relying particularly on hypertextual-style links. Generalizing timelines also permits the flexible representation of many different kinds of relationships beyond the temporal. This paper includes examples of such representations, showing examples from prototype implementations."
For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
Abstract: "Ten years after SGML was adopted as an international standard, more organizations than ever before are investigating its possibilities. The reason is simple. The problems addressed by Total Quality Management in the manufacturing and general service industries are magnified enormously in knowledge work and are much more difficult to address. Accessibility and reusability of information are important, and so are the relevance and applicability of information in a particular problem-solving context. Redundant knowledge creation and information rework waste organizational effort and dollars and have a profoundly negative effect on programs, processes, and systems. To combat redundancy and rework, organizations are seeking solutions in standard tools and standard data representations."
Note: The above presentation was part of the "SGML Business Management" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
"Standard Generalized Markup Language (SGML) encodes medieval and Renaissance manuscripts and printed books with difficulty. This computer language is an ISO standard, but one acknowledged more in the breach than in the observance. Here I argue that the humanities should follow the originators of the World Wide Web, who made HTML (Hypertext Markup Language), an encoding standard using SGML syntax but serving purposes alien to the intentions of SGML's creators. The Text Encoding Initiative (TEI) SGML document-type definition is unusable for my kind of scholarly editing, and for the editing of early texts generally. However, the TEI Guidelines is an excellent discussion of tagging, principles and practice, and its system of over 400 tags is the starting point for anyone interested in text encoding." [from the document Introduction]
The document is available on the Internet as part of the official conference record: see http://www.ucalgary.ca/~scriptor/papers/lanc.html [mirror copy, December 1995]. For further details on the Electric Scriptorium conference, see Electric Scriptorium Home Page.
Abstract: "Technical and Management Services Corporation (TAMSCO) and Warner Robins Air Logistics Center/LB/LU Directorate recently began a cooperative effort to develop a more efficient way to manage the data for the C-130 flight manuals. WR ALC/LB/LU recognized the tremendous cost and inefficiencies in managing the existing C-130 data. With the assistance of TAMSCO, this cooperative effort is currently reengineering the existing process for creating, distributing, accessing, and reusing the technical information. By using Standard Generalized Markup Language (SGML), this effort will realize the ability to store and reuse technical procedures more efficiently. The SGML data will be accessible to the end users through an electronic information base both digitally and hard-copy. Using SGML and the AF Standards will bring many benefits and lower maintenance costs. The future success of the USAF's C-130 Technical Manual program depends on how effectively and efficiently the existing data is identified, maintained, managed, and used."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "An Interactive Electronic Technical Manual (IETM), as defined in the DoD IETM specifications, is a package of information required for the diagnosis and maintenance of a weapons system, arranged and formatted for interactive screen presentation to the end-user. Technical and Management Services Corporation (TAMSCO) has been assisting the military develop IETMs using commercial off the shelf (COTS) products with open ended software interfaces. IETMs provide many benefits over traditional paper manuals, as will be discussed. TAMSCO recognizes that while the concept of IETM is still a new technology, it is only an application of finding a more efficient and effective way to provide support and maintenance to existing military weapon systems."
This paper was delivered as part of the "IETM" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "In this paper, we concentrate on justifying the decisions we made in developing the TEI recommendations for feature structure markup. The first four sections of this paper present the justification for the recommended treatment of feature structures, of features and their values, and of combinations of features or values and of alternations and negations of features and their values. Section 5 departs briefly from the linguistic focus to argue that the markup scheme developed for feature structures is in fact a general-purpose mechanism that can be used for a wide range of applications. Section 6 describes an auxiliary document called a 'feature system declaration' that is used to document and validate a system of feature-structure markup. The seventh and final section illustrates the use of the recommended markup scheme with two examples, lexical tagging and interlinear text analysis."
See the bibliographic reference for the Maler / El Andaloussi book.
The annotated Table of Contents for Developing SGML DTDs complements the corresponding book review article by Chet Ensign, also published in this issue of Markup Languages: Theory & Practice.
The annotated Table of Contents for Megginson's Structuring XML Documents complements the corresponding book review article by Chet Ensign, also published in this issue of Markup Languages: Theory & Practice.
Summary: This presentation by the program Co-chair provides an overview of the structure of SGML/XML '97 conference. Lapeyre explains the various technical tracks, vertical tracks, vendor demonstration theatre, exhibit hall, poster sessions, special user group meetings, tutorials, BOF meetings, the SGML/XML '97 bookstore, and other important conference events.
"Welcome to SGML/XML'97, the conference that is both the largest SGML Conference ever and the largest XML conference ever. We're all here to have a good time: to learn new things, to consolidate our positions, to expand our minds, and to make technical progress happen. The first conference in this series, back in 1988, only dealt with SGML and had 52 attendees with no exhibit hall. As this conference has been able to say every year for the last 11 years, this year we have more attendees, more sessions, more tracks, more vendors, and more night events than any SGML conference has ever had before! This year we've expanded; XML has joined SGML as a major technical focus."
This presentation was part of the "Introductions" track at the SGML/XML '97 Conference.
"The American Memory DTD is in use to capture a variety of materials, and to re-tag some documents that were previously tagged using a non-SGML generic tagging scheme. The DTD has proven useful for tagging a variety of texts. American Memory has digitized a variety of Library of Congress collections. They are currently interested in talking to potential partners who may be interested in publishing some of these collections. . . While there is no easy way to measure the relative accuracy or retrieval system precision using SGML as compared to non-SGML encoding, the SGML option, and selection of the TEI model for American Memory seem to be working well." [extract]
The document is available online: http://www.cs.vassar.edu/~ide/DL96/Lapeyre; [mirror copy]. See the main workshop entry or the program listing for other workshop details.
The open letter is sent from SGML Open's Executive Director to Ron Wilson of NIST on behalf of SGML Open, GCA, and the SGML Users' Group. It expresses concerns about NIST's proposed testing procedures for the SGML community. This letter is a supporting document to the article of Pamela Gennusa on NIST's proposed Conformance Testing program.
Report on the activities of SGML Open (the SGML consortium), especially following meetings of the marketing and technical committees after the Documation '94 conference.
"SGML is one of the most important document management investments that an organization can make because it ensures the interoperability of its information. Technology changes every eighteen months, which is one of the reasons why we invest in open systems that are based on de jure and de facto standards. We want to be sure that the hardware and software that we buy today will work with other systems we have now or will have in the future. Investments in open systems platforms and architecture may increase the value of a corporation's physical assets, but those systems are guaranteed to be replaced eventually. The best investment that we can make in open systems is the development and maintenance of open information, which is what SGML enables.
"Besides strategic benefits, SGML also offers the tactical advantages of allowing an organization to share the same data across multiple document repositories, thereby supporting enterprise-wide document management. Organizations can choose the products and technologies that are best suited to their needs, while knowing that their documents are interchangeable and accessible to anyone, even across repositories." [extract]
A related document under the title "Standards for Interoperability" also appeared in the January/February issue of The Gilbane Report (CAPV Publications - Gilbane Report). As one of the SGML Open White Papers [WHITE PAPER #1001-SO, see also the printed abstract, mirror], the document is available on the Internet as "Standards for Interoperability" [mirror copy, partial links, November 1995].
"Abstract: The Cheshire II online catalog system was designed to provide a bridge between the realms of purely bibliographical information and the rapidly expanding full text and multimedia collections available online. It is based on a number of national and international standards for data description, communication, and interface technology. The system uses a client server architecture with X window client communication with an SGML based probabilistic search engine using the Z39.50 information retrieval protocol."
"The Cheshire II system is being made available for public use in the UC Berkeley Astronomy-Mathematics-Statistics Library (a medium-scale academic branch library, circa 75,000 volumes) using modern workstations, and to the national mathematics research community via network access. Use and acceptance of the system and its features will be evaluated using transaction monitoring and questionnaires."
A version of the paper is also available online: http://sherlock.berkeley.edu/asis_paper/paper.html. See also the main entry for the Cheshire II Project
"Abstract: Drawing upon scholarship on legal drafting, current document assembly technology, and aspects of the Standard Generalized Markup Language (SGML), this article discusses the forms of knowledge at play in the creation of legal documents. It also examines the notion of self-describing documents and their potential role in new modes of expression and knowledge pertinent to legal drafting."
The author concludes that SGML is the "best choice" for creating a multiform text. [full abstract/summary needed]
Summary: "My aim in this paper is to talk about our choices in encoding texts, and, in particular, to focus on decisions about things not to do. One well-known reaction to the sight of the imposing bulk of the TEI Guidelines is the cry of despair at the thought that every word must be mastered and applied wherever and whenever the appropriate text features crop up. Of course, this is not the intention at all; but the decision about what to use is still a real problem, particularly for projects---the most common sort, I think---that do not have a specific use for their texts in mind, but instead aim to provide a generally useful digital collection. [...] If I had to sum up what I have to say in one sentence, it would be: Don't tag what you don't understand. But a somewhat more Wagnerian statement of my point would be: Don't tag things that aren't fully worked out or elaborated, and don't tag the random, the occasional, the unique, or---to use an Aristotelean term that I'm going to be adopting---the accidental."
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/whatnot.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
Abstract: "A dramatic work may be seen either as an event or as a text; the TEI Guidelines make it possible to encode a dramatic work in either way, but do not attempt to solve the difficult problem of doing both at once. The basic element of a dramatic work, when seen as a text, is the speech; the Guidelines also provide elements for encoding other familiar parts of dramatic texts (such as stage directions and cast lists), as well as for encoding analytic information on various aspects of texts and performances that is not normally included in printed dramatic texts. There are often other formal structures in dramatic works that intersect with the structure of speeches -- metrical structures, for example; we discuss approaches for encoding these structures."
Abstract: "Among the major trends in WWW applications today are (1) client side applications and (2) use of SGML and now XML for richer information modeling. Most Web applications today exclusively rely on servers to perform all computations and data manipulation requested through the Web/HTML browser. The limits of this design model have been clearly reached and new models are considered where more intelligence is brought back in the client, that is in the Web browser area.
"At the same time, the limits of the HTML modeling capabilities become more and more obvious as Web applications develop and XML, as a specialized profile of SGML, is now recognized as a major break through in the domain of advanced WWW applications."
"In this article, we present an application in the domain of technical documentation for the automotive industry that requires such Web and client-side architectures. This application provides consultation of Illustrated Parts Catalog (IPC) modules with real-time configuration management. Configuration management here consists in presenting to a user the exact documentation for the vehicle he/she has to repair."
"We present the application itself and how/why it requires such a web and client-side architecture. Because XML browsers are not yet a reality, we also propose a short term integration solution that can implement this architecture with today's HTML browsers coupled with an external XML engine."
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
Abstract: "Two approaches are available for specifying transformation processes on SGML documents: a declarative approach, based on context-sensitive rules triggered on SGML parsing events, and a procedural approach, based on explicit manipulation of the document tree."
"This paper shows that each approach is optimal for a certain class of problems, but that both are actually needed and that maximum expressive power is achieved when both can be combined in a same program."
The document is available online in HTML format: http://www.balise.com/current/articles/lecluse.htm; [mirror copy].
An alternative source for information presented in this paper is the Proceedings of SGML Finland '96; see the paper by François Chahuneau, "Event driven or Tree Manipulation Approaches to SGML Transformation - You Should Not Have to Choose."
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
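The contrast drawn in the abstract above, between rules triggered on parsing events and explicit manipulation of the document tree, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard XML libraries rather than an SGML parser or the authors' own tool, and the sample document, class name, and function names are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = ("<doc><sec><title>beta</title></sec>"
          "<sec><title>alpha</title></sec></doc>")

class TitleCollector(xml.sax.ContentHandler):
    """Event-driven style: rules fire on parsing events.

    A stack of open elements supplies the context, so the rule below is
    context-sensitive: it applies only to a title directly inside a sec.
    """
    def __init__(self):
        super().__init__()
        self.stack = []
        self.titles = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack[-2:] == ["sec", "title"]:
            self.titles.append("")

    def characters(self, content):
        if self.stack[-2:] == ["sec", "title"]:
            self.titles[-1] += content

    def endElement(self, name):
        self.stack.pop()

def titles_by_events(text):
    """Extract section titles in document order, in a single parsing pass."""
    handler = TitleCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles

def sort_sections_by_tree(text):
    """Tree style: build the whole tree, then reorder its children explicitly."""
    root = ET.fromstring(text)
    root[:] = sorted(root, key=lambda sec: sec.findtext("title", default=""))
    return ET.tostring(root, encoding="unicode")
```

Extraction in document order suits the event-driven style, since no tree has to be held in memory. Reordering sections needs access to all siblings at once, which is where tree manipulation is the more natural fit.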
"Abstract: An SGML markup system is presented. One major obstacle for the SGML to gain more application is the prohibitively high cost of the markup process. The system the authors present adopts an incremental design approach. This approach helps to "divide and conquer" each specific problem encountered during the markup process and ensures that the system converges to a almost-fully automated markup system. The major software components of the system are described. Some selective algorithms are also introduced."
Conference: [cit].
Abstract: "Brunei Shell Petroleum (BSP) has developed a system to support the production operators in their day-to-day activities on the Platform. This system (named 'Ajaib' which is Malay/Arabic for Miracle) breaks away from the traditional Operations Manual and instead delivers all information required by the Operator in support of his day-to-day activities from a single, commodity desktop Web browser. The information is managed in its native format (e.g., SGML, AutoCad) and is presented in a variety of formats including animation and graphics; this session aims to provide insight into the development and acceptance of a corporate Intranet solution."
"BSP decided that the core information for the system should utilise SGML to manage the various information content types and relationships. BSP chose to re-use DTDs specifically developed for Shell Expro to capture the information. This information consisted of asset information (e.g., equipment descriptions for specific platforms, pipelines and systems), organisation information (e.g., description of BSP personnel and their responsibilities), and activity information (e.g., descriptions of maintenance tasks that operators perform each day). Furthermore, the system should also contain additional explanatory information, as is usually contained in training manuals. The new system had to provide all the information that operators require from a single point of delivery, in a format that would be appealing to the operators. The decision was made to use Web technology and standard products to deliver the specially created content in a textual and graphical form. The textual information would be converted from SGML to HTML prior to delivery of the final system."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
For more information, see the database entry for MtSgmlQL, or documentation on 'the SgmlQL interpreter': http://www.lpl.univ-aix.fr:81/.www/projects/SgmlQL/MQL1.html.
Apparently a translation of the English document SGML and ODA: Standards for Document Processing and Interchange, published by the Danish national standards body in 1989. ['Traduction de l'ouvrage publie en anglais par Dansk Standardiseringsrad sous le titre "SGML and ODA. Standards for document processing and interchange" en 1989.']
Abstract: "SGML is an international standard defined by ISO, oriented to provide a markup language to describe the logical structure of documents. One of SGML's objectives is to allow every formatting system to process a document described by SGML elements without modifying the manuscript, i.e., it is not necessary for document authors to know which formatting system will process their documents. An approach to achieve this objective could consist of transforming the SGML document into a source file for every formatter wanted, with the support of a special map table which associates every SGML element with formatting control sequences of the formatter itself. To this end we define a language, called METAFORM, which at run time is capable of selecting the appropriate set of formatting commands to be inserted into the formatter source file we are generating, on the basis of the current status of every element in the SGML document. The paper describes the main characteristics of the METAFORM language and its application to SGML documents."
Note: The volume editor for SGML Users' Group Bulletin 4/1 is David W. Penfold (Edgerton Publishing Services, Huddersfield, UK).
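The map-table idea described in the abstract above, where each element is associated with the control sequences of a chosen formatter, can be sketched roughly as below. This is not METAFORM syntax or its implementation: the table layout, element names, function names, and the use of Python's XML parser in place of an SGML parser are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical map tables: element name -> (opening, closing) control
# sequences, one table per target formatter.
MAP_TABLES = {
    "latex": {
        "title": ("\\section{", "}\n"),
        "para": ("", "\n\n"),
        "emph": ("\\emph{", "}"),
    },
    "troff": {
        "title": (".SH\n", "\n"),
        "para": (".PP\n", "\n"),
        "emph": ("\\fI", "\\fP"),
    },
}

def render(elem, table):
    """Wrap an element's content in the control sequences mapped to its tag.

    Elements absent from the table contribute only their content.
    """
    start, end = table.get(elem.tag, ("", ""))
    parts = [start, elem.text or ""]
    for child in elem:
        parts.append(render(child, table))
        parts.append(child.tail or "")
    parts.append(end)
    return "".join(parts)

def format_document(source, formatter):
    """Produce formatter input from one unchanged source document."""
    return render(ET.fromstring(source), MAP_TABLES[formatter])

# Hypothetical source document; the author never names a formatter in it.
SOURCE = ("<doc><title>Intro</title>"
          "<para>Some <emph>marked</emph> text.</para></doc>")
```

The same `SOURCE` yields LaTeX or troff input depending only on which table is selected at run time, which is the property the abstract emphasizes. A fuller version would also consult the element's context when choosing a sequence, as the abstract's reference to "the current status of every element" suggests.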
"Abstract: The standard SGML, proposed by ISO, is a declarative markup language. It provides a coherent and unambiguous syntax for describing document elements. A document described by SGML can be submitted to the treatment of all applications (data base, editor, formatter . . .) because it does not contain any particular processing instructions. This paper refers to some proposal for the application of SGML concepts in the formatting environment. First, it analyzes the possibility of integrating SGML with the TEX formatting system. Subsequently, it describes an environment for the document preparation, where the user, even inexperienced, is able to define the logical structure and the text of documents interactively and graphically, respecting the semantic meaning of SGML elements. The document defined is processed by the SGML parser producing an intermediate and system independent file. Subsequent interpretation of this file and the support of a special map table will allow a SGML document to be processed by all systems; i.e. it is not necessary for document authors to know which formatting system will process their documents."
Abstract: "SGML is a standard proposed by ISO [International Organization for Standardization, Geneva] for documents description based on a generalized markup technique. The formatting process of a SGML document could consist in singling out markup elements and inserting formatting directives into the document in accordance with the class of the markup elements themselves, using a suitable map table.
"This paper will present an implementation of an environment of SGML documents production, emphasizing a special language, METAFORM, for the map table construction."
"Abstract: SGML (Standard Generalized Markup Language) is an International Standard defined by ISO for documents description based on generalized markup technique. The author refers to some proposals for the application of SGML concepts in the formatting environment. He describes an environment for SGML documents preparation, where the user, even inexperienced, is able to define the logical structure and the text of documents interactively and graphically, and where the document so defined can be processed by all formatting systems."
[Note: pagination may be "110-126" in a variant publication of the proceedings.]
Abstract: "Is Perl a suitable language for programming XML? The use of Perl with XML is illustrated in this article with a program that checks to see if an XML document is well-formed. The relative simplicity of the program demonstrates that lightweight Perl programs may be used with XML, although Unicode and the use of entities make it difficult for Perl programmers to handle some XML files."
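For comparison, a well-formedness check of the kind the abstract describes can be sketched in a few lines. This is not the article's Perl program: it is written in Python and delegates the work to the standard-library expat parser instead of checking the rules by hand, and the function name is hypothetical.

```python
import xml.parsers.expat

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a single well-formed XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

For example, `is_well_formed("<a><b/></a>")` succeeds, while a mismatched tag such as `"<a><b></a>"` is rejected.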
Summary: "Many organizations are considering using Word 6 for SGML authoring now that Microsoft's SGML Author is on the immediate horizon. We'd like to offer some reflections on our decision to use Word 6 and our experience with conversion to SGML with respect to our overall philosophy of incremental, evolutionary project engineering."
Available on the Internet: http://www.textscience.com/w6paper.html; [mirror copy of paper, partial links]. See also the accompanying poster which describes various strategies for mapping Word 6 styles to SGML. Or: RTF STYLES TO SGML UTILITIES IN PERL - http://www.textscience.com/stylecod.html.
Abstract: "At Pacific Bell we have developed a document distribution system which leverages a number of Internet technologies to solve extreme scaling requirements while staying cost-effective and meeting demanding business needs. The system, currently a fully operational prototype, is a combination of off-the-shelf and custom software. We are in the process of customizing our solution for specific internal organizations while introducing open document standards which enable the production of information compatible with our delivery system."
[SuperBook's companion and successor: ] "In an effort to move to open standards and commercial products and to take advantage of the quickly developing Internet technologies, Pacific Bell issued a request for proposal (RFP) in October 1995 for a large-scale SGML browser. This paper provides an overview of the requirements and describes how various commercial components have been integrated to meet our business needs..."
"The SGML markup in our system creates, as Douglas Engelbart expressed it 'Explicitly Structured Documents -- where the objects comprising a document are arranged in an explicit hierarchical structure, and compound-object substructures may be explicitly addressed for access or manipulation of the structural relationships.' Open Text's LLS provides us with the ability to index these hierarchical objects and to retrieve them using "region expressions", the general principle of which is described in [x].
"Our documents originate from a variety of sources including Microsoft Word, native HTML, and SGML. In all three cases we require the authors or providers to, at a minimum, encode hierarchical divisions into the documents. Those hierarchical divisions are used to generate the TOC database and thus become basic units of content search and retrieval. The authors may also identify other units, or containers, of content (by use of tags or styles depending upon their authoring tools), and when they do, these become additional units of search and retrieval in the delivery system. These additional containers may be smaller (finer-grained), or larger than the hierarchical units, and may be based in traditional document technology, 'example' or 'paragraph', for example, or may contain units meaningful within an application or knowledge domain.[...] Our current SGML markup encompasses HTML; that is, both HTML and non-HTML tags can exist in a document at the same time. The non-HTML tags provide container and the addressing information while the HTML markup describes content formatting. Currently, we are able to deliver these 'SGML+HTML' documents directly to the Netscape client as it, and other current generation browsers, simply ignores the unrecognized SGML markup."
Available online in HTML format: http://yuri.stanford.edu/ic1q97/final.htm; [local archive copy].
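The idea that hierarchical divisions drive a table of contents can be illustrated with a small sketch. It assumes, purely for illustration, that the divisions are marked with HTML `h1` to `h6` headings and that unrecognized tags are skipped; the class and function names are hypothetical and do not come from the Pacific Bell system.

```python
from html.parser import HTMLParser


class TocBuilder(HTMLParser):
    """Collect (level, title) pairs from h1..h6 headings."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None   # heading level currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only text inside an open heading contributes to a TOC entry.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            self.entries.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_toc(html_text):
    """Return the list of (level, title) entries found in html_text."""
    builder = TocBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.entries
```

Non-HTML container tags in the input are passed over without error, which loosely mirrors the quoted remark that browsers ignore unrecognized markup.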
Abstract: "The Multipurpose Internet Mail Extensions (MIME) provides an extensible capability to receive, via electronic mail, more than just plain text. Its capabilities include audio, graphics, and video. As an Internet Draft Standard it is being widely deployed in commercial and public domain mail systems. The extensibility makes it an attractive vehicle for the exchange of documents that use the Standard Generalized Markup Language (SGML). This paper reports on work in integrating the MIME and SGML standards."
"MIME is the result of an Internet Engineering Task Force effort to expand the capabilities of Internet mail without disturbing the existing base of text mail systems. It provides mechanisms for content labelling and multiple content parts within the Internet message body. There are seven basic MIME content types, text, image, video, audio, application, message, and multipart. Multipart indicates a message that contains multiple body parts each of which is an independent message part; the others represent atomic message units. Each content type has associated with it a set of subtypes that the MUA uses to precisely identify the contents and then to invoke the appropriate software to process that content."
"MIME can also be used to create an ad hoc encoding for the SGML Document Interchange Format (SDIF) data stream, allowinf a MIME-capable mail user agent (MUA) to directly display the encoded documents. For SGML and SDIF processing, new subtypes are proposed that identity the documents and allow for the appropriate processing.
"When exchanging an SGML document, the document's internal entity and process structure must be transferred along with the files. Maximum utility occurs when these structures are represented in a system independent manner. The MIME approach exposes those structures and transports them by providing them with a canonical representation. An experimental implementation was built using the publicly-available software packages, sgmls (an SGML parser) and mh (a mail user agent). They were modified to generate and read MIME encoded SGML documents."
This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University.
See now: XML Media/MIME Types.
"Abstract: This draft describes the encapsulation of a Standard Generalized Markup Language (SGML) document withing a MIME message. It proposes new content sub-types of Text/SGML, Application/SGML, and Application/SGML-notation, and a new header, Content-SGML-Entity. This specification uses the proposed Multipart/Related Content-Type [RFC-REL] and access-type=content-id [RFC-ACTI] specifications. Multipart/Related provides the mechanism for treating the entire document as a single object and access-type=content-type allows a single MIME entity to appear several times without replicating the body of that MIME entity."
The filename is draft-ietf-mimesgml-encap-00.txt. See the cover letter which explains this document in relationship to two others, and which provides Internet access points. A lightly HTMLized version is available here.
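As a rough illustration of the packaging this draft describes, the sketch below uses Python's standard `email` package to wrap a document entity and a DTD entity in a single Multipart/Related container. The subtype names follow the abstract; the Content-ID values, the function name, and the choice of which part carries which subtype are assumptions made for the example, and the draft's Content-SGML-Entity header is not modeled.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related_message(document_sgml, dtd_text):
    """Bundle an SGML document entity and its DTD as multipart/related."""
    msg = MIMEMultipart("related")

    # Document entity, labeled text/sgml (Content-ID is hypothetical).
    root = MIMEText(document_sgml, "sgml")
    root["Content-ID"] = "<document@example.invalid>"

    # DTD entity, labeled application/sgml (Content-ID is hypothetical).
    dtd = MIMEApplication(dtd_text.encode("utf-8"), "sgml")
    dtd["Content-ID"] = "<dtd@example.invalid>"

    msg.attach(root)
    msg.attach(dtd)
    return msg
```

Calling `build_related_message(...).as_string()` yields the serialized message, in which each entity is a separate body part that can be addressed by its Content-ID.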
"Abstract: The Multipart/Related content-type provides a common mechanism for representing objects that are aggregates of related MIME body parts. This document defines the Multipart/Related content-type and provides examples of its use."
The filename is draft-ietf-mimesgml-multipart-rel-01.txt. See the cover letter which explains this document in relationship to two others, and which provides Internet access points. A lightly HTMLized version is available here.
"Abstract: When using MIME [MIME] to encapsulate a structured object that consist of many elements, for example an SGML [SGML] document, a single element may occur several times. An encapsulation normally maps each of the structured objects elements to a MIME entity. It is useful to include elements that occur multiple time exactly once. To accomplish that and to preserve the object structure it is desirable to unambiguously refer to another body part of the same message. The exsisting MIME Content-Type Message/External-Body access-types allow a MIME entity (body-part) to refer to an object that is not in the message by specifying how to access that object. The Content-ID access method described in this document provides the capability to refer to an object within the message."
The filename is draft-ietf-mimesgml-access-cid-00.txt. See the cover letter which explains this document in relationship to two others, and which provides Internet access points. A lightly HTMLized version is available here.
Intro: "XML offers all the power of Dynamic HTML, Style Sheets, and other HTML extensions within the confines of an extensible framework. Unlike HTML, XML documents are specified in two parts. One is the XML document itself, which may look like an HTML document except that it will probably have a lot of new tags. The other part is a Document Type Definition that explains what the new tags mean and how they should be interpreted. The separation of the DTD from the document's contents lets Web developers extend the language simply by creating new DTD files."
The document is available online: "XML Is The Future Of HTML", [mirror copy]
Abstract: While reuse is currently the focus of much attention in the programming language community, it is also a central, but less noticed, issue in the creation and use of documents, and therefore in the design of document systems. To a great extent, the work of producing new documents, and new versions of old documents, involves reusing pieces of previously existing documents, where reuse involves finding the relevant material, modifying it as needed, and stitching the pieces together. The objective of this paper is to demonstrate how a focus on reuse can shed light on current efforts to build structured document systems and to design and use standards, such as SGML, ODA, and OLE, that address structured and compound documents.
"Abstract: The Extensible Markup Language (XML) is a new language specification submitted to the World Wide Web Consortium (W3C). The specification (available online at http://www.w3.org/TR/PR-xml-971208) defines this new language in terms of both SGML and HTML, and is specifically designed for the Internet. In the era of online electronic journals, currently wrapped in HTML, this has significant repercussions for electronic publishing." [Note: see the subsequent W3C Recommendation 10-February-1998http://www.w3.org/TR/1998/REC-xml-19980210.]
Possibly online: see http://www.mcb.co.uk/oclc.htm.
Abstract: "The Commission of the European Communities (CEC) is keen to promote harmonization throughout European testing services in the areas of Information Technology and Telecommunications, including the availability of test technology and equivalence of test results between test laboratories. To this end, in 1985 the CEC launched the first Conformance Testing Services (CTS-1) programme covering various topics including compilers, OSI protocols, and computer graphics. In 1988 the CEC launched their second CTS programme (CTS-2) to cover fresh areas, one of which was a three-year project to establish harmonized conformance testing services for SGML systems throughout Europe. This paper will explain the conformance testing process in general terms and outline how this is applied to validating SGML parsers. The paper will also consider the benefits of using the service, and its availability."
Note: The volume editor for SGML Users' Group Bulletin 4/1 is David W. Penfold (Edgerton Publishing Services, Huddersfield, UK).
"Abstract: Interactive Authoring and Display System (IADS), a Microsoft Windows application, provides an environment to develop (author) and read (display) documents via a PC. IADS provides both a textual and graphic environment. IADS uses Standard Generalized Markup Language, ISO 8879 (SGML) as its internal file format for textual data. Both vector and raster graphics are supported through CALS standard data formats MIL-D-28003 CGM and MIL-R-28002 Raster Type I as well as Windows .BMP, .PCX and other industry standard formats. IADS was chosen by the Naval Air Warfare Center Weapons Division, Point Mugu as the environment for all Technical Manual publications for the Sparrow Missile Test Set (AN/DPM22-12(V)). Technical Manual development on IADS has been straight forward, requiring a minimal amount of self training. This paper presents development and operational features of IADS, encouraging others to develop/maintain manuals (of any complexity level) in the same manner."
The IADS package (version 2.0, March/April 1995) is available on the Exeter FTP server and elsewhere.
"Abstract: This paper describes work being carried out by CIMI (the Consortium for Interchange of Museum Information) on the analysis of exhibition catalogues. This is being undertaken as part of Project CHIO (Cultural Heritage Information Online). The project plans to use the SGML (Standard Generalized Markup Language) standard to express the structure and content of source materials, including exhibition catalogues. The analysis that was undertaken led to a particular view on how exhibition catalogues (and by extension, any text-based museum information sources) could be marked up to support retrieval of extracts relevant to a wide range of queries. The process of analysis is described, and the resulting design decisions outlined. The paper concludes with an assessment of the possibilities for information retrieval offered by this approach."
Available on the Internet: Richard Light: Getting a handle on exhibition catalogues: the Project CHIO DTD; [mirror copy]. A text version is also available. See also the main entry for the Consortium for Interchange of Museum Information.
Presenting XML was probably the first published book written entirely on XML. It will "help readers learn the fundamentals of the Extensible Markup Language (XML); help them understand the relationship between XML, SGML, and HTML; and enable them to write their own XML applications to deliver structured information to the World Wide Web." Richard Light is a well-known authority in the SGML/XML world, and Tim Bray is co-editor of the XML specification. One reviewer writes that the book, through no fault of its own, "suffers from being a snapshot of a moving target, but [is] a worthy first volume in the soon-to-be-large XML library."
Description: ". . . this reference takes you on an introductory tour of this robust technology, showing you how the technology can work to your advantage. You'll learn to create XML documents, separate style from content, and create power links with XML. In addition, you'll find out how XML is being used today and what impact it will have in the future. With Presenting XML, you'll get a quick, efficient introduction to XML and everything it has to offer, and you'll learn why this dynamic markup language is the wave of the future." [publisher's blurb] See provisionally the description from Macmillan's superlibrary.com server, or the announcement from Simon North. Alternately, check the companion web site for the volume.
A review of Presenting XML was published in XML Files: see the bibliography entry, or the source at http://www.gca.org/memonly/xmlfiles/issue4/book.htm. See also the review on the XMLxperts site.
Description of the Oxford University Press SGML tool called "The SGML Tagger," developed by Richard Light. The tagger is designed to be loaded on top of word-processing software so that tagging may be done without a special editor. The tool is compatible only with text-based DOS word-processors, and thus not with Microsoft Word under Windows. Contact: +44-865-267979. Or see: Richard Light: The SGML Tagger; [mirror copy]
Abstract: "The following paper was given as a talk at the 'XML Mixer' in La Jolla, California in late July '97, before a combined audience of clinicians, computing professionals, and vendors of document processing software. What brought the group together was an ongoing effort to introduce markup technology into the processing of healthcare information in an ISO standard manner, using SGML (Standard Generalized Markup Language) and SGML's strict subset, XML (Extensible Markup Language). Other speakers spoke more specifically to processing topics, work flow, or business issues in the use of information systems in medicine, but the emphasis here is on some long perceived, but often overlooked problems in the semantics of communication. Both the general and the specific are important ingredients in this area, which indirectly indicates why the document format offers the appropriate middle ground between free text and excessively rigid (but easy to process) data structures."
Note: further information on the role of SGML/XML in medical informatics is found in the database section for the SGML Initiative in Health Care (HL7 Health Level-7 and SGML).
"The main goal of this CAIT White Paper is to address a very general problem in automated record keeping as it applies to Health Care -- thereby resolving the long standing unmet need for an Electronic Clinical Chart (ECC) for on-line clinical use [Institute of Medicine 1991, Ball 1992]. In clinical medicine (and in other similar venues) documentation must be responsive to real world circumstances. As a consequence, the information components are typically highly variable in both form and content, complicating their management and use. Today's technologies offer an opportunity to develop and introduce an effective new systems architecture based on the concept of "document processing" that can markedly improve processing effectiveness by anticipating such variability and making it's management a part of the underlying logic. Here the notion of the document as the object to be stored and processed is in contradistinction to the common computing view in which data, records, and fields are the fundamental items. Electronic documents, properly enhanced with additional labels, can form the archive from which data can be extracted from various viewpoints for classic processing, providing greater flexibility to end-user applications and enhanced results."
"The new approach considers each component of the medical chart as a loosely structured document in which the components can be uniquely delimited in some uniform manner by tags or labels [Essin 1993]. To do this, the (ISO) Standard Generalized Markup Language (SGML), which has been designed for this purpose with respect to data display and formatting, is extended to organize medical content. Here appropriate new content related tagging conventions are introduced that delimit each specific item and section for subsequent retrieval and processing."
Available online via FTP: FTP Remote file dumccss.mc.duke.edu/standards/SGML/proposals/CAIT-white-paper.txt, [or mirror copy].
"Abstract: Improvements in computer technology, the Internet and the development of wireless and satellite communications have led to several innovations in medical informatics. Telemedicine involves transmitting images and other information to and from medical centers. It could lower costs substantially by allowing physicians and nurses to participate in a patient's treatment without having to travel to the site. Many medical journal publishers are supplementing their traditional printed product with a site on the Internet. This is facilitated by their adoption of the Standard Generalized Markup Language (SGML), which uses standard tags imbedded in the text that allows the integration of text supplied by different publishers. The Digital Imaging and Communications in Medicine (DICOM) and Health Level 7 standards can be used in telemedical applications. The development of the electronic patient record has generated concerns about confidentiality."
"Perhaps influenced by the success of the abbreviated hypertext markup language (HTML) version used in Web applications, commercial publishers are moving to adopt the standard generalized markup language (SGML) for electronic publications -- some 10 years after its introduction. The SGML standard uses embedded tags instead of local, nonstandard printing instructions to identify various publication elements, thus improving prospects for integrated access to information generated by different publishers."
Abstract: "We present two techniques for transforming structured documents. The first technique, called TT-grammars, is based on earlier work by Keller et al., and has been extended to fit structured documents. TT-grammars assure that the constructed transformation will produce only syntactically correct output even if the source and the target representations may be specified with two unrelated context-free grammars. We present a transformation generator called ALCHEMIST which is based on TT-grammars. ALCHEMIST has been extended with semantic actions in order to make it possible to build full scale transformations. ALCHEMIST has been extensively used in a large software project for building a bridge between two development environments.
The second technique is a tree transformation method especially targeted at SGML documents. The technique employs a transformation language called TranSID, which is a declarative, high-level tree transformation language. TranSID does not require the user to specify a grammar for the target representation but instead gives full programming power for arbitrary tree modifications. Both ALCHEMIST and TranSID are fully operational on UNIX platforms."
The dissertation was presented on June 18, 1997. It is available online in Postscript format, via FTP or HTTP; [local archive copy]. See also a list of the author's publications, and some publications of the University of Helsinki SID [Structured and Intelligent Documents] project.
Summary: "This session will address the central role that identification and analysis of document components plays in the selection, design, and deployment of document management systems. The key to success in installing such a system is a thorough analysis of your information model. The SGML document analysis process is the cornerstone of this effort. The total effort should be no less rigorous than that used for designing and deploying any database management system. Additionally, information analysis should be independent of the deliverables such as paper documents that traditionally inform the design of a document management system."
"SGML is today's most powerful comprehensive object model for document information, and as such is an ideal mechanism to migrate the underlying document structure need to change to move from a file based to a component based DBMS system. The SGML document analysis process is the starting place for the information analysis required for successful component management."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
The article is based upon an interview with Roy Tennant, Sunsite Project Manager. According to the Sun Observer, Sun Microsystems "provides the hardware to enable Berkeley's Digital Library Project to make photos, literature, and other artwork available to any user for downloading."
See provisionally: "Finding Aids for Archival Collections: SGML Translated on-the-fly Into HTML". [need also document URL]
Abstract: "Maintaining large amounts of SGML data in separate files on a file system has always been a difficult proposition. Trying to coordinate a distributed workgroup environment is even more difficult. Simple mechanisms such as ID and IDREF can become a nightmare on even small projects. A database environment offers many exciting possibilities for features such as version control, sharing, validation, and distribution. The challenge is to develop a system that is capable of accepting any SGML document and flexible enough to support many different SGML database applications."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
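The ID/IDREF difficulty mentioned in the abstract can be made concrete with a small sketch. It assumes, for illustration only, that the fragments of one logical document are stored as separate well-formed XML files and that the attributes are literally named `id` and `idref`; in real SGML the DTD declares which attributes have the ID and IDREF types. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_idrefs(documents, id_attr="id", idref_attr="idref"):
    """Return (file name, reference) pairs whose target ID is not declared.

    documents maps a file name to its XML text. IDs are pooled across
    all fragments, which is an assumption of this sketch.
    """
    declared = set()
    referenced = []
    for name, text in documents.items():
        root = ET.fromstring(text)
        for elem in root.iter():
            if id_attr in elem.attrib:
                declared.add(elem.attrib[id_attr])
            if idref_attr in elem.attrib:
                referenced.append((name, elem.attrib[idref_attr]))
    # A reference dangles when no fragment declares the ID it points to.
    return [(name, ref) for name, ref in referenced if ref not in declared]
```

A database-backed system of the kind the abstract proposes could run such a check on every check-in instead of rescanning files, though the paper's actual design is not reproduced here.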
"Abstract: Links and versioning are two important aspects of document management. Supports in these two areas could certainly enhance the functionality of a document management system. However, in addition to the problems posed by individual areas, the integration of the two generates more considerations. The situation is further complicated by the extensive scope and many possibilities in this context. This paper thus attempts to describe a specific set of link versioning behaviours to provide a platform to explore the various issues of link versioning. Based on this model and the SGML environment, two methods to handle link versioning are presented and analysed."
In this book, Lobin endeavors "to describe the field from a somewhat distinct perspective, with a strong emphasis on architectures and some other SGML extended facilities defined in the HyTime standard. There are chapters dealing with the grammatical restriction of PCDATA and CDATA content using architectures or the use of LINK for a flexible architectural mapping."
Summary: "The Extensible Markup Language (XML), a simplified version of the Standard Generalized Markup Language (SGML), was developed for the exchange of structured data on the Internet. With it, information can not only be structured in a uniform, media-independent format; the structuring principles themselves can also be described by a formal set of rules, a grammar. Only this makes further processing possible, such as guided data entry, data conversion, flexible navigation, and viewing of the data. Alongside elementary information modeling, meta-structuring through so-called architectures adds a new aspect: the object-oriented layering of structure grammars. This book presents both elementary and architectural structuring techniques in coherent form for the first time. It is aimed at readers who want to engage with the subject in detail and with a practical orientation."
Contents: "Introduction. - Part I. Primary Structuring - Structure Grammars: Elements. - Attributes. - Documents. - SGML Versions. - Part II. Secondary Structuring - Architectures: Secondary Structuring through Architectures. - Declaration of Architectures. - Architecture Definition and Link Process Declarations. - Applications. - Appendix: A. Standardized Information Models. - B. XML Syntax Rules with SGML Extensions. - C. Architectural Processing in SP. - D. SGML Declarations for XML. - E. List of Figures. - F. List of Definitions and Examples. - G. Index. - H. Materials." [from Springer Verlag]
Keywords: Standard Generalized Markup Language (SGML), Extensible Markup Language (XML), HyTime, information modeling, text structuring.
This document emerges from the author's thesis research. The "statements" in this document should be of interest and use to designers of object-oriented databases. The document is available online http://www.let.ruu.nl/departments/C+L/loeffen/phdthes/statemen.htm. [mirror copy, July 21, 1995, text only].
Available in Postscript format as "sigmod.uue" [encodes "sigmod.ps"] through anonymous FTP. Note that other valuable research papers on SGML from Arjan Loeffen are available from the same FTP server: see the subdirectory "models" and the subdirectory "sgml-model".
Abstract: "Text models focus on the manipulation of textual data. They describe texts by their structure, operations on the texts, and constraints on both structure and operations. In this article common characteristics of machine readable texts in general are outlined. Subsequently, ten text models are introduced. They are described in terms of the datatypes that they support, and the operations defined by these datatypes. Finally, the models are compared." [The text models discussed include: TDM (relational model based upon non-first normal form), P-string model, PAT (University of Waterloo), TOMS ("textual object management system" - an indexing toolkit), the containment model, MdF ("Monads-dot-Features"), the Banyan system, Extended MAESTRO, Grif, and Multos.]
Abstract: Text models focus on the manipulation of textual data. They describe texts by their structure, operations on the texts, and constraints on both structure and operations. In this article common characteristics of machine readable texts in general are outlined. Subsequently, ten text models are introduced. They are described in terms of the datatypes that they support, and the operations defined by these datatypes. Finally, the models are compared. The models include the TDM text data model based on nonfirst normal form, p-string model, PAT text model, TOMS textual object management system and the containment model.
Abstract: "In this article I intend to show that the current mechanisms for specifying how SGML encoded documents are to be processed may not be adapted to express the intent, or meaning, of the encoding strategy applied. First, a short survey of SGML essentials is given. SGML document processing is introduced, and common approaches for specifying such processes are described. These processes concern a small application domain. Some inherent restrictions of SGML and current processing techniques are discussed. Next, an object-oriented view on the document is given, and its application as a processing framework is outlined. Finally, semantic specifications are introduced, that allow for validation and processing specifications to be recorded and exchanged in the form of semantic specification sheets."
Also: in Interdisciplinaire Onderzoeksconferentie Informatiewetenschap 1996, Delft, 1996.
Available online: http://CandL.let.ruu.nl/preprint/stinfon/stinfon.htm, or in Postscript format; [mirror copy, Postscript].
Abstract: "The need for organizations and industries to increase the efficiency of using document information has led to the development and adoption of standards for document architectures. The use of networked computers to author, exchange, manipulate, store, retrieve, present, use, and re-use information has simultaneously created the possibility and the need for adopting standards for interchanging digital document information. Structured document information systems require the attention of producers and users of information today because growing document repositories are recognized as valuable information assets. Implementing standards-conforming, structured information systems, increases the value of these document repositories, but doing so requires serious rethinking of the ways document information is produced, stored, and distributed. This Special Issue of JASIS addresses the standards of structured information and document architectures, the issues surrounding the implementation of these standards for organizations and persons working towards the goal of using document information more efficiently, and explores the future of structured document information systems."
See other details concerning the original call for papers and the significance of this special issue, appearing as the ninth in a series of several special topics issues of JASIS, following the announcement by Donald H. Kraft (April, 1992 issue of JASIS: "A Call to Action in Response to Happy Days," Editorial, Journal of the American Society for Information Science 43/3, April 1992, page 302).
Abstract [from the call for papers]: "The need for organizations and industries to increase the efficiency of using document information has led to the development and adoption of international standards for document architectures. The use of networked computers to author, exchange, manipulate, store, retrieve, present, use, and re-use information has simultaneously created a need for and the possibility of adopting standards for interchanging digital information. Structured document information systems require the attention of producers and users of information today because growing document repositories are recognized as valuable information assets. Implementing standards-conforming, structured information systems, can increase the value of these document repositories, but doing so requires serious re-thinking of the ways information is produced and distributed. Papers are solicited on topics which will address research and development issues that will: (a) introduce the concepts underlying structured information, (b) address the evolution of the standards of document architectures and (c) address the issues surrounding the implementation of these standards for organizations and persons working towards the goal of using information more efficiently. Papers exploring the future of structured information systems are welcome."
"Although research articles and empirical studies will be favored, state of the art reviews or position papers on SGML and other international standards as well as DTD's from government or industry will be considered. These might include, for instance, HTML (Internet WWW), CALS (Department of Defense), ICADD (Publishing), or TEI (Academic Community) as well as many others."
This ninth special topics issue of JASIS is scheduled to appear in mid-1996. It will cover the topic of Standards for Document Architectures: SGML (Standard Generalized Markup Language), HyTime (Hypermedia/Time-based Structuring Language), DSSSL (Document Style Semantics and Specification Language), and SPDL (Standard Page Description Language). The guest editor for this special issue is Elisabeth Logan of Florida State University. This special issue appears in a sequence of several issues, following the announcement by Donald H. Kraft (April, 1992 issue of JASIS: "A Call to Action in Response to Happy Days," Editorial, Journal of the American Society for Information Science 43/3, April 1992, page 302).
The collection of articles is presented in bibliographic summary within a dedicated document. See also the individual entries: (1) Logan, Elisabeth; Pollard, Marvin. "[Special Issue Volume] Introduction." (2) Weibel, Stuart. "In Memoriam: A Tribute to Yuri Rubinsky, August 2, 1952 -- January 21, 1996." (3) Marcoux, Yves; Sévigny, Martin. "Why SGML? Why Now?" (4) Mason, James David. "SGML and Related Standards: New Directions as the Second Decade Begins." (5) Adler, Sharon C. "The 'ABCs' of DSSSL." (6) Kimber, W. Eliot; Woods, Julia A. "Application of HyTime Hyperlinks and Finite Coordinate Spaces to Historical Writing, Analysis, and Presentation." (7) Flynn, Peter. "W[h]ither the Web? The Extension or Replacement of HTML." (8) Barnard, David T; Ide, Nancy M. "The Text Encoding Initiative: Flexible and Extensible Document Encoding." (9) Sengupta, Arijit; Dillon, Andrew. "Extending SGML to Accommodate Database Functions: A Methodological Overview." (10) Fausey, Jon; Shafer, Keith. "All My Data Is in SGML. Now What?" (11) Salminen, Airi; Kauppinen, Katri; Lehtovaara, Merja. "Towards a Methodology for Document Analysis." (12) Goldfarb, Charles F. "SGML: The Reason Why and the First Published Hint."
See: [July 1997] the online Table of Contents, [archive copy]; or "Call for Papers" - http://www.asis.org/Publications/JASIS/structure.html; [mirror copy].
The article explains the use of the PAT retrieval program, which scans the OED text using the descriptive tags in the dictionary, in conjunction with GOEDEL (Generalized OED Extracting Language). Both PAT and GOEDEL were developed at the University of Waterloo Centre for the NOED, and are being used as generalized retrieval software facilities at other institutions. Tables III and IV provide examples of the SGML-style tagged text of the New OED.
This issue of
"Abstract: Document processing systems help organizations get a handle on their masses of paper. A Gartner Group study estimates that professionals spend as much as half their time searching for documents, but only 15 percent of their time reading them. Document management comprises such technologies as workflow, full text retrieval and RDBMS. Though its roots are in image processing, the latest document management approach is to treat documents as dynamic information objects that drive enterprise decision-making and business processes, rather than simply text and pictures on a page. A successful document management system will metamorphize legacy information into a knowledge repository. Organizations should only choose solutions that conform to international standards, such as Standard Generalized Markup Language (SGML) and the Continuous Acquisition and Life-cycle Support (CALS), developed by the US Defense Dept."
"Résumé: Bien que saisi sous une forme électronique, le rapport d'activité de l'Inria n'avait jamais été traité sous cette forme. Cet article décrit la procédure mise en place, s'appuyant sur la norme SGML, pour exploiter par divers vecteurs (www, Minitel, ftp,...) l'important volume d'information contenu dans ce rapport. Nous évoquerons les problèmes rencontrés, les apports de ce nouveau système et conclurons sur les perspectives ouvertes par ce processus.
Abstract: Each year, Inria produces an activity report. Although this report is typeset in electronic form, it was never exploited in that form. This paper describes a new process, based on SGML, which allows users to access the report in different ways (WWW, Minitel, ftp,...). Advantages and disadvantages of this process will be shown and future developments will be presented."
Available on the Internet in PostScript format: ftp://ftp.irisa.fr/opera/doc/ra.ps.gz [mirrored copy, November 1995].
Abstract: "A strategy for document analysis is presented which uses Portable Document Format (PDF - the underlying file structure for Adobe Acrobat software) as its starting point. This strategy examines the appearance and geometric position of text and image blocks distributed over an entire document. A blackboard system is used to tag the blocks as a first stage in deducing the fundamental relationships existing between them. PDF is shown to be a useful intermediate stage in the bottom-up analysis of document structure. Its information on line spacing and font usage gives important clues in bridging the 'semantic gap' between the scanned bitmap page and its fully analysed, block-structured form. Analysis of PDF can yield not only accurate page decomposition but also sufficient document information for the later stages of structural analysis and document understanding."
For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
"Abstract: Most documents have a hierarchical structure, which can be made explicit by markup languages such as SGML. We propose a formal model for representation of hierarchically structured documents, to be used as the basis for document query languages. The model uses a redundant representation of the document elements to simplify the expression of common queries. As an illustration of the power of the model we show how queries might be expressed, both as set theoretic expressions and in a simple algebra, and outline how queries might be evaluated in a practical system."
See also: the related published paper.
"Abstract: Documents have a natural hierarchical structure, implicit in most texts and made explicit by markup languages such as SGML. In this paper, we propose a formal model for representation of hierarchically structured documents, to be used as the basis for document query languages. The model uses a redundant representation of the document elements to simplify the expression of common queries. As an illustration of the power of the model, we show how queries might be expressed and outline how such queries might be evaluated in a practical system."
See also the published abstract, or Justin Zobel's Home Page.
Abstract: "The Application Protocol Information Base (APIB) is an on-line repository of documents for the Standard for the Exchange of Product model data (STEP, officially ISO 10303-Product Data Representation and Exchange). STEP Application Protocols are standards that are intended to be implemented in software systems, and Integrated Resources are used by them as building blocks. Application Protocols and Integrated Resources are represented in the Standard Generalized Markup Language (SGML) in the APIB in order to facilitate efficient information search and retrieval. This paper describes a World Wide Web gateway to the APIB, implemented using the Common Gateway Interface (CGI) standard. The APIB gateway allows STEP developers to efficiently search for ISO 10303 standards and supporting information. The only client software required to use the APIB gateway is a third party web browser."
Available on the Internet in PostScript format; URL: http://www.nist.gov/msidlibrary/doc/lubel96a.ps; or HTML; [PostScript mirror copy]. See the main entry for SGML and STEP for further information.
Abstract: "Businesses and organizations are increasingly finding that HTML (Hyper-Text Markup Language) offers no help whatsoever in managing the information on their web sites. SGML (Standard Generalized Markup Language) provides the flexibility and reuse lacking in HTML. However, SGML alone does not address the problems involved in maintaining on-line document repositories. Although traditional database management systems are clumsy at managing hyperlinked documents, a system combining SGML, database technology, and the protocols of the Web can provide a reasonably robust environment for developing and maintaining a web site. Two possible site designs employing SGML are discussed and evaluated with respect to a set of design objectives and choices."
"The emerging XML standard promises to provide web site developers with the best of both worlds, allowing them to enjoy most of the benefits of SGML while not sacrificing the convenience of HTML and interoperability with the rest of the Web. If XML is ultimately successful, not only will it be easier for web site developers to use SGML, but also they will be able to build applets embedding capabilities supporting the manipulation of SGML data in web clients. This will reduce the burden on the server and, more importantly, will open a new world of possibilities for interaction between SGML repositories and other databases and applications. More information about the work discussed in this paper is available at http://www.nist.gov/apde/ [NIST, Application Protocol Development Environment on the Web, Home Page]".
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
A version of the document is available online in HTML format: "SGML on the Web: A Tale of Two Sites"; [local archive copy]
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Developing SGML applications involves making choices driven by end user requirements and by the availability and functionality of third party SGML parsers, authoring tools, search engines, browsers, and data converters. Capabilities of HTML and the World Wide Web should factor into these decisions as well if users are geographically dispersed or have diverse computing platforms. SGML application developers typically build some or all of the following components: a DTD; legacy data conversion tools; a DTD-tailored authoring environment; a document repository; browsing and searching interfaces; and tools for producing formatted output. For each component, we discuss design and implementation alternatives, the approach we decided to use in building our SGML environment for authoring and accessing STEP product data exchange standards, and our rationale for choosing that approach."
More information on SGML and STEP (ISO 10303 Standard for the Exchange of Product Data) is available in the dedicated entry of the SGML/XML Web Page.
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "Businesses and organizations are increasingly finding that HTML (Hyper-Text Markup Language) offers no help whatsoever in managing the information on their web sites. SGML (Standard Generalized Markup Language) provides the flexibility and reuse lacking in HTML. However, SGML alone does not address the problems involved in maintaining on-line document repositories. Although traditional database management systems are clumsy at managing hyperlinked documents, a system combining SGML, database technology, and the protocols of the Web can provide a reasonably robust environment for developing and maintaining a web site. Two possible site designs employing SGML are discussed and evaluated with respect to a set of design objectives and choices."
"The emerging XML standard promises to provide web site developers with the best of both worlds, allowing them to enjoy most of the benefits of SGML while not sacrificing the convenience of HTML and interoperability with the rest of the Web. If XML is ultimately successful, not only will it be easier for web site developers to use SGML, but also they will be able to build applets embedding capabilities supporting the manipulation of SGML data in web clients [Bos97]. This will reduce the burden on the server and, more importantly, will open a new world of possibilities for interaction between SGML repositories and other databases and applications."
The document is available in HTML format: http://www.mel.nist.gov/div826/subject/apde/papers/sgml97.htm; [local archive copy].
The purpose of the article is to provide basic orientation to SGML and tips about using it for managing technical documentation.