The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: October 19, 2001
SGML/XML Bibliography Part 1, A - B

Aberer, Karl; Böhm, Klemens; Hüser, Christoph. "The Prospects of Publishing Using Advanced Database Concepts." Electronic Publishing: Origination, Dissemination and Design (EPODD) 6/4 (December 1993) 469-480. ISSN: 0894-3982. Authors' affiliation: GMD-IPSI [Integrated Publication & Information Systems Institute]; Dolivostrasse 15, D-64293 Darmstadt, Germany; Email contact: kboehm@darmstadt.gmd.de.

"Abstract: Publishing is a distributed process. It is characterized by the cooperation of different experts. The approach of the Integrated Publication and Information Systems Institute (IPSI) to support electronic publishing is to build an integrated publication environment. The publication of electronic documents demands enhanced support from publishing tools and imposes new challenges on database technology. Taking a hypermedia reference publication -- the Dictionary of Art (DofA) -- as an example of an innovative hypermedia document, requirements on database technology for the production of electronic publications are discussed. Those can be met by using an object-oriented database management system like VODAK. We present an efficient, flexible and application-independent database application for structured document handling (D-STREAT). Our focus is on dynamic Document Type Definition management."

Available in Postscript format as P-93-23.ps.Z from the GMD-IPSI FTP server. [Mirror copy, October 1995]



[CR: 19980330]

ACH/ACL/ALLC (Association for Computers and the Humanities, Association for Computational Linguistics, Association for Literary and Linguistic Computing). Guidelines for Electronic Text Encoding and Interchange (TEI P3). Edited by C. M. Sperberg-McQueen and Lou Burnard. Chicago: ACH/ACL/ALLC, April 8 1994. 2 volumes, xxvi + 1290 pages.

For an overview of the TEI's SGML application, see the related section on TEI. The published TEI Guidelines are available as an electronic book, prepared under Electronic Book Technologies' DynaText SGML browser. In this DynaText format, the material is fully searchable, with hypertext links and other navigation tools. See the bibliographic reference or announcement for additional details. See below and a separate ordering information form for availability of the 2 volumes in paper and on the Internet. Lists of FTP sites and WWW sites for accessing the P3 Guidelines online are supplied elsewhere in this SGML database.

For an excellent general introduction to SGML, see Chapter 2 of the TEI Guidelines (pages 13-36): "A Gentle Introduction to SGML", edited by C. M. Sperberg-McQueen and Lou Burnard. Chapter 2 supplies a broad introduction to SGML, but the remainder of the two volumes will be of interest to anyone planning to implement SGML for analysis of literary and linguistic data. For online hypertext versions of Chapter 2, see overview section. The SGML introduction chapter 2 is also available along with the other chapters via anonymous-FTP from various sources on the Internet where the TEI P3 documents are archived. Official HTML version for the 'Gentle Introduction to SGML': http://sable.ox.ac.uk/ota/teip3sg/, or from HTI. Other locations, for example: the SGML Project at Exeter ftp://info.ex.ac.uk/tei/p3/doc/p3sg.doc, or ftp://ftp-tei.uic.edu/pub/tei/doc/p3sg.doc, or from the SGML Repository ftp://ftp.ifi.uio.no/pub/SGML/TEI/P3SG.DOC. Using mail-based access, send a message to listserv@uicvm.uic.edu with the message line: get P3SG DOC; for the listing, send the message line: index tei-l; for the entire set of P3 files: get P3ALL $PACKAGE. The introduction was translated into Russian by Boris Tobotras: HTML or SGML format, [local archive copy].



[CR: 19950716]

"ACM to Participate in NSF Digital Libraries Grant." Communications of the Association for Computing Machinery, ACM MemberNet [Supplement] 37/11 (November 1994) 7-8.

"Abstract: Following Stanford University's Department of Computer Science's receipt of a National Science Foundation grant for digital library development, ACM was invited to join the Stanford Integrated Digital Library Project proposal. ACM began its own Electronic Publishing Program in an effort to transform the organization's traditional print publishing program into one that provides electronic access to a scientific and technical information resource. As part of the first phase of the project, a publications production data base will be installed and operating as an SGML application within the next 6 to 9 months. During the 2nd phase, electronic distribution and access will be addressed. The Stanford Project's goal is to develop the technologies that enable users to interact with a single universal virtual library that is composed of large numbers of distributed and heterogeneous repositories."

For more information, contact Bernard Rous, Deputy Director of Publications at ACM Headquarters. Email: rous@acm.org; Tel: 212-626-0660.



Adams, Charlotte. "SGML Broadens Appeal from DOD Base." Federal Computer Week 8/35 (December 5, 1994) 28-29.

"Abstract: Standard Generalized Markup Language (SGML) is gaining popularity in the Pentagon and other federal agencies as a way to give order to the government's vast information holdings. First popularized by the Pentagon's now-troubled Continuous Acquisition and Life-Cycle Support (CALS) program, SGML has been enthusiastically embraced by industry and governments worldwide. The International Standards Organization has recognized the SGML specification. In 1993, SGML vendors sold more than $779 million in goods and services, including $125 million to the US government. SGML essentially gives text the functionality of an updateable database by tagging key words that can be re-used and updated in later revisions. Data need only be keyed in once. SGML also slashes the costs of publishing and reissuing big documents."



[CR: 19951226]

Adams, Ellen. "Using SGML in Electronic Catalog Development." In Proceedings of the Second SGML BeLux Users' Conference. SGML BeLux '95: Second annual conference on the practical use of SGML, Antwerp, Belgium. October 25, 1995. Edited by Hans C. Arents. Leuven, Belgium: Katholieke Universiteit Leuven, 1995. Author's affiliation: IBM Corporation, NAS Division, Mail Stop 7J08, Thornwood Conference Center, 500 Columbus Ave., Thornwood, NY 10594 USA. Email: ellena@vnet.ibm.com.

"Abstract: Since every aspect of the business process is under continuous scrutiny, and remaining competitive in today's global marketplace demands that we redefine the ways in which we do business, IBM has pioneered the Electronic Purchasing Service, founded on the cornerstone of SGML.

"The Electronic Purchasing Service (EPS) was designed as a way to provide IBM's vendor customers with a way of reducing costs and improving control, while providing a better level of service to their own customer bases. The Electronic Purchasing Service is an advanced, network-based sales and procurement solution that allows end users to locate, compare and purchase items directly through electronic catalogs. With its preeminence as an electronic document exchange standard and its focus on reusability, SGML was deemed ideal for this electronic commerce application. Therefore, IBM Thornwood has used the Standard Generalized Markup Language to develop an electronic catalog application.

"The presentation will cover the four basic tasks or stages in the implementation of the Electronic Purchasing Service application: (1) planning the Electronic Purchasing Service application; (2) capturing the data in SGML format; (3) managing the information with SGML tools; (4) putting it to work. This session will briefly describe the most important functions in each of these stages, and the kinds of tools IBM used in performing them."

The document is available online in HTML format: "Using SGML in Electronic Catalog Development" [mirror copy, text only, December 1995]. For further details on the 1995 Conference and BeLux, see the contact information for SGML BeLux.



[CR: 19951220]

Adams, R. J. "Electronic Libraries SGML Applications: Background to Project ELSA." Program - Automated Library and Information Systems 29/4 (October 1995) 397-406 (with 6 references). Author's affiliation: De Montfort University, Leicester LE1 9BH, Leicester, England.

"Abstract: Project ELSA is examining the use of documents encoded in SGML (Standard Generalized Markup Language) for the delivery of information to library end users and to librarians acting as information intermediaries. A partnership of industry providers and an end user working together within the Third Framework of the CEC Libraries programme, is constructing a delivery system using SGML encoded journal articles which will be used to investigate technical issues and to examine the potential for offering new and improved services. The development of document delivery is discussed briefly followed by some background on SGML and comment on progress within the ELSA project. Some possible applications of such a system are discussed."

See also the main entry for ELSA.



[CR: 19970817]

Adler, Sharon C. "The 'ABCs' of DSSSL." Pages 597-602 in Structured Information/Standards for Document Architectures. Edited by Elisabeth Logan and Marvin Pollard. = Journal of the American Society for Information Science, Special Issue. Volume 48, Number 7 (July 1997). New York: John Wiley & Sons Inc., 1997. ISSN: 0002-8231. Author's affiliation: Inso Corporation, One Richmond Square, Providence, RI 02818; Email: sca@eps.inso.com.

Abstract: "DSSSL, the Document Style Semantics and Specification Language, is ISO/IEC 10179, an International Standard for the formatting and other processing of SGML documents. DSSSL was completed in January 1996 after eight (8) years of development. From its inception, DSSSL was conceived as a companion standard to SGML, where SGML is a language for standardizing the way we represent document structures without regard to form or presentation. It is possible to use SGML markup to represent formatting information, but this is discouraged, since doing so makes a document more difficult to reuse and reprocess. Reuse is generally a significant requirement for SGML data so it is not a good idea to 'pollute' your document with presentational markup. Yet formatting of some nature is still desirable, and sometimes critical, for all documents, and in some cases users want to interchange this formatting information (informally known in the industry as style sheets) in a standardized, non-proprietary format. DSSSL is key to enabling this interchange."

See also Anders Berglund and Sharon Adler ("ABCs of DSSSL") in the Conference Proceedings of SGML '95.

See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.



Adler, Sharon C. "The Birth of a Standard (SGML)." Journal of the American Society for Information Science 43/8 (September 1992) 556-558. (4) references. ISSN: 0002-8231. Author affiliation: IBM Corporation, Boulder, CO.

Abstract: The Standard Generalized Markup Language (SGML) was adopted as an international standard for data description, data modeling, and interchange in October 1986. This article explores the evolution of the standard following its technical completion and leading to widespread market acceptance.



Adler, Sharon C. "DSSSL - Document Style Semantics and Specification Language." <TAG> 1/8 (January 1989) 8-9.

An overview of goals of the standard by one of the editors of DSSSL. For brief description of DSSSL, see the entry below on this Draft International Standard (ISO/IEC DIS 10179). Note: the design of the DSSSL specification changed several times preceding its approval as an ISO standard.



[CR: 19970620]

Adler, Sharon C. "Thoughts on the Tenth Anniversary of <TAG>." <TAG>: The SGML Newsletter 10/6 (June 1997) 8. ISSN: 1067-9197. Author's affiliation: Inso Corporation.

Sharon Adler was a co-founder and editor of <TAG> in 1987. This short article offers some reflections upon SGML in the past decade.



Ahearn, Hally. "SGML and the New Yorker Magazine." Technical Communication: Journal of the Society for Technical Communication 40/2 (Second Quarter, May 1993) 226-229. ISSN: 0049-3155. Author affiliation: Oster & Associates, Inc.

Five SGML developers created five different document type definitions (DTDs) for the literary magazine, The New Yorker, as an exercise presented at SGML '92. The first developer tried, without success, to use the existing format of the Association of American Publishers. The second used content-specific tagging. The third DTD allowed format-for-print in SGML markup. The fourth DTD supported hypertext output. The fifth DTD was designed to support a historical database of articles. The author, who developed the fifth DTD, concludes that in a real application, as opposed to an exercise, elements of each DTD would come into play. Although it introduces several technical terms, the article illustrates the richness and complexity of DTD development.



[CR: 19970523]

Ahonen, Helena. "Automatic Generation of SGML Content Models." Pages 195-206 (with 12 references) in EP '96. Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography. [ = Journal Special Issue: Electronic Publishing - Origination, Dissemination and Design (EPODD), June & September 1995, Volume 8, Issues 2-3.] Sixth International Conference on Electronic Publishing, Document Manipulation and Typography, Palo Alto, California. September 24-26, 1996. Sponsored by Adobe Systems Incorporated; School of Information Management and Systems, University of California at Berkeley; Xerox Corporation. [Proceedings Volume] Edited by Allen Brown, Anne Brüggemann-Klein, and An Feng; [Journal] Editors David F. Brailsford and Richard K. Furuta. Chichester/ New York: John Wiley & Sons, 1996. ISSN: 0894-3982. Author's affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Phone: +358 0 708 44218; Fax: +358 0 708 44441; Email: helena.ahonen@helsinki.fi. WWW: Helena Ahonen Home Page.

Abstract: "We study the problem of automatic generation of a document type definition (DTD) for a set of Standard Generalized Markup Language (SGML) documents. We present various situations where we have tagged documents but no DTD, and discuss the requirements various applications may have with respect to the generation process. We also present an automatic DTD generation tool that can be adjusted for several tasks necessary in the applications. The method is also demonstrated with some experimental cases."

Keywords: SGML, document type definition, generation, TEKES.

For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.

The document is available in Postscript format: http://www.cs.helsinki.fi/~hahonen/helena_ep96.ps [mirror copy].
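
As a rough illustration of the problem this abstract describes (tagged documents but no DTD), the Python sketch below derives a very loose content model for each element from a few sample documents. It is not the method of the paper: it only records which child elements were seen and emits a repeatable OR-group over them, whereas a real generator would also infer order and occurrence. The function name infer_content_models and the sample documents are hypothetical, and XML is used in place of SGML only because the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET


def infer_content_models(documents):
    """Map each element name to a loose content model seen in the samples."""
    children = {}   # element name -> child names, in first-seen order
    has_text = {}   # element name -> True once character data is seen
    for source in documents:
        for elem in ET.fromstring(source).iter():
            seen = children.setdefault(elem.tag, [])
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                if child.tag not in seen:
                    seen.append(child.tag)
                # Text that follows a child still belongs to the parent.
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True
    models = {}
    for name, kids in children.items():
        if kids:
            tokens = (["#PCDATA"] if has_text.get(name) else []) + kids
            models[name] = "(" + " | ".join(tokens) + ")*"
        else:
            models[name] = "(#PCDATA)" if has_text.get(name) else "EMPTY"
    return models


# Hypothetical sample documents, for illustration only.
samples = [
    "<article><title>T</title><para>one <em>x</em></para><para>two</para></article>",
    "<article><title>U</title><section><para>p</para></section></article>",
]

for name, model in infer_content_models(samples).items():
    print(f"<!ELEMENT {name} {model}>")
```

The models produced this way accept every sample but also much more; narrowing them is exactly where the generalization choices discussed in the paper come in.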



[CR: 19960728]

Ahonen, Helena. Automatic Generation of SGML Content Models. Paper submitted and accepted for presentation at Electronic Publishing '96. Helsinki, Finland: Department of Computer Science, University of Helsinki, Finland, 1996. Extent: 10 pages. Author's affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Phone: +358 0 708 44218; Fax: +358 0 708 44441; Email: helena.ahonen@helsinki.fi. WWW: Helena Ahonen Home Page.

Abstract: "We study the problem of automatic generation of a document type definition (DTD) for a set of Standard Generalized Markup Language (SGML) documents. We present various situations where we have tagged documents but no DTD, and discuss the requirements various applications may have with respect to the generation process. We also present an automatic DTD generation tool that can be adjusted for several tasks necessary in the applications. The method is also demonstrated with some experimental cases."

The document is available on the Internet: http://www.cs.helsinki.fi/~hahonen/helena_ep96.ps; [mirror copy]



[CR: 19960728]

Ahonen, Helena. Disambiguation of SGML Content Models. Paper submitted and accepted for presentation at Principles of Document Processing '96. Helsinki, Finland: Department of Computer Science, University of Helsinki, Finland, 1996. Extent: approximately 10 pages, with 5 references. Author's affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Phone: +358 0 708 44218; Fax: +358 0 708 44441; Email: helena.ahonen@helsinki.fi. WWW: Helena Ahonen Home Page.

Abstract: "A Standard Generalized Markup Language (SGML) document has a document type definition (DTD) that specifies the allowed structures for the document. The basic components of a DTD are element declarations that contain for each element a content model, i.e., a regular expression that defines the allowed content for this element. The SGML standard requires that the content models of element declarations are unambiguous in the following sense: a content model is ambiguous if an element or character string occurring in the document instance can satisfy more than one primitive token in the content model without look-ahead. Brüggemann-Klein and Wood have studied the unambiguity of content models, and they have presented an algorithm that decides whether a content model is unambiguous. In this paper we present a disambiguation algorithm that, based on the work of Brüggemann-Klein and Wood, transforms an ambiguous content model into an unambiguous one by generalizing the language. We also present some experimental results obtained by our implementation of the algorithm in connection to an automatic DTD generation tool."

The document is available on the Internet: http://www.cs.helsinki.fi/~hahonen/ahonen_podp96.ps; [mirror copy].
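
The ambiguity condition quoted in this abstract can be tested mechanically. The Python sketch below is a minimal illustration, assuming the standard position-based (Glushkov-style) formulation usually associated with the Brüggemann-Klein and Wood result: number every token occurrence, compute which occurrences can come first and which can follow each other, and report ambiguity when two different occurrences of the same name compete at one point. It covers only the decision step, not the disambiguation algorithm of the paper. The tuple encoding of content models and the function names are hypothetical, and the SGML "&" connector is not handled.

```python
from itertools import count


def analyse(node, follow, symbols, counter):
    """Return (nullable, first, last) for node; fill follow and symbols."""
    kind = node[0]
    if kind == "sym":                      # one primitive token occurrence
        pos = next(counter)
        symbols[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("seq", "alt"):             # "a, b" and "a | b"
        n1, f1, l1 = analyse(node[1], follow, symbols, counter)
        n2, f2, l2 = analyse(node[2], follow, symbols, counter)
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                       # the end of the left part meets the start of the right
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    # Unary operators: "opt" (?), "star" (*), "plus" (+)
    n, f, l = analyse(node[1], follow, symbols, counter)
    if kind in ("star", "plus"):           # repetition loops back to the start
        for p in l:
            follow[p] |= f
    return n or kind in ("opt", "star"), f, l


def is_unambiguous(model):
    """True if no two occurrences of one name compete without look-ahead."""
    follow, symbols = {}, {}
    _, first, _ = analyse(model, follow, symbols, count())

    def clash(positions):
        names = [symbols[p] for p in positions]
        return len(names) != len(set(names))

    return not clash(first) and not any(clash(s) for s in follow.values())


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
print(is_unambiguous(("alt", ("seq", a, b), ("seq", a, c))))   # (a, b) | (a, c)
print(is_unambiguous(("seq", a, ("alt", b, c))))               # a, (b | c)
```

The two example models accept the same documents, but only the second lets a parser decide which token an incoming "a" satisfies without looking ahead.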



[CR: 19971206]

Ahonen, Helena. "Disambiguation of SGML Content Models." Pages 27-37 (with 6 references) in Principles of Document Processing. Proceedings of the Third International Workshop. PODP '96, Third International Workshop. Palo Alto, California. September 23, 1996. Edited by Charles Nicholas (Department of Computer Science and Electrical Engineering, UMBC, Baltimore, MD) and Derick Wood (Department of Computer Science, HKUST, Clear Water Bay, Kowloon, HONG KONG). Lecture notes in artificial intelligence. Lecture notes in computer science, 1293. Berlin / London: Springer-Verlag, 1997. ISBN: 354063620X. Author's affiliation: Department of Computer Science, Helsinki University, Finland.

Abstract: "A Standard Generalized Markup Language (SGML) document has a document type definition (DTD) that specifies the allowed structures for the document. The basic components of a DTD are element declarations that contain for each element a content model, i.e., a regular expression that defines the allowed content for this element. The SGML standard requires that the content models of element declarations are unambiguous in the following sense: a content model is ambiguous if an element or character string occurring in the document instance can satisfy more than one primitive token in the content model without look-ahead. A. Brüggemann-Klein and D. Wood (1992; 1994) have studied the unambiguity of content models, and they presented an algorithm that decides whether a content model is unambiguous. We present a disambiguation algorithm that, based on the work of Brüggemann-Klein and Wood, transforms an ambiguous content model into an unambiguous one by generalizing the language. We also present some experimental results obtained by our implementation of the algorithm in connection to an automatic DTD generation tool."



[CR: 19970523]

Ahonen, Helena; Heikkinen, Barbara; Heinonen, Oskari; Jaakkola, Jani; Kilpeläinen, Pekka; Lindén, Greger; Mannila, Heikki. Constructing Tailored SGML Documents. Technical Report, University of Helsinki, Department of Computer Science. Helsinki, Finland: University of Helsinki, August 20, 1996. Extent: 9 pages, with 14 references. Authors' affiliation: Department of Computer Science, University of Helsinki. WWW: http://www.cs.helsinki.fi/research/rati/.

Abstract: "A tailored document corresponds to the need of a certain user group or user task. Tailored documents may be constructed through document assembly from a pool of documents. An intelligent document contains information that supports this assembly. We suggest different kinds of information that may be associated with an intelligent document, and used in the assembly process. We also study the assembly process itself, and the transformations needed to form the tailored document from fragments of documents. We report on three case studies, where intelligent document assembly methods are being prototyped on commercially used document material. As a basis for the project we consider documents marked up with SGML (Standard Generalized Markup Language)."

Available online in Postscript format: http://www.cs.helsinki.fi/research/rati/sidnew.ps; [mirror copy]. See further the main entry for the University of Helsinki - Document Management Research Group.



[CR: 19980907]

Ahonen, Helena; Heikkinen, Barbara; Heinonen, Oskari; Kilpeläinen, Pekka. "Assembling Documents from Digital Libraries." Pages 419-429 (with 11 references) in Proceedings of the 8th International Conference on Database and Expert Systems Applications. DEXA '97 - International Conference on Database and Expert Systems Applications. Toulouse, France. September 1-5, 1997. Edited by Abdelkader Hameurlain and A Min Tjoa. Lecture notes in computer science, Number 1308. New York / Berlin: Springer-Verlag, 1997. ISBN: 3540634789. Authors' affiliation: Department of Computer Science, University of Helsinki, Finland.

Abstract: "We consider assembling documents using, as a source, a digital library containing SGML documents. The assembly process contains two parts: 1) finding interesting fragments, and 2) constructing a coherent document. We present a general document assembly framework. First, we describe a system for tailoring control engineering textbooks. Its assembling facilities are rather restricted but, on the other hand, the quality of documents produced is high. Second, we address the problem of filtering and combining interesting information from a large heterogeneous document collection. The methods presented offer various ways to find the interesting document fragments. Moreover, the elements found in the fragments are mapped to generic elements, like sections, paragraph containers, paragraphs and strings, which have known semantics. Hence, even arbitrary compositions can be formatted and printed."

Available online in Postscript format; local archive copy.



[CR: 19970524]

Ahonen, Helena; Heikkinen, Barbara; Heinonen, Oskari; Klemettinen, Mika. Improving the accessibility of SGML documents - A content-analytical approach. University of Helsinki, Department of Computer Science Technical Report. Helsinki, Finland: Department of Computer Science, University of Helsinki, May 1997. Extent: 9 pages, with 8 references.

Abstract: "Document retrieval based on string searches typically returns either the whole document or just the occurrences of the searched words. What the user often is after, however, is a microdocument: a part of the document that contains the occurrences and is reasonably self-contained. These microdocuments might, for instance, consist of several successive text paragraphs sharing a mutual subject. Single paragraphs, or corresponding close-to-leaf SGML elements, do not convey enough of the contextual information. On the other hand, sections or subsections of a text document, such as a book or an article, can discuss many heterogeneous topics, and thus be too large a unit for retrieval or assembly.

"We claim that such microdocuments are both suitable retrievable units and appropriate units for document assembly, and that they can be reasonably well located using automatic techniques. Optimal creation of microdocuments would require thorough semantic analysis of the text. However, it is possible to catch parts of the elementary semantic content by statistical term-frequency analysis. Term-frequency distributions enable us to determine the locations of possible topic changes in the text. Based on this information, we can measure the similarity of two successive elements, and decide whether we wish to have them in the same microdocument. On the other hand, existing markup, for example classifying attributes, can be used in boundary detection.

"The microdocument, again, can be attributed with content information. The results of our preliminary experiments show that the presented approach works well in user-assisted topic-oriented microdocument detection. We currently study the usefulness of this technique in document assembly, i.e., in generating new documents from a collection of existing text documents."

Also to appear in the proceedings of SGML Europe '97, Barcelona, Spain, May 1997. GCA.

The document is available online in Postscript format: http://www.cs.helsinki.fi/%7Eoheinone/publications/Improving_the_Accessibility_of_SGML_Documents_-_A_Content-analytical_Approach.ps.gz [mirror copy].
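
The idea of measuring the similarity of successive elements by term frequencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the report: it uses cosine similarity over raw word counts, no stop-word list or stemming, and an arbitrary threshold of 0.2; the function names and the sample paragraphs are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lower-cased alphabetic words in a paragraph."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_paragraphs(paragraphs, threshold=0.2):
    """Start a new group wherever two successive paragraphs are dissimilar."""
    groups, previous = [], None
    for para in paragraphs:
        tf = term_frequencies(para)
        if previous is not None and cosine(previous, tf) >= threshold:
            groups[-1].append(para)      # same topic: extend the current group
        else:
            groups.append([para])        # likely topic change: open a new group
        previous = tf
    return groups


# Hypothetical paragraphs, for illustration only.
paragraphs = [
    "sgml markup markup documents",
    "markup documents sgml tags",
    "cooking pasta sauce",
    "pasta sauce recipe",
]
for group in group_paragraphs(paragraphs):
    print(group)
```

Each resulting group plays the role of a candidate microdocument; existing markup, such as classifying attributes, could be used to adjust the boundaries, as the abstract suggests.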



[CR: 19980423]

Ahonen, Helena; Heikkinen, Barbara; Heinonen, Oskari; Kilpeläinen, Pekka. A system for assembling specialized textbooks from a pool of documents. University of Helsinki, Department of Computer Science, Publications Series C, No. C-1997-22. March 1997. Extent: 9 pages (with 9 references). Authors' affiliation: University of Helsinki.

Summary: "We consider assembling specialized, customized textbooks from a large collection of SGML documents. . . In addition, we describe our experience in converting MS Word documents into tagged SGML format by presenting both the conversion architecture and lessons learned."

The document is available online in Postscript format: see http://www.cs.helsinki.fi/research/rati/sid.html. [mirror copy].

See further the main entry for the University of Helsinki - Document Management Research Group. See also: J. Jaakkola, P. Kilpeläinen and G. Lindén: "TranSID - An SGML transformation language." To appear in The Fifth Symposium on Programming Languages and Software Tools, Jyväskylä, Finland, June 1997. Available as Department of Computer Science Report C-1997-36, University of Helsinki, May 1997.



[CR: 19970523]

Ahonen, Helena; Heikkinen, Barbara; Heinonen, Oskari; Jaakkola, Jani; Kilpeläinen, Pekka; Lindén, Greger; Mannila, Heikki. Intelligent Assembly of Structured Documents. University of Helsinki, Department of Computer Science, Publications Series C, No. C-1996-40. Helsinki: University of Helsinki, Department of Computer Science, June 1996. Extent: 15 pages, 26 references. Authors' affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland.

Abstract: "An intelligent document contains information about its structure, its contents and its environment. This information supports intelligent document assembly, i.e., the effective reuse of existing documentation to produce new documents adapted to specific needs. We suggest different kinds of information that may be associated with an intelligent document, and used in the assembly process. We also study the assembly process itself, and the transformations needed to form the assembled document from fragments of documents. We report on three case studies, where intelligent document assembly methods are being prototyped on commercially used document material."

[. . . ] "Sometimes fragments must be transformed to conform with a certain DTD; sometimes the assembly itself requires that the retrieved document is pruned of certain parts, augmented with others, or that components pulled out from different parts are combined together. In order to satisfy these needs, we have designed a new declarative, simple and powerful transformation language for SGML documents, called TranSID. The TranSID language gives the user access to the entire SGML document tree, not only to stream of start and end events of structural components, which is common in most other SGML transformation languages. The use of the language is based on describing the substructures of the SGML instance that are to be replaced by other substructures. The transformation process corresponds to the tree transformation process of the forthcoming DSSSL standard, but the transformation description is given on a higher level of abstraction." [extracted]

Keywords: document assembly, structured documents, SGML.

Available in Postscript format on the Internet: ftp://ftp.cs.Helsinki.FI/pub/Reports/by_Project/DocMan/Intelligent_Assembly_of_Structured_Documents.ps.gz ; [mirror copy]. See further the main entry for the University of Helsinki - Document Management Research Group.
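
The contrast drawn in the extract, between working on the whole document tree and working on a stream of start and end events, can be illustrated with a small Python sketch. It does not use or imitate TranSID syntax, which is not shown in this entry; it only demonstrates a tree-based transformation in which substructures are pruned or replaced by rule. The function names, the rules and the sample document are hypothetical, and XML stands in for SGML because the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET


def transform(elem, rules):
    """Rebuild elem bottom-up; a rule returns a replacement element or None."""
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        new_child = transform(child, rules)
        if new_child is not None:          # None means "prune this subtree"
            copy.append(new_child)         # (tail text of a pruned child is dropped)
    rule = rules.get(copy.tag)
    return rule(copy) if rule else copy


def drop(_elem):
    """Rule that removes the matched substructure."""
    return None


def rename_to(tag):
    """Rule factory that maps the matched element to another element type."""
    def rule(elem):
        elem.tag = tag
        return elem
    return rule


# Hypothetical fragment and target structure, for illustration only.
source = ET.fromstring(
    "<doc><note>internal</note><sect><head>Intro</head><p>Text</p></sect></doc>"
)
result = transform(source, {
    "note": drop,
    "sect": rename_to("section"),
    "head": rename_to("title"),
})
print(ET.tostring(result, encoding="unicode"))
```

Because each rule receives a complete, already transformed subtree, it can inspect or rearrange all of that subtree's content, which an event-stream processor cannot do without buffering.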



[CR: 19980907]

Ahonen, Helena; Heikkinen, Barbara; Jaakkola, Jani; Kilpeläinen, Pekka; Lindén, Greger. "Design and Implementation of a Document Assembly Workbench." Pages 476-486 (with 17 references) in Electronic Publishing, Artistic Imaging, and Digital Typography. Proceedings of the 7th International Conference on Electronic Publishing (EP '98), Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography (RIDT '98). EP '98 and RIDT '98, Saint Malo, France. March 30 - April 3, 1998. Edited by Roger D. Hersch, Jacques André, and Heather Brown. Lecture Notes in Computer Science Series, Number 1375. New York/Berlin/Heidelberg: Springer-Verlag, 1998. ISBN: 3-540-64298-6. Authors' affiliation: Wilhelm-Schickard-Institut für Informatik, Sand 13, D-72076 Tübingen University, Germany. WWW [Ahonen] Helena Ahonen.

Abstract: "Computers support the management of large collections of text documents, but efficient reuse of document collections for producing new documents remains inherently difficult. We describe and discuss the design and implementation of a document assembly system based on a document assembly model, where the user produces new specialized documents by querying and browsing a collection of structured document fragments."

"We describe a document assembly model and architecture, which we have developed to study principles and methods of intelligent document reuse. The work is done in an ongoing research and development project called "Structured and Intelligent Documents (SID)". Document assembly is a central goal application of the project. As the basis for the project we consider structured documents marked up using SGML. We have developed a document assembly framework based on versatile recognizing and manipulating of document fragments, which are consistent and relatively independent document parts used as the basis for new assemblies. Similar ideas recur at document manipulation and IR meetings under different names, such as passages, semantic fragments, information units, minimal revisable units, or micro documents. The assembly framework is not inherently SGML specific, but the possibility of using generic semantic document markup and the existence of tools for a standardized formalism makes the implementation of a document assembly system based on the fragment framework much easier. [Conclusion]: We have presented a model and an implementation of a system for intelligent document assembly. The system uses a database of SGML documents from which the user assembles new documents. The assembly is based on document fragments: the user chooses among document parts and selects appropriate fragments to be included in a new document. The assembly system supports browsing and reorganizing of the fragments as well as some more sophisticated techniques such as cluster-based browsing and structured search. The system is in a prototype phase and we are just beginning to evaluate the usefulness of our assembly model. We expect that further prototyping with document assemblies will reveal some real challenges, like managing explicit and implicit dependencies between document fragments."

Available online: Slides: "Design and Implementation of a Document Assembly Workbench". See also the online abstract and the full text in PDF format, local archive copy. See also this document's reference list for related publications.



[CR: 19971123]

Ahonen, Helena; Heinonen, Oskari; Heikkinen, Barbara; Klemettinen, Mika. "Improving the Accessibility of SGML Documents: A Content-Analytical Approach." Page(s) 321-327 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Authors' affiliation: Department of Computer Science, University of Helsinki, P.O.BOX 26 (TEOLLISUUSKATU 23), FIN-00014.

Abstract: "Document retrieval based on string searches typically returns either the whole document or just the occurrences of the searched words. What the user often is after, however, is microdocument: a part of the document that contains the occurrences and is reasonably self-contained."

"These microdocuments might, for instance, consist of several successive text paragraphs sharing a mutual subject. Single paragraphs, or corresponding close-to-leaf SGML elements, do not convey enough of the contextual information. On the other hand, sections or subsections of a text document, such as a book or an article, can discuss many heterogeneous topics, and thus be too large a unit for retrieval or assembly.

"We claim that such microdocuments are both suitable retrievable units and appropriate units for document assembly, and that they can be reasonably well located using automatic techniques.

"Optimal creation of microdocuments would require thorough semantic analysis of the text. However, it is possible to catch parts of the elementary semantic content by statistical term-frequency analysis.

"Term-frequency distributions enable us to determine the locations of possible topic changes in the text. Based on this information, we can measure the similarity of two successive elements, and decide whether we wish to have them in the same microdocument. On the other hand, existing markup, for example classifying attributes, can be used in boundary detection. The microdocument, again, can be attributed with content information.

"The results of our preliminary experiments show that the presented approach works well in user-assisted topic-oriented microdocument detection. We currently study the usefulness of this technique in document assembly, i.e., in generating new documents from a collection of existing text documents."

[...] "We consider a topical microdocument to be semantically motivated by the topic the microdocument discusses. Topical microdocuments might, for instance, consist of several successive text paragraphs. Single paragraphs, or corresponding close-to-leaf SGML elements, do not convey enough of the contextual information. On the other hand, sections or subsections of a text document, such as a book or an article, can discuss many heterogeneous topics. Furthermore, sections are often longer than desired with respect to the intended purpose, such as document retrieval or assembly.

In this article, we presented a method for detecting microdocuments based on term-frequency distributions. The detection process has two phases: similarity calculation and fragmentation. In general, the results of our preliminary experiments show that the presented approach works well in user-assisted topic-oriented microdocument detection. We currently study the usefulness of this technique in document assembly, i.e., in generating new documents from a collection of existing text documents.
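The two phases named above (similarity calculation, then fragmentation) can be illustrated with a minimal sketch. It is not the authors' implementation: the use of cosine similarity over raw word counts, the `threshold` value, and the function names are all assumptions made for illustration, and the user-assisted step described in the paper is omitted.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word occurrences in one text element (e.g. a paragraph)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fragment(elements, threshold=0.2):
    """Group successive elements into candidate microdocuments.

    Phase 1 compares each element with its predecessor; phase 2 starts a
    new group wherever the similarity drops below the (assumed) threshold.
    """
    if not elements:
        return []
    vectors = [term_frequencies(e) for e in elements]
    groups = [[elements[0]]]
    for prev, cur, element in zip(vectors, vectors[1:], elements[1:]):
        if cosine_similarity(prev, cur) < threshold:
            groups.append([element])  # likely topic change: open a new group
        else:
            groups[-1].append(element)
    return groups
```

Calling `fragment` on a list of successive paragraphs returns a list of paragraph groups; a lower threshold yields fewer, larger groups.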

A version of the document is available online in Postscript format: from Helsinki, or the local mirror copy. A number of related publications from the University of Helsinki are listed in a departmental bibliography.

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19951220]

Ahonen, Helena; Mannila, H.; Nikunen, Erja. "Generating Grammars for SGML Tagged Texts Lacking DTD." Pages [???-???] in Principles of Document Processing, PODP '94. Principles of Document Processing. Darmstadt. April 11-12, 1994. Sponsored by: Fuji Xerox Systems and Communications Lab, GMD-IPSI, Rank Xerox Research Centre, and Xerox Webster Research Center. Edited by Makoto Murata and Herve Gallaire. [pub-location: Darmstadt?]: [publisher: GMD-IPSI?], 1994. Authors' affiliation: [Ahonen, Mannila] Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Phone: +358 0 708 44218; Fax: +358 0 708 44441; Email: helena.ahonen@helsinki.fi. WWW: Helena Ahonen Home Page; [Nikunen] Research Centre for Domestic Languages.

"Abstract: We describe a technique for forming a context free grammar for a document that has some kind of tagging -- structural or typographical -- but no concise description of the structure is available. The technique is based on ideas from machine learning. It forms first a set of finite-state automata describing the document completely. These automata are modified by considering certain context conditions; the modifications correspond to generalizing the underlying languages. Finally, the automata are converted into regular expressions, which are then used to construct the grammar. An alternative representation, characteristic k-grams, is also introduced. Additionally, the paper describes some interactive operations necessary for generating a grammar for a large and complicated document."

Available online: http://www.cs.helsinki.fi/~hahonen/ahonen_podp94.ps [mirror copy, December 1995]. The paper is also to appear in Mathematical and Computer Modelling. See the first author's home page for more up-to-date bibliographic details and other SGML-related research.



[CR: 19951220]

Ahonen, Helena; Nikunen, Erja. "Forming Grammars for Structured Documents: An Application of Grammatical Inference." Pages 153-167 in Grammatical Inference and Applications. Papers Presented During the Second International Colloquium. Second International Colloquium on Grammatical Inference - ICGI-94. Alicante, Spain, September 21-23, 1994. Edited by Rafael C. Carrasco and Jose Oncina. Lecture notes in computer science, number 862. Berlin/New York: Springer-Verlag, 1994. ISBN: 3540584730 (Berlin); 0387584730 (New York). ISSN: 0302-9743. Authors' affiliation: Department of Computer Science, P. O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. Phone: +358 0 708 44218; Fax: +358 0 708 44441; Email: helena.ahonen@helsinki.fi. WWW: Helena Ahonen Home Page.

"Abstract: We consider the problem of generating grammars for classes of structured documents -- dictionaries, encyclopedias, user manuals, and so on -- from examples. The examples consist of structures of individual documents, and they can be collected either by converting typographical tagging of documents prepared for printing into structural tags, or by using document recognition techniques. Our method forms first finite-state automata describing the examples completely . These automata are modified by considering certain context conditions; the modifications correspond to generalizing the underlying language. Finally, the automata are converted into regular expressions, and they are used to construct the grammar. In addition to automata, an alternative representation, characteristic k-grams, is in-troduced. Some interactive operations are also described that are necessary for generating a grammar for a large and complicated document."

Available on the Internet: http://www.cs.helsinki.fi/~hahonen/ahonen_icgi94.ps [mirror copy, December 1995].
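As an illustration of the first step in the abstract (an automaton that describes the examples completely), the sketch below builds a prefix-tree automaton that accepts exactly the example sequences it is given. This is an assumed, generic construction, not the paper's algorithm: the generalization by context conditions, the conversion to regular expressions, and the k-gram representation are not shown, and the element names in the usage note are hypothetical.

```python
def build_prefix_tree(examples):
    """Build a prefix-tree automaton accepting exactly the given sequences.

    States are integers and state 0 is the start state. Returns a pair
    (transitions, accepting), where transitions maps (state, symbol) -> state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for sequence in examples:
        state = 0
        for symbol in sequence:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state  # new branch for an unseen prefix
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the end of every example is an accepting state
    return transitions, accepting


def accepts(automaton, sequence):
    """Return True if the automaton accepts the sequence of symbols."""
    transitions, accepting = automaton
    state = 0
    for symbol in sequence:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting
```

For example, building the automaton from the child-element sequences `("headword", "sense")` and `("headword", "pos", "sense")` gives one that accepts those two sequences and nothing else; merging states would be the point at which such an automaton starts to generalize.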



[CR: 19951113]

Akpotsui, Extase K. A. Transformation de types dans les systèmes d'édition de documents structurés. Doctoral thesis presented on September 26, 1993. Thèse spécialité Informatique. Grenoble: L'Institut National Polytechnique de Grenoble, 1993. Extent: 208 pages, bibliography. 840K Postscript file.

"Abstract: In structured editing systems, documents are considered as logical structures made up of typed components. These are defined in a generic structure representing the organization of the whole document. Such systems are based on a strong type checking of documents, such that any change to a type definition makes documents unprocessable. The evolution of a type definition in a generic structure is called a static type transformation. The structural changes to elements during an editing session are called dynamic transformation. The aim of this thesis is to study the problems induced by the static and dynamic type transformations.

The first part is an introduction to structured documents editing systems and standards such as SGML, DSSSL, ODA.

The second part explains the main changes that could occur in a type definition, along with a taxonomy of elementary transformations.

The third part presents a mathematical type modelling relevant to either dynamic or static type transformations:

  1. For static transformations, the important characteristics of types are represented and used to rigorously express the possible changes.
  2. Types are assimilated to trees and two types can be compared in order to point out the relation (sub-type, factor, cluster, equivalence, compatibility) that links them.
  3. Finally, a grammatical approach considers a document as a word which is part of a language. The words of that language are generated from an alphabet composed of the identifiers of the basic types of the system, the identifiers of the generic structures and a set of symbols representing the available constructors."

See also the document summary in French.

The thesis is available on the Internet: ftp://ftp.imag.fr/pub/OPERA/doc/These93-E.Akpotsui.ps.gz [mirrored copy, November 1995]. See also the full text of the abstract.



[CR: 19970809]

Akpotsui, Extase K. A.; Quint, Vincent; Roisin, Cécile. "Type Modelling for Document Transformation in Structured Editing Systems." Mathematical and Computer Modelling 25/4 (February 1997) 1-19 (with 26 references). Authors' affiliation: INRIA/Project Opéra.

"Abstract: Abstract: This paper addresses the problem of type transformation in structured editing systems and proposes a type description model convenient for type comparison and document conversation. Two kinds of transformations are considered: dynamic transformations allow a structured editor to change the structure of a part of a document when the part is copied of moved, and static transformations allow specific tools to restructure documents when their generic structure is modified. We present in this paper the current state of our research on formal analysis for these transformations."

Available on the Internet in Postscript format: ftp://ftp.inrialpes.fr/pub/opera/publications/MCM97.ps.gz; [archive copy]



[CR: 19951113]

Akpotsui, Extase K. A.; Quint, Vincent; Roisin, Cécile. Type Modelling for Document Transformation in Structured Editing Systems. INRIA-IMAG Internal Report. Gières, France: INRIA [Institut National de Recherche en Informatique et en Automatique] / IMAG, March 29, 1993. Extent: 29 pages, 26 references.

"Abstract: This paper addresses the problem of type transformation in structured editing systems and proposes a type description model convenient for type comparison and document conversation. Two kinds of transformations are considered: dynamic transformations allow a structured editor to change the structure of a part of a document when the part is copied of moved, and static transformations allow specific tools to restructure documents when their generic structure is modified. We present in this paper the current state of our research on formal analysis for these transformations."

The paper was submitted for publication in Mathematical and Computer Modelling. It is available in draft format via the Internet: ftp://ftp.imag.fr/pub/OPERA/doc/Modelling.ps.gz [mirrored copy, November 1995].



[CR: 19951113]

Akpotsui, Extase K. A.; Quint, Vincent. "Type Transformation in Structured Editing Systems." Pages 27-41 (with 10 references) in EP [Electronic Publishing] 92: Proceedings of Electronic Publishing, 1992. International Conference on Electronic Publishing, Document Manipulation, and Typography. Swiss Federal Institute of Technology, Lausanne, Switzerland. April 7-10, 1992. Sponsored by the Swiss Federal Institute of Technology and the Swiss National Science Foundation. Edited by Christine Vanoirbeek and Giovanni Coray [EPF, Lausanne, Switzerland]. The Cambridge Series on Electronic Publishing. Cambridge: Cambridge University Press, 1992. ISBN: 0-521-43277-4. Author affiliation: Swiss Federal Institute of Technology, Lausanne, Switzerland.

"Abstract: Recent advances in structured editing systems have put the emphasis on an important problem, which prevents structured editing systems from being used as easily as other document preparation systems: type transformation. This paper identifies two aspects of the problem. Dynamic transformations allow a structured editor to change the structure of a part of a document when this part is copied or moved to different places in the document or when it is restructured by the user. Static transformations allow specific tools to restructure documents when their generic structrure is modified. Various types of such transformations are analyzed and the specific tools implemented in the Grif system are presented."



[CR: 19980127]

Alexander, George A. "New Life for SGML. SGML Gets a New Lease on Life at DC Conference. XML Is the Big Thing (but Not the Only Thing)." The Seybold Report on Publishing Systems 27/9 (January 19, 1998) 1, 25-31. ISSN: 0736-7260.

George Alexander offers an in-depth analysis of the SGML (and XML) software products demonstrated at the SGML/XML '97 Conference (Washington DC, December 7 - 12, 1997), sponsored by GCA and SGML Open. A sidebar on XML-Data (by Liora Alschuler) emphasizes that XML is being extended into areas of database publishing and information management that have hitherto been less evident in the case of SGML tools. The author acknowledges that XML was indeed the "big news" at the conference, but reminds readers in this article that SGML software used for publishing is still very strong -- and is making money. SGML publishing software reviewed in this SRPS article includes: 1) TopLeaf - a looseleaf publishing system from Turn-Key Systems; 2) 3B2 composition system which uses SGML as its internal data format, and other evidence of Advent Systems; 3) Miles 33 - now automatically producing "3000 - 5000 pages per day" from SGML source at the showcase Deere & Co. Miles 33 installation; 4) Penta - SGMLPublisher package, with an interface to ArborText's ADEPT editor; 5) STEP and its SigmaLink repository and editorial system, showing vigor in Europe (and in the US); 6) Poet Software's "Wildflower" (SGML/XML repository), and Web Factory; 7) Xyvision's SGML-based translation support in Ambassador; new WebPorter tool, and the announced XML support in PDM (Parlance Document Manager); 8) Progressive Information Technologies (PIT) and the Target 2000 database system optimized for reference works; 9) Texcel's Information Manager 2.0 - integration with FrameMaker+SGML, and a Web interface.



[CR: 19970218]

Alexander, George A. "Penta's SGMLPublisher: Direct Route From SGML to Pages [Direct Route to Pages]." The Seybold Report on Publishing Systems 26/10 (February 10, 1997) [1], 19-23. ISSN: 0736-7260.

The author describes a series of tools developed by Penta to import SGML documents and create appropriate stylesheets for use in composition. See the main database entry for Penta for further information. Or, see the SGMLpublisher product description in a CTS posting by Michael Goldfarb.



[CR: 19970718]

Alexander, George A. "SGML at Imprinta." The Seybold Report on Publishing Systems 26/19 (July 4, 1997) 41-43. ISSN: 0736-7260.

Alexander reports on the presence of SGML at the Imprinta '97 Conference in Düsseldorf, June 4-10, 1997. Imprinta '97 was an international "Pre-press and communication" conference with 61,000 visitors. STEP (SigmaLink editorial system), Advent (3B2 system with SGML composition and editing support) and Siemens (tools integration, especially at the Princeton facility) are covered in the report. Alexander offers the opinion that "SGML is becoming very big in Europe...the pace of development and deployment seems higher there than in the U.S."



[CR: 19961111]

Alexander, George A.; Alschuler, Liora. "SGML Europe '96: What's the Next Step for SGML [If SGML Is the Answer, What is the Question?]." The Seybold Report on Publishing Systems 25/19 (June 30, 1996) [1], 12-25. ISSN: 0736-7260.

The article amounts to an extended overview of new SGML software offerings on the show floor of the SGML Europe '96 conference. Following an introductory discussion of 1996 development trends and a presentation of the Yuri Rubinsky Insight Foundation, the article covers: STEP DPA wire service; InContext-Folio journal publishing; Texcel; XSoft's Astoria; ModuleMaker; Info-Base (database publishing, Copenhagen); Stilo SGML Generator; Editime (Timelux editor with UNICODE support); Grif Symposia Pro; Nice (TagWizard); SoftQuad updates (Author/Editor and HoTMetaL); Datalogics (WriterStation); Balise; LT NSL; Fotek (3B2 SGML version); Sörman SplitVision; Synex (ViewPort); EBT (DynaBase); Jouve (GTI Publisher). See the full article on the Seybold WWW server.



Alexander, George; Walter, Mark. "A Fresh Look at SGML: The Conventional Wisdom Changes." The Seybold Report on Publishing Systems 20/7 (December 24, 1990) [1,] 3-16. ISSN: 0736-7260.

"This article discusses some of the recent development in the use of SGML in publishing applications. The backdrop for our report is SGML '90, a GCA conference held last month in Philadelphia. In addition to writing our story from the event, we retell some user stories heard there, offer some additional sources of information and report on a research project [ICA] that may lead to commercial products."

The article includes a brief introduction "SGML '90: A New Breed of Users Step Forward".



[CR: 19971107]

Allen, Charles Axel. "WIDL. Application Integration with XML." Pages 229-248 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Author's affiliation: webMethods, Inc.

Abstract: "The problem of direct access to Web data from within business applications has until recently been largely ignored. The Web Interface Definition Language (WIDL) is an application of the eXtensible Markup Language (XML) which allows the resources of the World Wide Web to be described as functional interfaces that can be accessed by remote systems over standard Web protocols. WIDL provides a practical and cost-effective means for diverse systems to be rapidly integrated across corporate intranets, extranets, and the Internet."

A version of this document is available online in HTML format: http://www.webmethods.com/technology/Automating.html; [local archive copy, text only].



[CR: 19971227 MD: 19980108]

Allen, Terry. "Package or Perish." Pages 385-390 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Terry Allen]: Co-designer of the DocBook DTD (Davenport Group). Email: tallen@sonic.net; WWW: http://www.sonic.net/~tallen/.

Abstract: "SGML documents can be large and complex, composed of many parts in various formats. These parts may be entities, subdocuments, or other SGML documents that are linked to as part of the content of an enframing document."

"Such a compound document may consist of an SGML declaration, a DTD (which may be composed of modules stored in separate files), a document entity (where the DOCTYPE declaration appears and in which parsing begins), external entities, other SGML documents, and noncharacter data (such as pictures and sounds). Beyond that, a document may require style sheets, fonts, an SGML Open catalogue, a 'readme' file, a statement of conditions of use, digital signatures, authentication information, and on and on.

"In this paper I'll point out some circumstances under which one might need to package together some or all of the items comprising a full compound document, describe some advantages of and requirements for packaging, briefly mention some existing packaging schemes, and outline my own suggested solution."

"If SGMLlers want to control their own destiny in archival preservation, copyright, and commerce, it would be wise to take up the challenge of packaging specifically for SGML (and XML), or at least come to agreement on what requirements a packaging system must meet. If we don't figure out packaging, someone else may do so in ways we find painful. Our first attempt, MIMESGML, seems to have failed because it was too complex. I hope the simpler solution I've offered has the clarity necessary for success."

This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



Allen, Todd; Nix, Robert; Perlis, Alan. "PEN: A Hierarchical Document Editor." SIGPLAN Notices [= Proceedings of the ACM SIGPLAN SIGOA Symposium on Text Manipulation] (ACM SIGPLAN SIGOA Symposium on Text Manipulation, Portland, Oregon, June 8-10, 1981) 16/6 (1981) 74-81. 14 references. Authors' affiliation: Yale University, Computer Science Department.

The name "PEN" apparently came from the initials of Alan Perlis, John Ellis, and Robert Nix, the early designers and developers of the editor.



[CR: 19961111]

Alschuler, Liora. ABCD... SGML: A User's Guide to Structured Information. London/Boston: International Thomson Computer Press (ITCP), 1995. Extent: xviii + 414 pages; diskette with SoftQuad's Panorama FREE browser and SGML Resource Guide. ISBN: 1-850-32197-3. Author's affiliation: The Word Electric, East Thetford, Vermont.

Abstract: "SGML, or 'Standard Generalized Markup Language,' is a complex and powerful new programming language for text that is revolutionizing information production, storage, and retrieval in both the public and private sectors. This book is an introduction that will enable readers to decide if they should acquire the new technology. Once that question is answered, this book offers specific guidelines for evaluating hardware, software, and training requirements. This is a production-oriented primer designed to reduce the stress of accommodating this new SGML technology. This book introduces basic concepts and practices; describes the components of an SGML system; illustrates how it is used by a series of case studies; assesses if SGML is for you; describes the process of converting to SGML; describes the process of working with SGML; and provides resources and referrals for more information." [from the publisher]

ABCD... SGML is a "book for managers, writers, and programmers that takes a system-wide approach to the application of SGML (Standard Generalized Markup Language) to publishing, information management and the Internet. This . . . SGML book . . . considers the collateral changes to the organization of work that accompany every successful transition to new technology. The book includes a disk that demonstrates SGML in action. The book provides an overview of the tools and processes that put SGML to work on real-world applications and describes over a dozen case studies of SGML in use in financial analysis; judicial administration; multimedia entertainment; scientific and academic research; electronic, print and technical publishing; and, not least of all, on the World Wide Web. It explains how to do a needs analysis that will tell you if SGML is appropriate for your situation. The book introduces SGML-based data design and system design and covers the transition to SGML and work in an SGML-based environment.

"ABCD... SGML explains SGML and SGML-based technology in such a way that anyone can read it and understand how to use SGML to: (a) create high quality electronic books and online databases; (b) spin-off new products from current resources; (c) eliminate redundant coding; (d) add value to your information resources independent of any single application, vendor, or platform and insure against future data conversion costs; (e) automate new areas of your enterprise not suitable for conventional database technology; (f) comply with industry standards for information interchange.

"The volume demonstrates SGML with a floppy disk containing ABCD... SGML Resource Guide A: Keeping Pace with SGML, a guide to SGML contacts that is coded in SGML and is bundled with SoftQuad's Panorama FREE browser. The files can be viewed as plain ASCII text or locally in Panorama for Windows or, with a Web connection, in Panorama with hot links to all the Web resources listed in the Guide. These resources include the best sites for information about SGML and the best sites for information in SGML." [adapted from a publisher's press release, October 1995]

See now [November 15, 1995] the online press release at ITCP, with the online Table of Contents and the volume Preface. The Thomson server also provides an online version of Microsoft's Cinemania project, which "demonstrates use of SGML in production of mass-market multimedia, entertainment." The author also supplies a personal description of the book in postings to TEI-L and Usenet newsgroup comp.text.sgml.

See the book announcement in the SGML Users' Group Newsletter 29 (November 1994) 21. According to the announcement, the book "explains SGML and SGML systems to the depth required for non-technical implementaters (sic!) and is also a good instruction for those who are going on to more technical work. The guide will be particularly valuable for publishers, corporate publications managers, technical writers and technical writing managers, and MIS personnel." Price US $39.95. See a fuller description and advertisement for the book in an early [November 1994?] publisher's volume description excerpted in this database.

A published review of the book is available from Seybold Report on Publishing Systems in SRPS 25/9 (January 29, 1996) 42; this review is accessible in HTML format on the Internet. See this review article also on the Seybold WWW server. See also the review by Dianne Kennedy in <TAG>: The SGML Newsletter 9/1 (1996) 9-10.



[CR: 19970620]

Alschuler, Liora. "Behind the Scenes at the WSJ Interactive Edition [Report from the Edge: How WSJ Interactive Built its Own Web Publishing System]." Seybold Report on Internet Publishing 1/8 (April 1997) 1, 15-21. ISSN: 1090-4808. Author's affiliation: The Word Electric.

"Abstract: It is well known that the Wall Street Journal Interactive Edition is a rarity: an online newspaper that charges for its content. Less well known is that it's also unusual for its approach to building online systems: instead of shoveling print pages onto the Web, Dow Jones and contractor EDS built an editorial system optimized for editors working in the new medium. In a behind-the-scenes look at how the Interactive Journal gets created and produced, the author shows how Dow Jones uses Microsoft Word to create a structured editing environment whose output is SGML [Standard Generalized Markup Language], while at the same time providing editors with WYSIWYG previews. The result is a winning combination: a system that automates production without sacrificing editorial control or output quality."

The workflow uses Microsoft Word, a simple SGML markup language called DJML-Lo (Dow Jones Markup Language), OmniMark products, James Clark's SP SGML parser toolkit, and Perl libraries from David Megginson.



[CR: 19970725]

Alschuler, Liora. "Britannica Online: Reinventing the Encyclopedia." Seybold Report on Internet Publishing 1/3 (November 1996) 13-20. ISSN: 1090-4808. Author's affiliation: The Word Electric.

[Summary:] "In the fall of 1993, it had been 18 years since the last full edition of the Encyclopaedia Britannica was typeset. In less than a year, the editorial staff went from asking, in a memo now infamous in the Chicago headquarters, 'Have you ever heard of the Internet?' to the final stages of a top-to-bottom redesign of editorial and production facilities that has recast the 228-year-old encyclopedia not as a book, but as a database with print, CD-ROM and online media...As part of this effort, the company is midstream in the move to SGML, but the changeover point will not be reached until next year."

"Britannica has been struggling to replace its existing system, in part because there are no off-the-shelf systems built for reference publishers, let alone ones for encyclopedias, which add another layer of complexity. . .Britannica has since decided to put together the system itself, with help from some contractors. . .The new system will be an SGML-encoded database of text and metadata stored in IDI's BasisPlus SGML. FrameMaker+SGML will be used as the text-editing tool. Britannica wrote its own image database using Visual Basic. Composition of some of the annual books and specialty publications will be done with Frame; the main encyclopedia will be composed with a custom program developed by Fred Rose and Associates, led by the founder of now-defunct Magna Computer Systems. Its experience in data conversion led the editorial support group to develop its own data conversion routines for migrating the text to SGML. It has already converted the Micropaedia and is well on its way to completing the Macropaedia. . . Under the new system, semantic tags will be added to the content, identifying information such as dates of birth and geographic locations. Such markup will lay the groundwork for future functions, such as time-based (historical) and geography-based searches. In Merrick's words, 'we're moving from tagging that works for 10 years to tagging that works for 100 years'." [Address: Encyclopaedia Britannica, 310 S. Michigan Avenue, Chicago, IL 60604; Phone: +1 (312) 347-7000; FAX: +1 (312) 294-2123.]



[CR: 19971007]

Alschuler, Liora. "The Data-Driven Desktop: DataChannel Pushes XML ['Analysis' Feature Article]." Seybold Report on Internet Publishing 2/2 (October 1997) 1, 9-14. ISSN: 1090-4808. Author's affiliation: The Word Electric.

The author provides a detailed description and analysis of DataChannel's ChannelManager application. The Seybold Report on Internet Publishing calls DataChannel "the first commercial end-user product to do something interesting with the Web's new standard [XML] for open information." Excerpt: "The data is the desktop. ChannelManager not only 'pushes' the content, it shapes the user interface, and behind the scenes, it is XML metadata that is pulling the strings. . .DataChannel is interesting as an early implementor of XML, a standard that should substantively change the art of publishing on the Internet. Document markup may be used for more than just formatting, and right now Web developers are just starting to latch onto structured markup as a handle for controlling the flow of information. . . To see adoption of XML this early for this purpose confirms that the rewrite of SGML is meeting one of the objectives of the XML project, namely, to put the rules of SGML structured markup into a form that speaks to mainstream programmers."

See the DataChannel Web server for other information on DataChannel's XML products.

See also the full text in the online version of the article; [archive copy]



[CR: 19970626]

Alschuler, Liora. "From SGML to SyBooks: How Sybase Puts 50,000 Pages Online [Report from the Edge]." Seybold Report on Internet Publishing 1/1 (September 1996) 17-21. ISSN: 1090-4808. Author's affiliation: The Word Electric, East Thetford, Vermont.

A collection of technical documents maintained by the publishing group at Sybase consists of some 50,000 pages of documents. The repository is called "SyBooks on the Web," and its pages are accessed electronically as much as 20,000 times per day. The documents, typically authored in FrameMaker under a strict template, are converted into SGML by PassagePro. DynaText stylesheets are then defined so that the documents may be published (generated) on CD-ROM and on the Web. For delivery over the Web as HTML documents, EBT's DynaWeb "splits up the corpus, rather than delivering it in one unwieldy chunk of HTML. In response to user queries, it converts a manageable section of the indexed source into html. Then the DynaWeb server sends just that portion to the client. URLs, link ends, anchors and formats are assigned automatically. How big the chunks are, how they are delineated and how they are linked are functions of the original SGML markup and the design of the DynaWeb style sheet." A similar strategy using DynaText and DynaWeb is employed by Novell, where 150,000 pages of SGML documentation are delivered online through dynamic "down-translation" into HTML.

Summary: "The power behind SyBooks on the Web comes from SGML (Standard Generalized Markup Language) -- rich, unambiguous, rigorous and vendor-neutral markup to ASCII source files. Using SGML source files, Sybase creates, manages and renders Web-ready documents from the same files used for its cd-rom and print products. The online books thus created are accessible through any garden-variety Web browser, but they contain search and navigation features that would be more difficult to achieve at this scale, for this type of material, from a non-SGML-powered production system. Other data formats may be just fine for repositories of flat, discrete text, or documents stored as single chunks, but when the reference load is book-length or when the roadmap is a nonrepeating, multilevel, nested hierarchy (a deep outline or table of contents), SGML can enrich navigation and retrieval, even as it automates production. . . According to Steve Goodman, [...] 90% of Sybase's documentation is published on both CD-ROM and the Web within two weeks of author signoff of the final draft. Actually, it can be electronically formatted almost instantly. This is possible, says Goodman, because the information in the books is fully described in unambiguous fashion by the SGML markup." [extracted]

Note: The Seybold Report on Internet Publishing is a new publication in 1996; see the online description. For other details on the Sybase publishing arrangement, see: "DynaWeb Serves Sybase's Large, Media-Rich Documents to Any Web Browser" (Press release); [mirror copy].



[CR: 19980515]

Alschuler, Liora. "Making Your Site Accessible: A Practice that Benefits Everyone." Seybold Report on Internet Publishing 2/9 (May 1998) 1, 17-20. ISSN: 1090-4808. Author's affiliation: The Word Electric.

The author surveys the central issues in Web accessibility, and includes a brief section "But what about XML?" (reference to HyTime/SGML architectural forms as a mechanism for adding critical information to structured documents). She references the W3C's Web Accessibility Initiative (WAI), the Yuri Rubinsky Insight Foundation, and other initiatives which have been responsive to the need to design documents with the physical disabilities of humans in mind. She argues that "accessible design is, by necessity, media-independent design"; here the traditional/classical goals of "descriptive markup" are harmonious with concerns for accessible document design. Yuri Rubinsky was a proponent of this idea, illustrated in the ICADD effort.



[CR: 19961226]

Alschuler, Liora. "SGML - Taking the Show on the Road." Pages 515-520 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Writer and Consultant, The Word Electric, Route 5 & Sanborn Road, POB 177, East Thetford, VT 05043, USA; Tel: 802/785-2623; Email: Liora@The-Word-Electric.com.

Abstract: "This talk looks at different approaches to introducing SGML, at different perceptions of the language and related technology, and at the changing nature of the audience for SGML. It is for those who are just being introduced to SGML and for those who must now make the case for SGML within their organization or industry."

Note: The above presentation was part of the "SGML Business Management" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19980515]

Alschuler, Liora. "Structured Editors: What We Saw is What You Might Get. [Product Roundup: A Fresh Crop of Structured Editors. Trip Report.]" Seybold Report on Internet Publishing 2/9 (May 1998) 1, 8-12. ISSN: 1090-4808. Author's affiliation: The Word Electric.

Alschuler reports on four of the editing tools demonstrated at the Seattle XML Conference (March 23 - 27, 1998). Xerox showed a prototype of an editing tool ("Raven"), developed internally at Xerox to support technical publishing in a distributed computing environment. Henry S. Thompson of the University of Edinburgh Language Technology Group demonstrated XED, an XML-based text editor optimized for rapid keyboarding. SoftQuad showed the beginnings of an XML editing tool based upon HoTMetaL code, called XMetaL. Interleaf announced support for a new editing application built on Interleaf 6 ("BladeRunner"), featuring integration with Microstar's Near & Far Designer (a DTD editing tool). XML support in products from Adobe and Microsoft is (apparently) nominal at this time, and not focused upon structured editing tasks per se. As clarified by Microsoft's Matthew Price, Office 98 will not use XML as a principal 'file format', but will support saving documents in HTML. As explained in an SRIP note ("Correction: Microsoft Never Changed its Tune", page 2) there will not be support yet for XML in MS Office "to provide round-trip editing of HTML documents" as was previously thought by some.

Summary: "With XML on the rise, the editing tool market is changing. At the recent XML Conference we found four new products, including surprises from Xerox and a UK research group. We also received an update from Microsoft, which clarified its plans for XML support in Office 98, due out later this year."



[CR: 19960206]

"The Well-Grounded Guide to SGML. [Seybold Report Review of] Alschuler, Liora, ABCD... SGML: A User's Guide to Structured Information." Seybold Report on Publishing Systems 25/9 (January 29, 1995) 42. ISSN: 0736-7260.

The review of Alschuler's book ABCD... SGML: A User's Guide to Structured Information is favorable: "Well researched, well organized, and, most important, well written, Liora Alschuler's overview of SGML and its application to structuring documents is among the most engaging works yet published on the subject. The book not only covers the history of SGML but also lays out the issues in building SGML systems, explains a variety of approaches based on actual case studies, and provides a wealth of resources useful to both novice and experienced SGML practitioners. If you are looking for a book that puts SGML in perspective, this is it. [. . .] Most SGML texts are aimed either at editorial staff or at technical staff. Alschuler masterfully satisfies both audiences in a technical exposition that is remarkably engaging to read."

The review is available online in HTML format.



[CR: 19960716]

Alschuler, Liora. "Canadian Government Sinks Its Teeth Into Herculean SGML Effort [SGML In Canada]." The Seybold Report on Publishing Systems 25/14 (April 23, 1996) [1], 18-23. ISSN: 0736-7260.

The author supplies a detailed diagnostic report on a recent conference ("SGML Technology 1996, Applications in Government and Industry", Ottawa March 27, 1996); she concludes that Canada seems to be leading the US in adopting SGML. The Treasury Board of Canada announced "an initiative that will make publishing standards, especially SGML, mission-critical throughout the Canadian government."

SGML in Canada is a TBITS (Treasury Board Information Technology Standard) Standard, so the role of the Treasury Board Secretariat is central in the new Canadian effort. "Unlike the U.S. government, the Canadians now seem to be getting serious about making SGML the lingua franca of federal and judicial publications. . .Bernie Gorman, assistant secretary of the Treasury Board Secretariat (TBS), Information Management, Systems and Technology, announced a major initiative to link standards endorsement to government operational requirements with the TBS taking on the 'high priest' role of standards architect."

"The announcement of the TBS initiative at the SGML Technology conference coincided with the launch of Canada's largest Web site, another effort supported by Industry Canada and the Treasury Board. . .Prime Minister Jean Chretien and Industry Minister John Manley inaugurated the $5.5 million site, an event that received press coverage across Canada. . .Dubbed Strategis [http://strategis.ic.gc.ca], the new site with 60,000 documents, 2 GB of SQL trade and company databases, and 500,000 pages marked up in SGML lends a measure of credibility to the lets-get-serious pronouncements of the Treasury Board. The mission of the new site is to provide strategic business information to Canada's small and medium-size businesses to help them move into the global economy. These firms accounted for 70% of the growth in jobs from 1979 to 1989."

See the main conference entry for further details.



[CR: 19970518]

Alschuler, Liora. "To DTD or Not to DTD?" SEYBOLD NEWS & VIEWS ON ELECTRONIC PUBLISHING 2/27 (April 9, 1997) [pages: ]. Author's affiliation: The Word Electric, East Thetford, Vermont.

In the article, 'Liora Alschuler discusses the XML specification announced in November and weighs in on the debate that has raged on since its introduction: Is the use of Document Type Definitions (DTDs) necessary?'

"To users and vendors alike, the most shocking aspect of the XML specification announced last November was that document type definitions (DTDs) would be, in certain instances, optional. Challenged on this point almost immediately, the W3C Editorial Review Board stated unequivocally that its intent and design was to make validation against a document type optional for "output" processing. The board explained that XML documents would be validated against a DTD on creation and on recombination and revision so that documents handed off to a browser or composition system could be presumed valid."

See the article online.



[CR: 19980413]

Alschuler, Liora. "HL7 Announces Next Meeting in Paris." XML Files: The XML Magazine Issue 04 (March 17, 1998) 19-20. Author's affiliation: The Word Electric.

Summary: "The support for structured markup (SGML and XML) in healthcare applications and healthcare exchange standards is expanding rapidly and with that support, the need to communicate and coordinate efforts across national boundaries. To address this need Prof. Dr. Joachim Dudeck, Institut fuer Medizinische Informatik, Justus-Liebig-Universitaet Giessen, and myself, Liora Alschuler, Chair of Kona Editorial Group, HL7 SGML/XML SIG are pleased to announce a day dedicated to SGML/XML standards and applications in healthcare preceding the Paris GCA conference, [May 18] SGML/XML Europe '98."

Available online. For more information, see the main entry: SGML Initiative in Health Care (HL7 Health Level-7 and SGML).



[CR: 19970815]

Alschuler, Liora. "SGML Looks Around The Corner in Barcelona, Sees XML [Barcelona, Düsseldorf, New York: SGML Faces XML]." The Seybold Report on Publishing Systems 26/19 (July 4, 1997) 1, 34-41. ISSN: 0736-7260.

The author reports in detail on SGML events at the SGML Europe '97 Conference in Barcelona, and summarizes SGML highlights at Seybold Seminars New York '97. Attendance at SGML Europe '97 was up by 20% (more than 550), according to GCA. From the Barcelona conference, Alschuler cites developments from Grif, SoftQuad, Stilo and Inso as evidence that XML is taking its place in the priorities of software companies. Highlights from the New York Seybold Seminars '97 included demonstration of Chrystal Software's Astoria 2.0 - a document database management suite now tightly integrated with Adobe's FrameMaker+SGML.



Alschuler, Liora. "Special Section: Standard Generalized Markup Language. Introduction." Technical Communication: Journal of the Society for Technical Communication 40/2 (Second Quarter, May 1993) 208-290, and 40/3 (Third Quarter, August 1993) 376-378. ISSN: 0049-3155. Author's affiliation: Miles-Samuelson, Inc.

These two issues of Technical Communication have eight (8) articles on SGML. See: [xrefs, not complete yet].



[CR: 19970620]

Alschuler, Liora. "XML [Extensible Markup Language] Shops for a Market, Finds Vendors. Netscape Turns Around After Momentum Builds in San Diego." Seybold Report on Internet Publishing 1/8 (April 1997) 3-4. ISSN: 1090-4808. Author's affiliation: The Word Electric, East Thetford, Vermont.

The article surveys early XML markets and prospects for the development of XML (Extensible Markup Language) software tools.

Summary: "The tide of support for the Extensible Markup Language (XML) is clearly rising. At a special XML conference held in San Diego last month [March 1997], a wave of vendors voiced their support, and a few weeks later Netscape, an early opponent, reversed its position and decided to actively 'investigate' implementation. In the meantime, several early implementations have appeared in the market."



[CR: 19970325]

Alschuler, Liora. "XML Could Sidestep HTML Split." WebWeek 3/7 (March 24, 1997) [?].

". . .now XML (eXtensible Markup Language), a draft standard from the W3C, offers some hope of a rationalized system for complementing HTML without creating the kinds of browser incompatibilities that currently threaten to divide the Web into Netscape and Microsoft camps. XML is an effort to recognize the desire to write new markup tags, but it lays down some simple rules for doing so, and turns user-defined markup into a force for stability, interoperability, and a powerful new breed of client-side processing applications. XML is essentially a slimmed-down, Web-enabled version of the Standard Generalized Markup Language (SGML), the International Organization for Standards' "meta-language" from which the original HTML was crafted. But SGML's complexity has been a barrier to its widespread adoption. XML was specifically created as a way to simplify the language and create an alternative to HTML."

"Major Web players such as Microsoft, Sun, Novell, Hewlett-Packard, and IBM sit on the XML Editorial Review Board (ERB), which launched its effort to define the standard last November. Noticeably absent from the effort has been Netscape, whose ultimate support will be crucial if the standard is to make it. . .In the meantime, Microsoft has emerged as the most powerful proponent of the effort."

Available online: Article by Liora Alschuler: "XML Could Sidestep HTML Split," in WebWeek [The Newspaper of Web Technology and Business Strategy] 3/7 (March 24, 1997); [mirror copy].



[CR: 19980112]

Alschuler, Liora; Alexander, George. "Coming of Age in Cyberspace: Births, Deaths, and Milestones at SGML/XML '97. Trip Report. [Alternate title: Behold the Newborn: Vendors Herald the Arrival of XML]." Seybold Report on Internet Publishing 2/5 (January 1998) 1, 21-34. ISSN: 1090-4808. Authors' affiliation: [Alschuler]: The Word Electric; [Alexander]: Seybold Publications.

This feature article reports on the SGML/XML '97 Conference and Exposition.

Abstract: "Ever since XML was first announced just over a year ago, we've been saying that it would have a tremendous impact on Internet publishing. It seemed only logical to us that the Web, which was grounded in a limited form of generic markup (HTML), should extend that markup to embrace the richness we all enjoy in print. XML, though still an infant, promises to provide the basis for much better text processing than the Web has seen before. It will enable better typography, more specific searching, faster downloads and much more sophisticated data representations than HTML will ever provide. No single document architecture, no matter how rich or complex, can cover all of the possible types of documents people create. Only a standard and widely supported metalanguage - one that lets authors and publishers create tags and structures that reflect their documents - provides the flexibility that expression of written communication demands. Only such a metalanguage can support the continual refinements in document layout and processing that online publishing requires." [from the article Introduction]

In addition to an overview of the conference and critical commentary on industry developments, the authors review the major conference highlights with respect to XML/SGML software, in four categories: Structured Editing and Tagging Tools (Adobe, ArborText - Cedar, Citec, WordPerfect, Enigma, Exosoft - Documentor, DynaTag, LiveLink, SoftQuad - HoTMetaL/AuthorEditor, Stilo - WebWriter); Document Display and Distribution (AIS, Documensa, SoftQuad, Synex); Utilities and Programming Tools (AIS - Balise, Microstar, OmniMark - Banff); Tools for Toolmakers (DataChannel, Copernican Solutions, SGML Technologies). Sidebars in the article include: 1) "The State of the XML Standards"; 2) "Meanwhile, XML Sneaks into Internet World"; 3) "Growth and Excitement in DC" (conference statistics: record numbers of participants, with 1157 conference attendees, 25 press, and 1200 exposition-only attendees); 4) "Upcoming XML Events."

Note: The Seybold Report on Internet Publishing regularly covers major SGML and XML events relating to Internet publishing.



[CR: 19971123]

Alschuler, Liora; Dolin, Robert; Spinosa, John. "SGML in Healthcare Information Systems." Page(s) 195-204 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Authors' affiliation: [Alschuler]: The Word Electric, USA; [Dolin]: Southern California Permanente Medical Group; [Spinosa]: Scripps Memorial Hospital.

Summary: "There is a growing consensus that healthcare records, including the individual patient record, will be gathered, managed, and distributed electronically, but there is little consensus on how this will be done. One study estimates that fewer than five percent of providers have determined how they will computerize patient records. In this climate, where our second largest industry has yet to establish an informational infrastructure, what does SGML (Standard Generalized Markup Language) have to offer? What are the prospects of large-scale use of SGML-based technology? How does use of SGML relate to other standards efforts?

"This paper examines the place of SGML within healthcare informatics, reports on some recent work demonstrating the application of SGML to healthcare records, and discusses the relationship between SGML-based standards for healthcare and other standards initiatives. It concludes with a brief discussion of one type of SGML architecture and applications envisioned for healthcare."

The document is available in RTF format from the HL7 server; [local archive copy]. See also the main database entry for the SGML Initiative in Health Care (HL7 Health Level-7 and SGML).

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19980114]

Alschuler, Liora; McKenzie, Matt. "Perspecta Takes Fresh Approach to Using XML Metadata to Navigate Content. SmartContent System's Custom Views of Online Documents Show Promise for Publishers with Complex Information Bases." Seybold Report on Internet Publishing 2/5 (January 1998) 36-37. ISSN: 1090-4808. Authors' affiliation: [Alschuler]: The Word Electric; [McKenzie]: Seybold Publications.

Abstract: The authors provide an overview of Perspecta's SmartContent System, which features a distributed Java-based server, a Java-based client, and a collection of application management tools. The SmartContent System supports a "fly-through" navigational interface that lets users explore the document collection by "topics" which are displayed graphically by their relationships ('natural train of thought', 'lateral relationships', etc.). The added support for XML will make it easier for users to create SmartContent repositories.

See also The Bulletin: Seybold News & Views on Electronic Publishing article: "Perspecta Integrates XML".

Other information may be found on the Perspecta Web site: http://www.perspecta.com/.



[CR: 19961226]

Alschuler, Liora; Lincoln, Thomas L.; Spinosa, John. "Medicine for SGML." Pages 181-190 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Authors' affiliation: [Alschuler]: Writer and Consultant, The Word Electric, Route 5 & Sanborn Road, POB 177, East Thetford, VT 05043; Tel: +1 (802) 785-2623; Email: Liora@The-Word-Electric.com; [Lincoln]: Professor Emeritus; Senior Scientist, University of Southern California; Rand Corporation, 802 Franklin Street, Santa Monica, CA 90403; Tel: +1 (310) 828-2174; FAX: +1 (310) 828-5142; Email: lincoln@rand.org; [Spinosa]: Staff Pathologist and Medical Director, Central Laboratory, Pathology Medical Group and Pathology Medical Laboratories, La Jolla, CA; Tel: +1 (619) 626-6000; FAX: +1 (619) 452-2930; Email: spinosaj@scripps.edu.

Abstract: "The paper introduces a new initiative for SGML in the medical informatics industry. It describes the current state of information processing in medicine, gives some of the requirements for a new, SGML-based approach to medical information processing, introduces the group working for the introduction of SGML into medical informatics and gives a brief description of the umbrella medical information standard called HL7 under which the new initiative is working. The paper concludes with a summary of the challenges facing the new initiative and an invitation to all to participate and contribute. Up-to-date information on contacts and programs will be available at the conference session."

Further information on the SGML Initiative in Health Care (HL7 Health Level-7 and SGML) can be found in the main entry of the SGML/XML Web Page.

Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19980421]

Alschuler, Liora; Walter, Mark E. Jr. "Netscape Delivers on Mozilla. Surprise! XML Support Included [The Latest Word]." The Seybold Report on Internet Publishing 2/8 (April 1998) 31. ISSN: 1090-4808. Authors' affiliation: [Alschuler:] The Word Electric; [Walter:] Seybold Publications; Editor, The Seybold Report on Internet Publishing.

Summary: "On Tuesday [March 31, 1998], Netscape Communications released on its Web site (http://www.mozilla.org) the source code for Mozilla, which, in days past, would have been known as Communicator 5.0. The release makes good on Netscape's dramatic promise in February to put its browser into the public domain. But Mozilla also contains the surprise inclusion of XML functionality we had not expected until later this year."

This article was first published in The Bulletin: Seybold News & Views on Electronic Publishing Volume 3, Number 26 (April 1, 1998). It was updated in SRIP, and is available online in HTML format. For more information on the Mozilla-XML connection, see the database entry "XML in Mozilla."



[CR: 19980720]

Alschuler, Liora; Walter, Mark. "Paris 1998: Man the Angle Brackets! [Trip Report.]" Seybold Report on Internet Publishing 2/11 (July 1998) 11-14. ISSN: 1090-4808. Authors' affiliation: Seybold Publications.

"In contrast to the SGML/XML Developer Conference last Fall, the May [1998] SGML/XML Europe event did not focus on up-and-coming Web applications. Instead, the undercurrent was the political agenda that drove publishers to generic markup in the first place: the need to publish in multiple forms, and the desire to store documents in a medium that is not controlled by the vendors of our publishing tools." The central section of this trip report covers XML editors, and is presented in the online document "Reviewing Structured Editors - Part Deux." See also part one.



[CR: 19970115]

Alschuler, Liora; Walter, Mark. "SGML '96: Celebrating The Tenth Anniversary of SGML [SGML '96: Tenth Anniversary Bodes Well for SGML]." The Seybold Report on Publishing Systems 26/8 (December 30, 1996) [1], 3-14. ISSN: 0736-7260. Authors' affiliation: [Alschuler]: Writer and Consultant, The Word Electric, Route 5 & Sanborn Road, POB 177, East Thetford, VT 05043, USA; Tel: 802/785-2623; Email: Liora@The-Word-Electric.com; [Walter]: Senior Editor, Seybold Publications.

The article reports comprehensively on the SGML '96 Conference, (1) surveying prominent themes (SGML case studies, SGML as a means of Web publishing, XML [Extensible Markup Language], use of SGML in medicine), and (2) reviewing new SGML tools demonstrated by software vendors in the exhibits: SGML editing tools [12 products], SGML document management tools [3 products], DSSSL implementation, and miscellaneous new SGML products.

See also the dedicated bibliographic collection with abstracts for most of the papers in the published conference proceedings for SGML '96.



[CR: 19960202]

"The American Poetry Full-Text Database [News and Notes]." Literary and Linguistic Computing 10/4 (November 1995) 318. ISSN: 0268-1145.

In April 1996, Chadwyck-Healey will release The American Poetry Full-Text Database containing more than 30,000 poems. Poems are from the early seventeenth century through the early twentieth century, and represent compositions by more than 200 American poets. All poems are encoded in SGML such that they can be searched and browsed in ways that make use of the literary structure within poetic corpora. The database will be issued in pre-packaged form for searching and browsing on Windows and Macintosh computers; it will also be available on tape for installation on larger networked computers. This American Poetry database is a companion to the popular English Poetry Full-Text Database, also published by Chadwyck-Healey; they are two of several such literary databases from C-H which make use of SGML encoding.



Amsler, Robert A.; Tompa, Frank W. "An SGML-Based Standard for English Monolingual Dictionaries." Pages 61-79 in Fourth Annual Conference of the UW Centre for the New Oxford English Dictionary: Information in Text. Proceedings of the Conference. Waterloo, Ontario, Canada, 26-28 October 1988. Waterloo, Ontario: University of Waterloo, 1988.

The 'Dictionary Encoding Initiative' referenced is loosely affiliated with the international Text Encoding Initiative; both projects seek to employ SGML. For SGML used in dictionary markup, see also Tompa below. Several of the Waterloo Annual Conference volumes contain articles germane to descriptively-tagged and SGML-tagged text. For further details on the Waterloo Centre, see Gonnet below.



[CR: 19951113]

Anbeek, Gert; Daelemans, Walter. "Text Processing Systems With Linguistic Knowledge." Pages 79-85 (with 11 references) in PROTEXT IV. Proceedings of the Fourth International Conference on Text Processing Systems. International Conference on Text Processing Systems, Boston, MA, USA 20-22 October 1987. Sponsored by INCA - Institute for Numerical Computation and Analysis. Edited by John J. H. Miller. Dublin, Ireland: Dún Laoghaire, Boole Press, Ltd., 1987. vii + 153 pages. ISBN: 0-906783-80-1 (hardback); 0-906783-79-8 (paperback). Authors' affiliation: [Anbeek] Oce-Nederland, Venlo, Netherlands; [Daelemans] A.I. Lab, Vrije Universiteit, Brussels.

"Abstract: Most existing text processors represent text simply as a string of characters. By incorporating linguistic representations into text processing systems, the accuracy of existing text processing facilities can be improved, and new facilities can be created. The architecture of an author environment for Dutch containing several linguistic modules is described. These modules enable a text processing system to maintain and use phonological, morphological, lexical and syntactic representations. Similar modules for English are being developed."

The research reported in this article was done within the context of the ESPRIT project OS-82, sponsored by the European Community. The goal was to incorporate "linguistic" knowledge (beyond simply spell checking and hyphenation, as well as style and grammar checking) into computer-based authoring systems.



[CR: 19951226]

Anderson, Steve. "Introducing SGML into a corporate environment." In Proceedings of the Second SGML BeLux Users' Conference. SGML BeLux '95: Second annual conference on the practical use of SGML, Antwerp, Belgium. October 25, 1995. Edited by Hans C. Arents. Leuven, Belgium: Katholieke Universiteit Leuven, 1995. Author's affiliation: Project Manager, Rover Group TiMS Project, Rover Group Limited, UK. Sales & Marketing Systems Department, Block 17, Lode Lane, Solihull B92 8NW, West Midlands, England.

Abstract: "This paper is a case study based on Steve Anderson's experiences at Rover Group, designing and building an integrated SGML authoring, management and publication solution. Rover's TiMS system brings together leading SGML editing tools with sophisticated repository software, to support the process of authoring and managing shared corporate information in a supportive, structured environment, and to enable that information to be published in a variety of formats. Rover's system also provides full support for translated text in all formats, radically reducing the cost of managing multilingual translations."

The document is available online in HTML format: "Introducing SGML into a corporate environment" [mirror copy, December 1995]. For further details on the 1995 Conference and BeLux, see the contact information for SGML BeLux.



[CR: 19951113]

André, Jacques. "Can Structured Formatters Prevent Train Crashes?" Electronic Publishing: Origination, Dissemination and Design (EPODD) 2/3 (October 1989) 169-173. Author's affiliation: INRIA [Institut National de Recherche en Informatique et en Automatique] / Irisa.

"Abstract: The article suggests that one contributory factor in the train crash at the Gare de Lyon in June 1988 (in which 56 died and hundreds were injured through a combination of failures in the braking system of an incoming train) was that the maintenance manuals were not only difficult to use but also contained a layout error in the part describing the train's braking system. The article makes clear how a structured document editor would have precluded such errors..."

This EPODD article is apparently a translation/revision of an earlier document; see J. André, "LaTeX ou SGML pouvaient-il faire éviter l'accident [catastrophe] de la gare de Lyon?"



[CR: 19951113]

André, Jacques. "LaTeX ou SGML pouvaient-il faire éviter l'accident [catastrophe] de la gare de Lyon?" Cahiers GUTenberg 1/1 (April 1989) 21-25. ISSN: 1140-9304. Author's affiliation: INRIA [Institut National de Recherche en Informatique et en Automatique] / Irisa, Campus de Beaulieu, Rennes.

For a summary, see the English version of this document.



[CR: 19951113]

André, Jacques. "Grif Plus MINT, or, How to Abide by a Layout Sheet." Pages 89-97 (with 13 references) in Protext III. Proceedings of the Third International Conference on Text Processing Systems. International Conference on Text Processing Systems. Trinity College, Dublin. 22-34 October, 1986. Edited by J. J. H. Miller. Dublin, Ireland: Dún Laoghaire, Co., Boole Press Ltd., January 1987. ISBN: 0-906783-55-0 (hardback); 0-906783-56-9 (paperback). Author's affiliation: IRISA/INRIA-Rennes, France.

"Abstract: An experience of interactively typing and editing a text according to a given layout-sheet is shown. A "what-you-see-is-what-you-get" (WYSIWYG) prototype system, Grif, has been used. Then the text and its structure have been sent to MINT to be formatted and printed according to a specific layout-sheet. This approach obliges an author, without undue constraints, to respect the style-sheet described by an editor. At the same time this case study shows the feasibility of merging the flexibility found in an abstract-oriented approach with the naturalness of document manipulation provided by WYSIWYG editors."



[CR: 19951113]

André, Jacques; Richy, Hélène. Utilisation des index d'un éditeur structuré dans le cadre d'actes médiévaux. IRISA Internal Publication [number] PI 841. Programme 3 - Intelligence artificielle, systèmes cognitifs en interaction homme-machine. Rennes: IRISA [Institut de Recherche en Informatique et Systèmes Aléatoires], 27 juin 1994. Extent: 50 pages. ISSN: 1166-8687. Authors' affiliation: Institut de Recherche en Informatique et Systèmes Aléatoires, Campus universitaire de Beaulieu, Rennes, France.

Abstract: "Après avoir rappelé ce qu'est un éditeur structuré, nous montrons comment l'éditeur (SGML) Grif traite les index de façon hypertextuelle. On décrit ensuite une expérimentation de ces index dans le cadre du Cartulaire de Geoffroy de Saint-Laurent (13th Siècle) qui montre l'utilité de cet éditeur pour les sciences humaines."

[Published abstract in English]: "First, the concept of 'structured document' is reviewed. Then it is shown how the formatter Grif handles index as hypertext links. Finally, a case study is analyzed: index tables for a thirteenth-century chart has been generated with this formatter that thus proves to be a useful tool for the humanities."

The monograph illustrates how an SGML structure editor (Grif) can be used to encode manuscript data in a hierarchical manner. The document is available online in Postscript format: ftp://ftp.irisa.fr/techreports/1994/PI-841.ps.Z [mirrored copy, November 1995]. See also "Édition structurée et indexation hypertextuelle d'actes médiévaux", Colloque Histoire et Informatique, (Rennes: Presses Universitaires de Rennes, juin 1994).



[CR: 19951113]

André, Jacques; Decouchant, Dominique; Quint, Vincent; Richy, Hélène. "Vers un atelier éditorial pour les documents structurés." Pages 63-72 (with 23 references) in Congrès AFCET Bureautique, Document, "Groupware", Multimédia. Versailles: AFCET, juin, 1993. Authors' affiliation: [Jacques André] Irisa/Inria-Rennes, Campus de Beaulieu, 35042 Rennes cedex; [Dominique Decouchant] CNRS, Bull-IMAG, 2 rue de Vignate, 38610 Gières; [Vincent Quint] Inria Rhône-Alpes, Bull-IMAG, 2 rue de Vignate, 38610 Gières; [Richy] CNRS/Irisa, Campus de Beaulieu, 35042 Rennes Cedex.

"Abstract: Grif is a system dedicated to interactive preparation, modification and editing of structured documents. It is specifically designed for professional documents, such as technical documentation or commercial books. In this paper, the Grif system is first described. Then, specific problems posed by the considered documents are quoted (multi-author documents, composite documents, cooperative editing, reusability, version control, quality, etc.) and the way Grif solves these problems is exhibited. With these capabilities, Grif may be considered as an environment for 'document engineering' that provides documents with the same kind of services as [the] 'software engineering' environment provides to programs."

Abstract: "Grif est un système interactif pour la production et la consultation de documents structurés professionnels, notamment ceux de la documentation technique ou de l'édition. Après avoir rappelé les principes de base de ce système, nous citons quelques-unes des tâches et des problèmes spécifiques aux milieux éditoriaux (documents multi-auteurs, documents composites ou secondaires, travail coopératif, réutilisation, gestion de versions, qualité, etc.) que nous comparons à ceux du génie logiciel.

Nous montrons alors comment Grif peut être vu comme une première étape vers la définition d'un 'atelier éditorial'."

Available online: ftp://ftp.imag.fr/pub/OPERA/doc/Afcet93.ps.gz [mirrored copy, November 1995]. Also available as IRISA [Institut de Recherche en Informatique et Systèmes Aléatoires] Internal Publication number 715, June 1993.



[CR: 19951113]

André, Jacques; Goossens, Michel; Rolland, Christian. "Diffusion des documents électroniques: bibliographie." Cahiers GUTenberg 19 (janvier, 1995) 148-157.

The bibliography covers SGML as well as other more recent encoding and document-delivery formats. Available online in Postscript format: ftp://ftp.irisa.fr/opera/doc/bibliowww.ps.gz. Also in mirror copy.



André, Jacques; Furuta, Richard Keith; Quint, Vincent (editors). Structured Documents. The Cambridge Series on Electronic Publishing. Cambridge/New York: Cambridge University Press, 1989. vii + 220 pages, with bibliographic references and index [193-213]. ISBN: 0-521-36554-6.

[Publisher's summary:] "This is a book that makes a significant contribution to the rapidly growing field of Electronic Publishing. The book is concerned with the structured representation of documents in computer document preparation systems. This approach to documents allows their logical structure to be represented and manipulated in a natural way."

"A coherent selection of papers is presented from a number of major authorities in the field: the book is unique in that the papers have been selected to form a unified whole. As well as more theoretical considerations, various current systems are discussed and compared, and standards for document representation, such as SGML, are considered. The viewpoints of the typographer and the linguist are presented, as well as that of the computer scientist. The book is based on a course organized by two of the editors at the Institut National de Recherche en Informatique, France in 1987. This book will be of interest to all concerned with document preparation systems: researchers, system designers, those concerned with standards work, publishers and printers."

This volume originated with a series of lectures given in January, 1987 at Aussois (Savoy, France), organized by INRIA (Institut National de Recherche en Informatique et en Automatique) under the direction of Jacques André and Vincent Quint. The papers circulated in an earlier form in a booklet Structures de/for Documents, eds. Jacques André and Vincent Quint, January 1987.



Andrews, Dave. "Portable Documentation Accelerating SGML." Byte 18/3 (March 1993) 32.

The article summarizes the Interconsult market research report on the growth of SGML and discusses the implications.



Angerstein, Paula. "An Introduction to the Document Style Semantics and Specification Language (DSSSL): A Description of the DSSSL Standard and on its Status." CALS Journal (Spring 1993) 67-72.

This article is now (November 1994) partially out-of-date with respect to details in the current DSSSL draft, but it supplies a useful overview.



[CR: 19971123]

Angerstein, Paula. "Why you do (or don't) need HyTime in your document management system." Page(s) 211-216 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Author's affiliation: Senior Analyst, Texcel Research, Inc.; Email: paula@texcel.no.

Abstract: "This paper examines whether (or not) HyTime is an essential feature of a document management system. Scenarios for the appropriateness (or inappropriateness) of indirect linking are reviewed. Ways in which a document management system can help (or hinder) management of links are examined. Should (or shouldn't) a document management system treat HyTime markup as more than ordinary SGML?

"With the addition to HyTime of several annexes in the Technical Corrigendum (TC), HyTime becomes a broader framework for describing generalized SGML-based architectures. The potential impact of these far-reaching topics on document management systems is discussed."

A version of the document is available online: "Why you do (or don't) need HyTime in your document management system", by Paula Angerstein, Texcel Research, Inc. SGML Europe '97, May 14, 1997. [local archive copy]

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19971227 MD: 19971229]

Angerstein, Paula. "Why Your Document Management System Should Care About Hyperlinks." Pages 195-199 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Paula Angerstein]: Senior Analyst, Texcel Research, Inc., Austin, Texas USA 78746 Email: paula@texcel.no; WWW: http://www.texcel.no/.

Abstract: "This paper examines the aspects of hyperlinking that are relevant to document management systems. Various standard mechanisms for hyperlinking -- XML, HyTime, and HTML -- are reviewed and their relative merits discussed. Ways in which document management systems can facilitate link creation, maintenance, and delivery are presented, along with their effect on integrated authoring and delivery systems."

"Traditionally, the topic of hyperlinking has been primarily discussed in the context of distribution, viewing, and display systems. This paper examines the requirements put on document management systems, and in turn, integrated authoring tools for creation and management of hyperlinks. One of the distinguishing characteristics of hyperlink management in a document management environment versus a display environment is the dynamic nature of the information to which the hyperlinks apply. Fixed linking schemes layered onto static information are no longer sufficient. Changes in information imply the need for linking mechanisms that can adapt to changed resources, ongoing validation that links are still valid, and the need for creation of additional links."

This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.

A version of the document is available online in HTML format: "Why Your Document Management System Should Care About Hyperlinks"; [local archive copy]

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19980304]

American National Standards Institute (ANSI). American National Standard for Information Processing Systems. Computer Graphics: Metafile for the Storage and Transfer of Picture Description Information. New York: The Institute, and Secretariat, Computer and Business Equipment Manufacturers Association, 1990.

ANSI/ISO 8632-1990. Approved December 20, 1990, American National Standards Institute, Inc. ANSI/ISO 8632-1992. This standard supersedes the earlier standard: CGM:1986 (ANSI X3.122-1986). For other information on CGM, see the main database entry for Computer Graphics Metafile.



Appelt, Wolfgang. "Normen im Bereich der Dokumentverarbeitung [Standards in Document Processing Activities]." Informatik Spektrum 12/6 (December, 1989) 321-330. ISSN: 0170-6012. CODEN: INSKDW. Author's affiliation: GMD, Sankt Augustin, West Germany.

Discusses document processing and standards (ODA, SGML, ISO, CCITT).



[CR: 19950716]

Appelt, Wolfgang; Scheller, Angela. "HyperODA: Going Beyond Traditional Document Structures." Computer Standards & Interfaces 17/1 (January 1995) 13-21 (with 14 references). Authors' affiliation: German National Research Center for Computer Science, St. Augustin, Germany.

"Abstract: Several extensions to the ODA Standard (Open Document Architecture (ODA) and Interchange Format) are currently jointly developed by ISO/IEC and TSS (the Telecommunication Standardization Section of the ITU, previously known as CCITT). In addition to the interchange of documents as already provided by the existing version of ODA, these extensions will allow for decentralized processing of documents. Furthermore, the conceptual model of a document is enhanced by mechanisms to describe nonlinear structures and temporal relationships which are needed for new content types such as audio. This paper describes the set of extensions commonly known as HyperODA. In addition, the relation to other hypermedia standards, namely HyTime and MHEG, is discussed."



[CR: 19950903]

ArborText, Inc. Getting Started with SGML: A Guide to the Standard Generalized Markup Language and Its Role in Information Management. ArborText White Paper. Ann Arbor, MI: ArborText, Inc., 1995. 41K (computer file), approximately 19 pages in print copy.

"SGML allows information to be managed as data objects instead of as characters on a page. Data is broken into discrete objects of information that carry intelligence about its meaning within the overall system. SGML enables companies to store and reuse information efficiently, share it with other users and maintain it in a database. This white paper gives you an introduction to the existing SGML technology, its advantages and benefits, as well as an overview of some related standards and how they fit into an overall approach to managing information. It also defines some of the common terminology and acronyms associated with SGML."

The document is available online via the ArborText WWW server, or here in mirror copy (without graphics); or variant from SGML Open as WHITE PAPER #2001-AT. Use a WWW client that can display graphics (in color if possible) for the full effect -- this White Paper is very well done! Contents of the document: 1. The Business Challenge; 2. Unleashing the Power of Information; 3. Getting to Know SGML; 4. What Does SGML Give Me?; 5. Is SGML Right for Me?; 6. What is a Good SGML System?; 7. Who Uses SGML Now?; 8. What is CALS?; 9. Resources



[CR: 19971227]

Archie, Kent. "An SGML-Based Database Reporting Language." Pages 249-251 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Kent Archie]: Member of Technical Staff, Lucent Technologies, 1200 E. Warrenville Rd., P.O. Box 3045, Naperville, IL 60566-7045 USA; Phone: +1 (630) 979-7343; FAX: +1 (630) 979-9340; Email: karchie@lucent.com.

Abstract: "To provide report program developers [with] a single output format, while allowing multiple presentation formats, we are using an SGML language based on HTML tables. The reporting programs generate the SGML documents which are translated into HTML, LaTeX or ASCII depending on the needs of the users."

The project relates to a "myriad of reports" produced for Lucent Technologies: "5ESS-200 is a large telephone switch involving the work of thousands of developers. The ADEPT project tracking system records the work items, completion dates, quality records and other information to assure the project conforms to its processes. [...] Each report program generates a complete SGML document. These are then run through the sgmls parser. The parse tree that results is analyzed by a translation program that converts the SGML tags and their contents to the appropriate statements in the presentation language. [...] We use marked sections to hide the embedded SGML from the parser when the error file is processed. We have extended our definition of reports to include printouts of the forms customers fill out on the screen. The form printing system also generates SGML which passes through the translation system to be printed. We are examining how DSSSL and XML could alter our translation mechanism."

"Our project is different than many other uses of SGML as the SGML documents are transitory and limited in nature. Once the SGML is generated and translated, it is deleted. We believe that using SGML as a report writing language provides several benefits. It is easy to understand and write, it provides independence from the presentation mechanism and results in smaller, easier to understand report code."

This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19961230]

ArborText, Inc. Native SGML vs. Filtered SGML. ArborText White Paper. Ann Arbor, MI: ArborText, 1995. Extent: approximately 15 pages.

Abstract: "SGML (Standard Generalized Markup Language) has become the world standard for exchanging information. As a result of the significant benefits of adopting SGML, many organizations are currently planning to introduce SGML in their document authoring and publishing systems. These organizations must choose between two fundamentally different approaches: native SGML and filtered SGML."

Available online: on the ArborText WWW server; [mirror copy]



Arents, Hans C. "SGML BeLux '94 Conference." SGML Users' Group Newsletter 30 (March 1995) 5. ISSN: 0952-8008.

The article is reprinted from the SGML BeLux Newsletter 4 (December 1994).



[CR: 19950716]

Arents, Hans C. "SGML BeLux '94 Conference [version 1]." SGML Users' Group Newsletter 29 (November 1994) 6-7. ISSN: 0952-8008.

[A variant of the document in issue 30 of SGML Users' Group Newsletter.]



[CR: 19951228]

Arents, Hans. "SGML BeLux '95." SGML Users' Group Newsletter 31 (June 1995) 10. ISSN: 0952-8008.

Hans Arents (KU Leuven) presents the conference program for the Second Annual Conference on the Practical Use of SGML. See the conference entry in this database, or see the contact address for SGML BeLux.



[CR: 19951226]

Arents, Hans Christian. "Using SGML on the Web." In Proceedings of the Second SGML BeLux Users' Conference. SGML BeLux '95: Second annual conference on the practical use of SGML, Antwerp, Belgium. October 25, 1995. Edited by Hans C. Arents. Leuven, Belgium: Katholieke Universiteit Leuven, 1995. Author's affiliation: Hypermedia project coordinator, K.U.Leuven, dept. MTM, W. de Croylaan 2, B-3001 Leuven, Belgium. Email: Hans.Arents@mtm.kuleuven.ac.be.

"Abstract: The amazing success of the World-Wide Web (the Web for short) as a hypermedia electronic document delivery system on top of the Internet has had a profound effect on the visibility of SGML (Standard Generalized Markup Language). Based on the use of HTML (HyperText Markup Language), the Web has become the world's largest and most successful SGML application. However, opinion remains strongly divided on whether we have to start using full-blown SGML to put electronic documents on the Web, or whether we can stick to using simple HTML. In this article I will argue that the conflict between SGML and HTML is unnecessary, since both have an important role to fulfil on the Web. At present, the most appropriate use of SGML on the Web appears to be as a "back-end" content markup language, while HTML appears to be best suited as the "front-end" presentation markup language. Only for those applications that need special functionalities not yet supported by HTML (such as intelligent search or user-specific presentation) does it make sense to use full-blown SGML on the Web."

The document is available online in HTML format: "Using SGML on the Web" [mirror copy, December 1995]. For further details on the 1995 Conference and BeLux, see the contact information for SGML BeLux.



[CR: 19971227]

Arms, William Yeo. "The Role of Text in Digital Libraries." Page 11 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [William Yeo Arms]: Vice President, Corporation for National Research Initiatives (CNRI), 1895 Preston White Drive, Reston, Virginia 20191; Email: warms@cnri.reston.va.us; Phone: +1 (703) 620-8990

Abstract: "Text has a special place in the digital library as the primary medium of human communication. While sometimes a picture may be worth a thousand words, more often, the best way to convey any complex thought is through text. The reason is simple; the richness of concepts, the detail, and the precision of ideas that can be expressed in words are remarkable. This talk looks at the role of text in digital libraries and attempts to place the various methods for managing text in context. A theme of the talk is the trade-off between generality and simplicity. Generality is the great strength of SGML and also its principal weakness. Simple formats are easier for the non-specialist to learn and easier for interoperability in distributed computing, but fail when asked to do too much. The early success of HTML is a fine example of simplicity; more recent troubles show what happens when simplicity is abandoned in a piecemeal fashion. Digital libraries are interested in both ends of the spectrum. At the high end, libraries must support every language and character set in the world, past or present. They must work with mathematics, music, chemical symbols, and special formats from every discipline. However, while accepting high-end formats, such as the encoding used by the Text Encoding Initiative, libraries ask questions about the long term [...] Unicode is emerging as the extended character set of choice, partly because the developers have been steadfast in providing compatibility with other standards, notably ASCII. The introduction of XML is paying close attention to these factors, thus greatly increasing the chance of broad acceptance."

This paper was delivered as the opening Keynote Address in the SGML/XML '97 Conference.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19951220]

Armstrong-Warwick, Susan; Thompson, Henry S.; McKelvie, David; Petitpierre, Dominique. Data in Your Language: The ECI Multilingual Corpus 1. HCRC Technical Report. Edinburgh/Genève: HCRC/Institut Dalle Molle, 1994. Extent: 12 pages.

Abstract: "In this paper we describe the contents and the method of production of the ACL European Corpus Initiative Corpus 1 (ECI/MC1). This is a large multilingual electronic text corpus, containing 97 million words in 27 (mainly European) languages. It is available cheaply on CDROM. Most of the texts in the corpus are marked up using a fully-validated SGML document type description based upon the Text Encoding Initiative (TEI) guidelines for corpus annotation. It is hoped that this corpus will provide a useful resource for corpus-based computational linguistics."

Available in Postscript format: http://xml.coverpages.org/armstrong-ps.gz. See http://www.cogsci.ed.ac.uk/projects/hcrc.html#ECI or ECI overview.



[CR: 19980304]

Arnold, David B.; Bono, Peter R. CGM and CGI. Metafile and Interface Standards for Computer Graphics. Berlin and New York: Springer-Verlag, 1988. Extent: xxi + 279 pages. ISBN: 0387189505.

For other information on CGM, see the main database entry for Computer Graphics Metafile.

See also: Peter R. Bono, PC Graphics With GKS. Introduction to Graphics Standards (GKS, GKS-3D, PHIGS, CGI, and CGM) and to Graphics Programming. Prentice Hall, September 1990. ISBN: 0136544355.



[CR: 19971124]

Arnold-Moore, Timothy. "Automatically Processing Amendments to Legislation." Pages 297-304 (with 25 references) in Proceedings of the Fifth International Conference on Artificial Intelligence and Law. Fifth International Conference on Artificial Intelligence and Law, College Park, MD, USA, May 21-24, 1995. Sponsored by the International Association for Artificial Intelligence & Law, and the University of Maryland Institute for Advanced Computer Studies. New York, NY: ACM, 1995. Author's affiliation: Multimedia Database Systems Research Group, Collaborative Information Technology Research Institute, Royal Melbourne Institute of Technology, Melbourne, Australia; Email: tja@cs.rmit.edu.au.

Abstract: "This paper proposes an architecture for a system which accepts Amending Acts expressed in SGML and produces a database of resulting versions of the Principal Acts, and describes its implementation. The paper discusses the core natural language processing module which uses an ATN to parse the components of the Acts into a frame representation of amendment actions. This representation is then used to produce database transactions which add the subsequent versions to the database."

See also the author's presentation "Automatic generation of amendment legislation," published in ICAIL '97: Proceedings of the Sixth International Conference on Artificial Intelligence and Law, pages 56-62.



[CR: 19960812]

Arnold-Moore, T.; Fuller, M.; Lowe, B.; Thom, J. (et al.) "The ELF Data Model and SGQL Query Language for Structured Document Databases." Australian Computer Science Communications [Sixth Australasian Database Conference. ADC'95, Adelaide, SA, Australia, 30-31 January 1995. Sponsored by: Australian Computer Society; Australian Inf. Technol. Eng. Centre; et al.] 17/2 (1995) 17-26 (with 31 references). Authors' affiliation: Department of Computer Science, RMIT, Melbourne, Vic., Australia. Home Page.

"Abstract: A data model and query language for accessing structured documents expressed in SGML is presented. The ELF (ELements with Features) model uses the SGML grammar (DTD) directly as a schema avoiding transformations which can lose information. The model also gives flexibility to the implementor to retrieve whole documents and decompose them, retrieve atomic elements and recombine them, or pursue alternatives which retrieve the elements directly. The language, Structured [sic! "Standard"] Generalized Query Language (SGQL), allows efficient access to the content, structure and attributes of documents at any level within their structure. This is all achieved with a simple, largely orthogonal functional language."

For further information on SGML-related research at RMIT, see the main entry for RMIT - MDS.



[CR: 19950716]

Arnon, Dennis. "Inaugural Meeting of Northern California SGML Users Group." <TAG> 7/1 (January 1994) 18. ISSN: 1067-9197.

The author reports on the formation of a new group. Meetings are to be held every two months. Contact: Northern California SGML Users Group, or Dr. Dennis Arnon; Frame Technology Corporation; 333 West San Carlos Street; San Jose, CA 95110; Tel: (408) 975-6377 (H) (415) 752-1256; Fax: (415) 752-1827; Email: arnon@shell.portal.com; darnon@frame.com. Also: Northern California SGML User Group Board Members.



Aspen Systems Corporation. "A Solution to your Information Management Needs: The SGML Approach." Rockville, MD: Aspen Systems Corporation and Integrated Microcomputer Systems, 1986. 9 pages.



Association of American Publishers. Author's Guide to Electronic Manuscript Preparation and Markup. 2nd edition, reprinted 1989. AAP, November 1987. ISBN: 1-55653-086-2.

Available from EPSIG.



Association of American Publishers. The Markup of Mathematical Formulas. 2nd edition, reprinted 1989. AAP, November 1987. ISBN: 1-55653-083-8.

Available from EPSIG.



Association of American Publishers. The Markup of Tabular Material. 2nd edition, reprinted 1989. AAP, November 1987. ISBN: 1-55653-085-4.

Available from EPSIG.



Association of American Publishers. Reference Manual on Electronic Manuscript Preparation and Markup. 2nd edition, reprinted 1989. AAP, November 1987. ISBN: 1-55653-084-6.

Available from EPSIG.

Auto-Graphics, Inc. An Introduction to SGML. Pomona, CA: Auto-Graphics, 1995.

The document is available online from Auto-Graphics, or here in mirror copy.



Azaria, Adrienne. "SGML: A Lifesaver in a Sea of Electronic Documents." Network World 11/50 (December 12, 1994) 67.

"Abstract: The Standard Generalized Markup Language (SGML) is rapidly being adopted as an international standard for electronic document interchange. SGML lets users share information in documents across applications and computing platforms, providing a universal way to identify, manage and share document elements. This is accomplished by means of two SGML components: SGML tags, and Document Type Definitions (DTDs). SGML tags function as labels, identifying parts of documents, such as headlines or sections, and a tag set is a list of all allowable document elements, or objects, from items such as chapters or paragraphs. A DTD defines a document's structure, listing elements that can appear and specifying their order. SGML differs from traditional document conversion programs that preserve document formats. Instead, SGML preserves a document's content, or information, as well as a document's structure, or the relationships among the document's data."



[CR: 19961210]

Bacsich, Paul; Lefrere, Paul. "Approach to Markup at the Open University." SGML Users' Group Bulletin 4/1 (1989) 45-50 (with 7 references). ISSN: 0269-2538. Authors' affiliation: [Bacsich] Faculty of Technology, Electronic Media Research Group, Open University, Walton Hall, Milton Keynes MK7 6AA, UK.

A report on the pros and cons of SGML use during a three-year trial period.

Note: The volume editor for SGML Users' Group Bulletin 4/1 is David W. Penfold (Edgerton Publishing Services, Huddersfield, UK).



[CR: 19960203]

Bader, Winfried. "TUSTEP and SGML. Experiences from the Work of an SGML-based Concordance Edition Project." Pages 485-492 in Bible et Informatique: 'Matériel et matière': L'impact de l'informatique sur les études bibliques. Actes du Quatrième Colloque International. Amsterdam. August 15-18, 1994. Sponsored by AIBI (Association Internationale Bible et Informatique), en collaboration avec le Werkgroep Informatica, Vrije Universiteit Amsterdam. Paris: Honoré Champion Éditeur, 1995. ISBN: 2-85203-508-1. ISSN: 1246-7456, 0773-3968.

"[Introduction] The following paper gives a report of a concordance project, where the computer is used as a tool for preparing the data and typesetting. The focus will be on the use of a SGML format as the central interface between the data of the old edition of the concordance, the complete text of the bible, and the new data ready for typesetting. Ending in a printed version the project demonstrates how to make the advantages of a computer fruitful for those people who never will touch such a machine."



Ballanti, Anna; Cork, Deborah; Dam, Lex van; Jonghe, Jurgen de; Herwijnen, Eric van; Nijdam, Marco; Samarin, Alexandre; Shave, Tony. "Text Processing at CERN. Part 1: Overview." SGML Users' Group Bulletin 3/2 (1988) 39-54.



[CR: 19970331]

Banfalvi, Tom; Sturgeon, Peter; Walsh, Christina L. K. "Manufacturing Documentation in the Virtual Warehouse." Pages 161-166 (with 1 reference) in Conference Proceedings, SIGDOC '96. The 14th Annual International Conference on Computer Documentation. ["Marshalling New Technological Forces: Building a Corporate, Academic, and User-Oriented Triangle"]. SIGDOC '96: 14th Annual International Conference. Research Triangle Park, North Carolina, US. October 20-23, 1996. Sponsored by the Association for Computing Machinery Special Interest Group on Documentation (SIGDOC). New York, NY: Association for Computing Machinery, 1996. ISBN: 0-89791-799-5. Authors' affiliation: Magellan [Passport] Product Information; Northern Telecom Limited.

"Abstract: The profession of technical writing requires a variety of tools and skills to develop and deliver quality information products, such as paper documentation, online documents and training. The time we require to acquire and effectively use our development tools is constantly increasing. The time available to acquire and communicate subject matter expertise is decreasing as a result. The paper presents a strategy to free up more time for writers who want to and need to write. This model is based on specialization of functions, allowing writers to focus on writing while supporting members of the information development team provide a structure for deploying the information products in whatever form the customer requires (such as: online tutorial, quick reference card, user guide, functional specification). The writers take a more active role in directly researching customer requirements for information products. Once determined, the writers are responsible to communicate them to the support group and delegate related maintenance. The paper chronicles the evolution of an actual information development support/process/tools team, and the services that it has provided in the interest of supporting our virtual information warehouse. The paper also presents a strategy for future directions of such an information development support team and the consumers of both its products and services."

The paper discusses the proposed role of FrameMaker+SGML in the migration plans, and the group's need for an SGML-based database manager to help with versioning and other issues when modules are moved into SGML format.

Several other articles in this proceedings volume are germane to SGML: Betsy Brown, et al., "From Hardcopy to Online: Changes to the Editor's Role and Processes"; Paul Beam and Peter Goldsworthy, "Technical Writing on the Web-Distributed SGML-Based Learning"; Stephanie Copp, "Working with Academe"; Cindy Roposh, et al., "Developing Single-Source Documentation for Multiple Formats"; Paul Prescod, "Multiple Media Publishing in SGML"; Lin-Ju Yeh, et al., "SSQL: a Semi-Structured Query Language for SGML Document Retrievals"; Dee Stribling, et al., "A Real World Conversion to SGML".



[CR: 19980907]

Bapst, Frédéric; Ingold, Rolf. "Using Typography in Document Image Analysis." Pages 240-251 (with 23 references) in Electronic Publishing, Artistic Imaging, and Digital Typography. Proceedings of the 7th International Conference on Electronic Publishing (EP '98), Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography (RIDT '98). EP '98 and RIDT '98, Saint Malo, France. March 30 - April 3, 1998. Edited by Roger D. Hersch, Jacques André, and Heather Brown. Lecture Notes in Computer Science Series, Number 1375. New York/Berlin/Heidelberg: Springer-Verlag, 1998. ISBN: 3-540-64298-6. Authors' affiliation: Informatics Institute of the University of Fribourg, Chemin du Musée 3, Fribourg, Switzerland.

Abstract: "Even if font usage plays an important role in document image analysis (DIA), recognition systems generally take the concept of font management in a weaker sense than in the production cycle. With the point of view of the document recognition community, the authors show how typographic information (characters bitmap, metrics, etc.) can improve existing analysis methods. After a brief survey of font recognition issues, they present the advantages of a font software support in the design of recognition systems. Concrete algorithms are proposed in the subtopics of a posteriori font recognition, monofont optical character recognition (OCR), and word segmentation. The reported experiments and results indicate that there are still substantial benefits to expect from the design of typography-aware analyzers."

The CIDRE project (Cooperative & Interactive Document Reverse Engineering) for the "Development of a cooperative and interactive document recognition environment" has produced a number of publications relative to document structure and DTD generation.

See the abstract online, and the full text in PDF format, [local archive copy]; also Postscript version, [local archive copy].



[CR: 19980501]

Bard, Ellen G.; Sotillo, Cathy; Anderson, Anne H.; Thompson, Henry S.; Taylor, M. M. "The DCIEM Map Task Corpus. Spontaneous Dialogue Under Sleep Deprivation and Drug Treatment." Speech Communication 20/1-2 (November 1996) 71-84 (with 18 references). Authors' affiliation: HCRC.

Abstract: "This paper describes a resource designed for the general study of spontaneous speech under the stress of sleep deprivation. It is a corpus of 216 unscripted task-oriented dialogues produced by normal adults in the course of a major sleep deprivation study. The study itself examined continuous task performance through baseline, sleepless and recovery periods by groups treated with placebo or one of two drugs (Modafinil, d-amphetamine) reputed to counter the effects of sleep deprivation. The dialogues were all produced while carrying out the route communication task used in the HCRC Map Task Corpus. Pairs of talkers collaborated to reproduce on one partner's schematic map a route preprinted on the other's. Controlled differences between the maps and use of labelled imaginary locations limit genre, vocabulary and effects of real-world knowledge. The designs for the construction of maps and the allocation of subjects to maps make the corpus a controlled elicitation experiment. Each talker participated in 12 dialogues over the course of the study. Preliminary examinations of dialogue length and task performance measures indicate effects of drug treatment, sleep deprivation and number of conversational partners. The corpus is available to researchers interested in all levels of speech and dialogue analysis, in both normal and stressed conditions."

See the database main entry The HCRC Map Task Corpus for details on the use of SGML encoding.

This publication is based apparently upon a paper delivered at the Workshop on 'Speech Under Stress', Lisbon, Portugal, 15-September-1995. A similar publication is: "The DCIEM Map Task Corpus: Spontaneous dialogue under sleep deprivation and drug treatment," by Bard, E., Sotillo, C., Anderson, A., and Taylor, M. M. in Proceedings of ICSLP '96 (Philadelphia 1996), pages 1958-1961.



[CR: 19961009]

Barker, L. Randol; Burton, John R.; Zieve, Phillip D. (eds.). Principles of Ambulatory Medicine, 3rd Edition. Baltimore, MD: Williams & Wilkins, 1991. ISBN: 0683004379.

The volume is a medical textbook with 102 chapters and over 700 tables and figures, with extensive bibliographies and cross-referencing. The third edition was authored by over 70 scholars at the Johns Hopkins University, in collaboration with the National Library of Medicine (NLM). It is currently produced in its print edition from an SGML PAM database, and published by Williams & Wilkins.

For further information on the use of SGML by the National Library of Medicine for database publishing, see the main entry for the NLM.



[CR: 19951122]

Barnard, David T.; Burnard, Lou; DeRose, Steven J.; Durand, David G.; Sperberg-McQueen, C. M. Lessons for the World Wide Web from the Text Encoding Initiative. Presentation for the 4th International Conference on the World Wide Web (Boston, December 1995); and, Queen's University Technical Report 95-375. [Kingston, Ontario]: [Department of Computing and Information Science, Queen's University], 1995. Extent: 32K HTML document, approximately 15 pages. Authors' affiliation: in addition to their individual institutional affiliations, the authors were all part of the TEI (Text Encoding Initiative).

Abstract: "Although HTML is widely used, it suffers from a serious limitation: it does not clearly distinguish between structural and typographical information. In fact, it is impossible to have a single simple standard for document encoding that can effectively satisfy the needs of all users of the World Wide Web. Multiple views of data, and thus multiple DTDs, are needed. The Text Encoding Initiative (TEI) has produced a complex and sophisticated DTD that makes contributions both in terms of the content that it allows to be encoded, and in the way that the DTD is structured. In particular, the TEI DTD provides a mechanism for describing hypertextual links that balances power and simplicity; it also provides the means for including information that can be used in resource description and discovery. The TEI DTD is designed as a number of components that can be assembled using standard SGML techniques, giving an overall result that is modular and extensible."

The document is available in published form in several variant versions. As a paper, it is to be presented at the 4th International Conference on the World Wide Web, Boston (December 1995), and will appear in the Conference Proceedings volume published by O'Reilly and Associates. Another version of this paper was accepted for publication (February 1995) in Computer Standards and Interfaces, and appeared earlier (March 2, 1995) as Technical Report 95-375, Department of Computing and Information Science, Queen's University (1995).

The document is available in HTML format on the Internet: "Lessons for WWW from TEI" (or http://www.qucis.queensu.ca/~barnard/teiwww.html); [mirror copy of 'wwwpaper.html', November 22, 1995]. Or, see the Postscript version of the Queen's University TR [mirror copy].



[CR: 19960408]

Barnard, David T.; Burnard, Lou; Sperberg-McQueen, C. Michael. "Lessons from Using SGML in the Text Encoding Initiative." Computer Standards & Interfaces 18/1 (January 1996) 3-10. ISSN: 0920-5489. Authors' affiliation: [Barnard] Department of Computing and Information Science, Queen's University; [Burnard] Oxford University Computing Services; [Sperberg-McQueen] University of Illinois at Chicago, Chicago. Contact address: david.barnard@queensu.ca.

[Abstract: See below]

This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University. See abstract and further bibliographic details in a related entry immediately below.



[CR: 19960229]

Barnard, David T.; Burnard, Lou; Sperberg-McQueen, C. Michael. Lessons from Using SGML in the Text Encoding Initiative. Technical Report 95-375 [and accepted for publication in Computer Standards and Interfaces]. Kingston, Ontario, Canada: Department of Computing and Information Science, Queen's University, March 2, 1995. Extent: approximately 15 pages.

Abstract: "In April of 1994 the ACH-ALLC-ACL Text Encoding Initiative published Guidelines for Electronic Text Encoding and Interchange (Document TEI P3). SGML was used as the basis for the encoding scheme that was developed. Several innovative approaches to the use of SGML were devised during the course of the project. Three aspects of this innovation are documented in the paper. First, all of the tags are organized into sets that can be included easily into the project DTD, which allows the corresponding features to be used in documents only when required. Second, mechanisms were developed to relate parts of documents in non-hierarchic ways. Third, a mechanism was developed to allow extension of the DTD in a disciplined manner. We comment on the effectiveness with which SGML could be used in these ways and the shortcomings we perceive."

Available on the Internet: [mirror copy]. See also above, Lessons for the World Wide Web from the Text Encoding Initiative.



[CR: 19951122]

Barnard, David; Burnard, Lou; Gaspart, Jean-Pierre; Price, Lynne A.; Sperberg-McQueen, C. M.; Varile, Giovanni Battista. "Hierarchical Encoding of Text: Technical Problems and SGML Solutions." The Text Encoding Initiative: Background and Contents, Guest Editors Nancy Ide and Jean Véronis = Computers and the Humanities 29/3 (1995) 211-231. ISSN: 0010-4817.

Abstract: "One recurring theme in the TEI project has been the need to represent non-hierarchical information in a natural way - or at least in a way that is acceptable to those who must use it - using a technical tool that assumes a single hierarchical representation. This paper proposes solutions to a variety of such problems: the encoding of segments which do not reflect a document's primary hierarchy; relationships among non-adjacent segments of texts; ambiguous content; overlapping structures; parallel structures; cross-references; vague locations."

A version of this document ("June 1994") [was] available in Postscript format from the Queen's University WWW server [as]: http://www.qucis.queensu.ca/home/barnard/achmlw18.ps (Postscript, 180 kbytes); but see now the mirror copy, November 1995. On the use of a hierarchical database to model (non-) hierarchical structures, see SGML/XML and (Non-) Hierarchy.



[CR: 19960712]

Barnard, David T.; Clarke, Gwen; Duncan, Nicholas. Tree-to-Tree Correction for Document Trees. Queen's University Technical Report 95-372 [Revision of Technical Report 91-315]. Kingston, Ontario: Department of Computing and Information Science, Queen's University, 1995. Authors' affiliation: Department of Computing and Information Science, Queen's University.

Abstract: "Documents can be represented as ordered labelled trees. Finding the edit distance between documents is a particular case of the general problem for trees. We give a detailed survey of previous results, presenting them in a single notation to elucidate their commonalities. We then discuss two ways of extending these results -- first, by changing the set of primitive editing operations used by existing algorithms and, second, by post-processing the output of the algorithms to recognize patterns of change significant to documents. Finally, we provide extensions of the first type. Our algorithm allows subtree operations but is otherwise similar to that of Zhang and Shasha."

The document is available in Postscript format on the Internet; [mirror copy]



Barnard, David T.; Fraser, Cheryl A.; Logan, George M. "Generalized Markup for Literary Texts." Literary and Linguistic Computing 3/1 (1988) 26-31.

Abstract: Encoding literary texts for analysis, electronic transmission, or publication requires the marking of various substantive, structural and formal features. The development of a standard comprehensive markup language for these purposes is a desideratum. This paper offers a set of requirements for such a language, reviews related work, and describes a newly-created standard based on the Standard Generalized Markup Language.



Barnard, David T.; Hayter, Ron; Karababa, Maria; Logan, George M.; McFadden, John. "SGML Based Markup for Literary Texts: Two Problems and Some Solutions." Computers and the Humanities 22/4 (1988) 265-276. ISSN: 0010-4817.

Abstract: There is wide agreement on the need for a markup standard for encoding literary texts. The Standard Generalized Markup Language (SGML) seems to provide the best basis for such a standard. But two problems inhibit the acceptance of SGML for this purpose. (1) Computer-assisted textual studies often require the maintenance of multiple views of a document's structure but SGML is not designed to accommodate such views. (2) An SGML-based standard would appear to entail the keyboarding of more markup than researchers are accustomed to, or are likely to accept. We discuss five ways of reducing the burden of markup. We conclude that the problem of maintaining multiple views can be surmounted, though with some difficulty, and that the markup required for an SGML-based standard can be reduced to a level comparable to that of other markup schemes currently in use.

A revision of Technical Report 204, Queen's University Department of Computing and Information Science, 1988, ISSN 0836-0227.



[CR: 19970817]

Barnard, David T.; Ide, Nancy M. "The Text Encoding Initiative: Flexible and Extensible Document Encoding." Pages 622-628 (with 19 references) in Structured Information/Standards for Document Architectures. Edited by Elisabeth Logan and Marvin Pollard. = Journal of the American Society for Information Science, Special Issue. Volume 48, Number 7 (July 1997). New York: John Wiley & Sons Inc., 1997. ISSN: 0002-8231. Authors' affiliation: [Barnard]: Computing and Information Science, Queen's University, Kingston, Ontario, Canada; Email: david.barnard@queensu.ca; [Ide]: Department of Computer Science, Vassar College, Poughkeepsie, NY 12601; Email: ide@cs.vassar.edu.

Abstract: "The Text Encoding Initiative is an international collaboration aimed at producing a common encoding scheme for complex texts. The diversity of the texts used by members of the communities served by the project led to a large specification, but the specification is structured to facilitate understanding and use. The requirement for generality is sometimes in tension with the requirement to handle specialized text types. The texts that are encoded often can be viewed or interpreted in several different ways. While many electronic documents can be encoded in very simple ways, some documents and some users will tax the limits of any fixed scheme, so a flexible extensible encoding is required to support research and to facilitate the reuse of texts."

See also the bibliographic entry for Barnard and Ide, "The Text Encoding Initiative: Flexible and Extensible Document Encoding," Technical Report 96-396, Kingston, Ontario, Department of Computing and Information Science, Queen's University. December 1995. This version is available in Postscript format on the Internet.

Complete information on the Text Encoding Initiative is accessible via the main entry in the SGML/XML Web Page, or on the TEI Web Site.

See the main document entry for the complete list of articles and contributors, as well as other bibliographic information.



[CR: 19960229]

Barnard, David T.; Ide, Nancy M. The Text Encoding Initiative: Flexible and Extensible Document Encoding. Technical Report 96-396. Kingston, Ontario: Department of Computing and Information Science, Queen's University, December 1995. Extent: 24 pages, 16 references. ISSN: 0836-0227[-96-396]. Authors' affiliation: [Barnard] Department of Computing and Information Science, Queen's University [Home Page]; [Ide] Department of Computer Science, Vassar College [Home Page].

Abstract: "The Text Encoding Initiative is an international collaboration aimed at producing a common encoding scheme for complex texts. The diversity of the texts used by members of the communities served by the project led to a large specification, but the specification is structured to facilitate understanding and use. The requirement for generality is sometimes in tension with the requirement to handle specialized text types. The texts that are encoded often can be viewed or interpreted in several different ways. While many electronic documents can be encoded in very simple ways, some documents and some users will tax the limits of any fixed scheme, so a flexible extensible encoding is required to support research and to facilitate the reuse of texts."

Available in Postscript format on the Internet: http://www.qucis.queensu.ca/TechReports/Reports/96-396.ps [mirror copy, February 29, 1996].



Barnard, David T.; Macleod, Ian A. Maestro Working Paper 0: An Archive of Structured Texts. Technical Report 89-262. Kingston, Ontario: Department of Computing and Information Science, Queen's University at Kingston, Kingston, Ontario, Canada, November 14, 1989. 10 pages.

Abstract: We describe a research project to create a text archive system known as MAnagement Environment for Structured Text Retrieval Online (Maestro). The system combines traditional text retrieval capabilities with structural queries based on a hierarchic representation of documents, and browsing based on non-hierarchic links within a single document or among a set of documents.



[CR: 19950926]

Barnes, Julie A. Analysis of Document Encoding Schemes: A General Model and Retagging Toolset. Technical Report OSU-CISRC-7/90-TR19. Columbus, Ohio: Ohio State University Computer and Information Science, July 1990. Extent: 69 pages.

"ABSTRACT: Many document encoding schemes and software applications to process electronically encoded documents exist today. The plethora of schemes complicates the development of applications that must access documents in more than one representation. A uniform representation of electronic documents would greatly facilitate software development. Unfortunately, the retagging of existing electronic documents is difficult, given the current development tools. The fundamental problem of distinguishing the markup from the text strings is complicated by problems such as context-sensitive markup, implicit markup, white space, and the matching of start and end tags. Lexical-analyzer generators such as Lex are based on formal models that are inadequate to handle these problems. Because of this, much of the retagging code must be written by hand. Based on a generalization of these problems, we develop a new model for textual data objects with embedded markup. The new model for textual data objects is based on the relationships between markup and text strings. The model includes four classes of markup strings: symbol, nonsymbol, implicit segmenting, and explicit segmenting tags. We propose a uniform representation called a Lexical Intermediate Form with the following lexical properties: 1) the tags are easy to distinguish from the text, 2) the tags are unambiguous, and 3) the tags are explicit. The LIF borrows its concrete syntax from the ISO standard SGML, but it is not encumbered with the SGML concept of document-type definitions. Based on the model and the proposed LIF, we identify two steps in the retagging process and develop software tools that automatically generate the code for each of these steps. Experiences using the toolset are described for six encoding schemes of varying complexity: the Thesaurus Linguae Graecae, the Dictionary of the Old Spanish Language, the Lancaster-Oslo/Bergen Corpus, the Oxford Concordance Program, WATCON-2, and Scribe. Use of the toolset represents a savings in coding effort ranging from 4.3 to 23.2 lines of code generated per line of specification in the toolset. Approximately 98 per cent of the retagging code for these encoding schemes was automatically generated by the toolset."

For a paper copy of the report, send email to: strawser@cis.ohio-state.edu, or jbarnes@cis.ohio-state.edu



Barnes, Julie A.; Mamrak, Sandra A. "A Model and Toolset for the Uniform Tagging of Encoded Documents." Electronic Publishing: Origination, Dissemination and Design (EPODD) 4/2 (June 1991) 63-85. 38 references. Author affiliation: Department of Computer Science, Bowling Green State University, Ohio, USA.

Abstract: The authors present a new, abstract model for textual data objects with embedded markup. Based on the model, they propose a uniform representation for these objects that borrows its concrete syntax from the ISO standard SGML. Such a uniform representation will greatly facilitate the development of software that analyzes, formats or otherwise processes these objects. They then describe a toolset that supports the retagging of existing encoded data objects to the new uniform representation. Their experience with the toolset demonstrates a savings of approximately 10:1 over a retagging effort without the toolset.



[CR: 19970726]

Barron, David W. "Portable Documents: Problems and (Partial) Solutions." Electronic Publishing: Origination, Dissemination and Design (EPODD) 8/4 (December 1995 [appeared July 1997]) 343-367. With 37 references. ISSN: 0894-3982. Author's affiliation: Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, United Kingdom; Email: dwb@ecs.soton.ac.uk.

Abstract: "This paper presents a wide-ranging survey of the issues that arise in producing portable documents, including multimedia and hypermedia documents. It is directed at practitioners, and the approach is therefore pragmatic, based on the current state of the art; the paper does not attempt to provide a comprehensive survey of all previous work on this topic. The nature of an electronic document is discussed, and the various kinds of portability that may be required are defined. Ways in which portability can be achieved in a variety of restricted contexts are presented, including approaches to portability based on international and de facto industry standards. The likely success of the competing standards is assessed. Finally, the paper addresses the question of whether complete document portability is achievable, or even necessary." [abstract adapted from the author's Web Page: http://diana.ecs.soton.ac.uk/~dwb/papers.html ]

[Paper received February 14, 1996; revised November 6, 1996.]



[CR: 19960202]

Barron, David W. "Portable Documents: Why use SGML?" Baskerville [The Annals of the UK TEX Users' Group] 5/2 (March 1995) 8-9. ISSN: 1354-5930. Author's affiliation: Department of Electronics and Computer Science, University of Southampton.

This issue of Baskerville makes available a number of papers presented at a joint meeting of the UK TEX Users' Group and BCS Electronic Publishing Specialist Group (January 19, 1995) [mirror copy]. See the link to Baskerville, or email: baskerville@tex.ac.uk. Issue 5/2 of Baskerville has other articles on SGML: "Portable Documents: Why Use SGML?" (David Barron); "Formatting SGML Documents" (Jonathan Fine); "HTML & TeX: Making Them Sweat" (Peter Flynn); "The Inside Story of Life at Wiley with SGML, LaTeX and Acrobat" (Geeti Granger); "SGML and LaTeX" (Horst Szillat). See the special bibliography page for other articles on SGML and (LA)TEX.



Barron, David. "Why Use SGML?" Electronic Publishing: Origination, Dissemination and Design (EPODD) 2/1 (April 1989) 3-24. ISSN: 0894-3982. CODEN: EPODEU.

Abstract: The Standard Generalised Markup Language (SGML) is a recently-adopted International Standard (ISO 8879). The paper presents some background material on markup systems, gives a brief account of SGML, and attempts to clarify the precise nature and purpose of SGML, which are widely misunderstood. It then goes on to explore the reasons why SGML should (or should not) be used in preference to older-established systems. A summary of the article is also printed in "Why Use SGML," SGML Users' Group Newsletter 13 (August 1989) 10.



Bass, Randall. "The 'Jesuit Plantation Project'. Integrating Research and Pedagogy through an Electronic Archive Project in an American Studies Curriculum." Pages 13-14 [partial abstract] in Colloque International "Consensus ex Machina?". Abstracts International Joint Conference of the ALLC (Association for Linguistic and Literary Computing) and ACH (Association for Computers and the Humanities), Sorbonne, Paris, 19-23 avril 1994. Paris: Laboratoire "Lexicométrie et textes politiques" (INaLF, CNRS), and Ecole Normale Supérieure de Fontenay - Saint Cloud, 1994. 244 pages. Author Affiliation: Georgetown University.



[CR: 19950903]

Barth, Lewis M. Report on CETH Summer Seminar '94 Seminar Report. Special Electronic Issue of Jewish Studies Journal. Los Angeles, CA: Hebrew Union College -Los Angeles, Fall, 1994. Extent: 13K HTML file. Author's affiliation: Hebrew Union College-Jewish Institute of Religion, Los Angeles; Email: lbarth@bcf.usc.edu.

Abstract: "This is a summary report on my experiences at the CETH Summer Seminar (June 19-July 1, 1994) held at Princeton. The report will conclude with an assessment of the implications of the seminar for my own work on an edition of Pirkei d'Rabbi Eliezer (PRE) and possibly other areas of Judaic Studies."

Available in HTML format: CETH REPORT (1994) [or mirror copy, September 1995]. For other details on the CETH '94 Summer Seminar, see the conference entry.



[CR: 19961226]

Barth, Richard. "After SGML '96, then ... or, Setting Up an SGML User's Group a How-to Manual." Pages 659-664 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Director of Operations, Data Conversion Laboratory, 184-13 Horace Harding Expressway, Fresh Meadows, NY 11365, USA; Tel: 718-357-8700; FAX: 718-357-8776; Email: richardbarth@dclab.com; WWW: http://www.dclab.com/.

Abstract: "The annual SGML Conference provides the opportunity to focus on technology, expand our level of knowledge, exchange ideas and experiences with others in similar or related environments. Once the week is concluded, however, we are challenged to sustain the momentum that's been attained. This does not mean that one should only wait for the next year's conference; much can be done in the interim to continue the pursuit of knowledge and exchange of information. Many communities have organized local forums, specifically designed to address these concerns. This talk will focus on some of the major issues in establishing and maintaining such an organization."

Another paper discussing the role and operation of SGML user groups was presented at SGML '96 by Holly Smith.

Note: The above presentation was part of the "And More..." track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19971123]

Bartlett, PG. "Caterpillar Inc's New Authoring System." Page(s) 155-158 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Author's affiliation: Vice President of Marketing, ArborText Inc., USA; Email: pgb@arbortext.com.

Abstract: "Caterpillar, Inc. has developed a new document information system that emphasizes the reusability of Information Elements (shared objects) in multiple documents, the automatic compilation of objects into a document, and the reusability of documents on multiple media. Based on ISO and military standards, the new information system will improve accuracy, consistency, efficiency, timeliness, and costs. This paper describes the issues that led to the system's design, pitfalls in its implementation and operation, and details the anticipated benefits."

"Caterpillar is the world's leading producer of earth-moving equipment and industrial gas turbine engines and a leading global supplier of diesel engines. Caterpillar sells over 300 products with a service life as long as fifty years or more. To support distributors in over 120 countries, Caterpillar communicates in 35 different languages. [...] The division implementing this new system is Caterpillar's Technical Information Division (TID) which has worldwide responsibility for producing the documentation needed to operate and service its products. TID's 300 authors and illustrators produce 800 new pages of English documentation every business day -- and Caterpillar routinely translates those 800 pages into as many as 14 languages. Today the TID group is made up of 15 groups with over 400 writers total; a total of 600 writers is expected by the end of 1997. [...] Caterpillar's New Authoring System is based on standards. They selected SGML for text and documents; TIFF, IGES, and CGM for graphics; and output specifications based on MIL-PRF-28001 for page composition. Through the use of these standards, Caterpillar was able to integrate tools from multiple vendors to support their ambitious goals."

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19971227 MD: 19971229]

Bartlett, PG. "XML - What HTML Always Wanted to Be." Pages 527-532 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [PG Bartlett]: Vice President of Marketing, ArborText, Inc.; WWW: http://www.arbortext.com/.

Abstract: "In 1986, the Standard Generalized Markup Language (SGML) became an international standard for the format of text and documents. SGML has withstood the test of time. Its popularity is rapidly increasing among organizations with large amounts of document data to create, manage, and distribute. However, various barriers exist to delivering SGML over the Web. These barriers include the lack of widely supported stylesheets, complex software because of SGML's broad and powerful options, and obstacles to interchange of SGML data because of varying levels of SGML compliance among SGML software products.

"Because mainstream Web browsers lack SGML support, most applications that deliver SGML over the Web convert the SGML to HTML. This down-translation removes much of the intelligence of the original SGML information. That lost intelligence virtually eliminates information flexibility and poses a significant barrier to reuse, interchange, and automation.

"The Extensible Markup Language (XML) is being developed to enable delivery of SGML information over the Web while overcoming the limitations of HTML. The momentum building behind the XML effort means that XML is inevitably destined to become the mainstream technology for powering broadly functional and highly valuable business applications on the Internet, intranets, and extranets."

This paper was delivered as part of the "Business Management" track in the SGML/XML '97 Conference.

See 'PG Bartlett's PowerPoint presentation at SGML/XML '97 "Do You Need XML? A Checklist"'; URL ftp://ftp.arbortext.com/pub/presentations/XMLcheckpub.ppt

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19971202]

Bauman, Syd. "Keying names: The WWP Approach." Pages 17-18 in TEI 10: A Conference in Celebration of the Tenth Anniversary of the Text Encoding Initiative. Abstracts. TEI 10: Text Encoding Initiative, Tenth Anniversary User Conference, Brown University, Providence, Rhode Island. November 14-16, 1997. Sponsored by Martin Hensel Corporation, Kluwer Academic Publishers, and MIT Press. Hosted by Brown University Library, and Computing and Information Services. Providence, RI: Brown University, 1997. Author's affiliation: Brown University.

Summary: "The presentation discusses how the Women Writers Project has implemented usage of the TEI key attribute, including the reasoning for the mechanism chosen, and the checksum algorithm used. At the WWP, names of people are encoded with <persName>; names of places are encoded with <placeName>, and names of other things are encoded with <name>[...] We encode a person's name with a particular element in order to facilitate searching for occurrences of the name. There are other possible advantages, however; e.g., allowing a brief biography to be linked to the name as it is displayed on your screen."

"It is easy to search for a consistently spelled name. Even if there are only a few clearly defined possibilities (e.g., Jon or Jonathan), searching is not so difficult. However, during the period covered by the WWP textbase, names (and many other words) were inconsistently spelled. Furthermore, many individuals have more than one name (e.g., Margaret Cavendish is also referred to as the Duchess of Newcastle). Identification of names is a particularly pertinent issue when dealing with women writers, who may have written under both married and maiden names, or under pseudonyms. People searching the WWP textbase may be looking for references to a given person, regardless of which name was used and how it was spelled. Luckily, the TEI provides a mechanism that simultaneously facilitates both searching by person and linking a brief biography to a person's name. Many TEI elements have two special attributes declared for this very purpose: reg and key." [adapted]
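The key-based lookup described in this summary can be illustrated with a minimal sketch. It assumes a small TEI-like fragment in which every <persName> carries a key attribute identifying the person and a reg attribute holding a regularized form of the name. The key values (CAVENDISH.M1, DOE.J1), the sample text, and the function names are hypothetical; they are not the WWP's actual identifiers, and the WWP checksum algorithm is not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical TEI-like sample: one person appears under two different names.
SAMPLE = """<text>
  <p><persName key="CAVENDISH.M1" reg="Cavendish, Margaret">Margaret Cavendish</persName>
  is also called <persName key="CAVENDISH.M1" reg="Cavendish, Margaret">the Duchess of Newcastle</persName>,
  and <persName key="DOE.J1" reg="Doe, Jonathan">Jon</persName> is mentioned once.</p>
</text>"""


def index_names(xml_text):
    """Map each key value to the list of name forms found in the text."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for el in root.iter("persName"):
        key = el.get("key")
        if key:  # skip names that have not been keyed
            index[key].append("".join(el.itertext()).strip())
    return dict(index)


def find_person(index, key):
    """Return every surface form recorded for one person, however it was spelled."""
    return index.get(key, [])


if __name__ == "__main__":
    idx = index_names(SAMPLE)
    print(find_person(idx, "CAVENDISH.M1"))
```

Searching by key rather than by string means that variant spellings, titles, and pseudonyms all resolve to the same person, which is the advantage the summary attributes to the key attribute.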

For more information on the name encoding, see WWP Newsletter, Fall 1996: Volume 2, Number 3: "Keying NAMEs: the WWP Approach (Expanded version)" - http://www.wwp.brown.edu/vol02num03/nameKey-home.html. Description of the Brown University Women Writers Project is provided in the main database entry. See also the WWP Home Page (http://www.wwp.brown.edu/wwp_home.html) or overview (http://www.wwp.brown.edu/overview.html).

See the main database entry for additional information about the conference, or the Brown University web site.



[CR: 19950804]

Bauman, Syd. "Tables of Contents TEI-style." Electronic Texts and the Text Encoding Initiative [Special Issue] = TEXT Technology: The Journal of Computer Text Processing 5/3 (Autumn, 1995) 235-247. ISSN: 1053-900X. Author's affiliation: Women Writers Project, Brown University.

See the main entry for this special issue of TEXT Technology dedicated to the TEI, edited by Lou Burnard.



[CR: 19971202]

Bauman, Syd; Catapano, Terry. "TEI and the Encoding of the Physical Structure of Books." Pages 19-25 in TEI 10: A Conference in Celebration of the Tenth Anniversary of the Text Encoding Initiative. Abstracts. TEI 10: Text Encoding Initiative, Tenth Anniversary User Conference, Brown University, Providence, Rhode Island. November 14-16, 1997. Sponsored by Martin Hensel Corporation, Kluwer Academic Publishers, and MIT Press. Hosted by Brown University Library, and Computing and Information Services. Providence, RI: Brown University, 1997. Authors' affiliation: [Bauman]: Brown University; [Catapano]: Rutgers University.

Summary: Against the backdrop of the disclaimer in the TEI Guidelines -- "[the Guidelines] do not address the encoding of physical description of textual witnesses: the materials of the carrier, the medium of the inscribing implement, the layout of the inscription upon the material, the organisation of the carrier materials themselves (as quiring, collation, etc.), authorial instructions or scribal markup, etc." -- the authors "discuss why one might wish to encode such information, demonstrate two TEI-conformant methods for the encoding of the physical structure of a codex, and discuss possible advantages and disadvantages of both." [online abstract unavailable 971125]

See the main database entry for additional information about the conference, or the Brown University web site.



[CR: 19950925]

Bauwens, Bart; Engelen, J.; Evenepoel, F.; Tobin, C. "Increasing Access to Information for the Print Disabled Through Electronic Documents in SGML." Pages 55-61 (with 18 references) in Proceedings of the First Annual ACM Conference on Assistive Technologies. ASSETS '94. The First Annual ACM Conference on Assistive Technologies, Los Angeles, California, October 31 - November 1 1994. New York, N.Y: ACM Press, 1994. Authors' affiliation: Katholieke Univ., Leuven, Belgium.

"Abstract: There is a growing conviction that the Standard Generalized Markup Language (SGML) can play an important role as an enabling technology to increase access to information for blind and partially sighted people. This paper reports on mechanisms that have been devised to build in accessibility into SGML encoded electronic documents, concentrating on the work done in the CAPS Consortium -- Communication and Access to Information for People with Special Needs, a European Union funded project in the Technology Initiative for Disabled and Elderly People (TIDE) Programme -- and by ICADD, the International Committee on Accessible Document Design."



Bauwens, Bart; Engelen, J.; Evenepoel, F.; Tobin, C. "Structuring Documents: The Key to Increasing Access to Information for the Print Disabled." Pages 214-221 in Computers for Handicapped Persons. 4th International Conference [Vienna, Austria, 14-16 September 1994], ICCHP '94 Proceedings. Edited by Zagler, W. L.; Busby, G.; Wagner, R. R. Lecture notes in computer science, No. 860. Berlin, Germany: Springer-Verlag, 1994. 12 references. Authors' affiliation: Katholieke Univ., Leuven, Belgium.

"Abstract: There is a growing conviction that the Standard Generalized Markup Language, SGML, can play an important role as an enabling technology to increase access to information for blind and partially sighted people. The paper reports on mechanisms that have been devised to build in accessibility into SGML encoded electronic documents, concentrating on the work done in the CAPS Consortium -- Communication and Access to Information for People with Special Needs, a European Union funded project in the Technology Initiative for Disabled and Elderly People (TIDE) Programme -- and by ICADD, the International Committee on Accessible Document Design. The CAPS follow-on project, HARMONY, is briefly described."



Bauwens, Bart; Evenepoel, F.; Engelen, J. J. "Standardization as a Prerequisite for Accessibility of Electronic Text Information for Persons Who Cannot Use Printed Material." IEEE Transactions on Rehabilitation Engineering 3/1 (March 1995) 84-89. 7 references. Authors' affiliation: ESAT, Katholieke Univ., Leuven, Heverlee, Belgium.

"Abstract: The article describes the whole field of accessible text formats for reading-impaired persons. A broad overview of existing code systems ranging from ill-defined basic ASCII up to 16- and 32-bit multilingual character sets (ISO and Unicode versions) is given, as well as details on the standardized ISO formats for structured documents (SGML and ODA). In order to underline the importance of electronic text standardization, a few current systems, both diskette and electronic mail implementations, are reviewed. Within this framework, the authors situate the activities of the ICADD committee, an international body that promotes the accessibility of text information through the use of global standards for structured texts. In Europe, the TIDE-CAPS project is mainly concerned with document access for the print-disabled. An SGML DTD for newspapers, called CAPSNEWS, has been developed; this DTD describes a fully general newspaper structure. This DTD also has some special provisions for visually impaired persons, which enable them to navigate through digital newspapers by means of large print on screen, voice synthesis, and Braille display readers. The benefits of structured document formats, both for the print-disabled and for publishers, are stressed throughout a new European Horizontal Action TIDE Program, HARMONY, which started in Autumn 1994."



[CR: 19960408]

Bauwens, Bart; Evenepoel, Filip; Engelen, Jan. "SGML as an Enabling Technology for Access to Digital Information by Print Disabled Readers." Computer Standards & Interfaces 18/1 (January 1996) 55-69 (with 7 references). ISSN: 0920-5489. Authors' affiliation: Katholieke Universiteit Leuven, Belgium. Contact address: filip.evenepoel@kuleuven.ac.be.

Abstract: "The reasons why SGML is a significant technology for the print disabled community (that is: the blind, partially sighted, dyslexics, and some with motor impairments) to access information presented in an electronic form have been extensively discussed in previous papers [in this special issue]. In this article, the authors wish to present the methods and techniques they have used to create a prototype Reading Station by which the print disabled can access SGML documents."

Several ways to create a general access technique have been investigated by the authors. The objective was to be able to give people with print disabilities access to any document marked up in SGML according to any specified DTD through a Reading Station (at that time still to be designed). Among the methods investigated were the use of attributes, the LINK mechanism, new element or entity declarations, a mechanism developed by the International Committee on Accessible Document Design (ICADD), or the use and enhancement of existing related standards such as DSSSL (draft) or HyTime.

"After careful comparison of all these techniques in relation to their applications domain, that is, braille, large print or synthetic speech, we came to the conclusion that the ICADD mechanism was a sound base. Furthermore, the ICADD technique is included in the ISO 12083 standard, which should guarantee widespread dissemination of the concept."

This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University.



Baxter, William. "An Object-Oriented Programming System in TeX." TUGboat: The Communication of the TeX Users Group [Proceedings of the 1994 Annual Meeting] 15/3 (September 1994) 331-338. SuperScript, Box 20669, Oakland, CA 94620-0669 USA; email: web@superscript.com.

Abstract: This paper describes the implementation of an object-oriented programming system in TeX. The system separates formatting procedures from the document markup. It offers design programmers the benefits of object-oriented programming techniques. The inspiration for these macros comes from extensive book-production experience with LaTeX. This paper is a companion to Arthur Ogawa's "Object-Oriented Programming, Descriptive Markup, and TeX." [See the relevant bibliographic entry.]



Beach, Richard J. Setting Tables and Illustrations with Style. Ph.D. dissertation. Waterloo, Ontario: University of Waterloo, Department of Computer Science, May 1985.

Published by the University of Waterloo, Department of Computer Science, as Technical Report CS-85-45. Also available under the same title as: Technical Report CSL-85-3. Palo Alto, CA: Xerox Palo Alto Research Center [PARC], 1985.



[CR: 19970331]

Beam, Paul; Goldsworthy, Peter. "Technical Writing on the Web-Distributed SGML-Based Learning." Pages 35-41 in Conference Proceedings, SIGDOC '96. The 14th Annual International Conference on Computer Documentation. ["Marshalling New Technological Forces: Building a Corporate, Academic, and User-Oriented Triangle"]. SIGDOC '96: 14th Annual International Conference. Research Triangle Park, North Carolina, US. October 20-23, 1996. Sponsored by the Association for Computing Machinery Special Interest Group on Documentation (SIGDOC). New York, NY: Association for Computing Machinery, 1996. ISBN: 0-89-791-799-5. Authors' affiliation: Department of English, University of Waterloo, Ontario, Canada.

"Abstract: The authors describe the components and methods for a fully interactive course in technical writing, offered through the Rhetoric and Professional Writing Program at the University of Waterloo. The term course described in the paper is available as a credit or non-credit option and its composite modules on specific writing topics can be extrapolated and presented individually for business and educational training programs. It consists of some thirteen learning modules, search tools and reference resources, the InContext 2 SGML editor, and communications software to link students with each other and their instructors with institutions and sites across the World Wide Web."

Several other articles in this proceedings volume are germane to SGML: Tom Banfalvi, et al., "Manufacturing Documentation in the Virtual Warehouse"; Betsy Brown, et al., "From Hardcopy to Online: Changes to the Editor's Role and Processes"; Stephanie Copp, "Working with Academe"; Cindy Roposh, et al., "Developing Single-Source Documentation for Multiple Formats"; Paul Prescod, "Multiple Media Publishing in SGML"; Lin-Ju Yeh, et al., "SSQL: a Semi-Structured Query Language for SGML Document Retrievals"; Dee Stribling, et al., "A Real World Conversion to SGML".



[CR: 19961030]

Beaudry, Guylaine. "La Text Encoding Initiative: les moyens pour donner de la valeur à un texte numérisé." Cursus 1/2 (printemps 1996). ISSN: 1201-7302. Author's affiliation: EBSI-GRDS. Email: beaudryg@ere.umontreal.ca. WWW: Beaudry Home Page.

Abstract: "The Text Encoding Initiative (TEI) aims to produce a general model and guidelines for encoding all kinds of literary texts in electronic form. The project was undertaken in recognition of the need for standardization in the encoding and interchange of digitized texts. After identifying the reasons why the Standard Generalized Markup Language (SGML) was chosen for the design of the encoding format, the article presents the main components of the TEI Document Type Definition (DTD). A list of digitization projects using this DTD is given, along with the objectives of the Text Encoding Initiative for the coming years." [translated from the French]

Available online in HTML format: http://mistral.ere.umontreal.ca/~beaudryg/cursus/vol1no2/beaudry.html; [mirror copy].

Note: "Cursus is the student electronic journal of the École de bibliothéconomie et des sciences de l'information (EBSI) at the Université de Montréal. This new journal publishes texts produced in EBSI courses." Also, apropos of SGML work by Guylaine Beaudry: "As part of her master's degree at EBSI, she is converting two scholarly journals to SGML: Surfaces and Géographie physique et Quaternaire." [translated from the French]



Becker, Dave. "UTF [Universal Text Format]: an SGML Standard for the News Distribution Industry." Seybold Report on Publishing Systems 24/10 (January 30, 1995) 1, 7-17. Author's affiliation: Lexis-Nexis / Mead Data Central; Email: daveb@meaddata.com.

See the related bibliographic entry for a link to the full-text version of this article on UTF (from a White Paper). UTF is the "Universal Text Format" representing an SGML application for the news distribution industry. Development of UTF is largely under the IPTC [International Press Telecommunications Council] and NAA [Newspaper Association of America]. The article in SRPS includes an overview "Probing the Universal Text Format for Newswires" (page 1).



Becker, Dave. UTF [Universal Text Format] - An SGML Standard for the News Distribution Industry. Technical Report. Mead Data Central, August 15, 1994. 18 pages (printed from a file in RTF format). Dave Becker, Mead Data Central; email: daveb@meaddata.com.

Abstract: "In June, 1992, a working subcommittee was established to create an industry standard for the interchange of textual material between news agencies and their clients (primarily newspapers) that would replace the current standard IPTC 7901 and ANPA 1312 formats. The new standard is called the Universal Text Format (UTF). After significant discussion, SGML was adopted as the encoding language for the new standard. Members of the working subcommittee are now attempting to finalize and prototype the new standard in selected test environments. The purpose of this paper is to describe the context within which the UTF was developed, the standard itself, and plans for future development." [from the report]

The document is available online in HTML format in this SGML database; the HTML version was derived from an RTF format (to which it should be compared for formatting accuracy) kindly supplied by Dave Becker.

Added note: "This document discusses the development of a device-independent file format for the transfer of textual information in the news industry. This file format is called the Universal Text Format (UTF). It is intended to be used with the Information Interchange Model (IIM), an electronic envelope convention for the exchange of information which has been adopted by the International Press Telecommunications Council (IPTC) and the Newspaper Association of America (NAA). The UTF is being developed under the direction of the IPTC and the NAA."



[CR: 19950925]

Bedford, J. "Electronics Letters: the SGML Implementation." Learned publishing: Journal of the Association of Learned and Professional Society Publishers 8/1 (1995) 39- [?]. ISSN: 0953-1513.



[CR: 19950716]

Beebe, Nelson H. F. A Bibliography of Publications about SGML, the Standard Generalized Markup Language. University of Utah Research Report, Version 1.17. Salt Lake City, UT: Center for Scientific Computing, Department of Mathematics, University of Utah, June 6 1995.

See pointers to Beebe's bibliography in the introductory bibliography page for this database. Author's Internet address: beebe@math.utah.edu.



[CR: 19980904]

Behme, Henning; Mintert, Stefan. XML in der Praxis: Professionelles Web-Publishing mit der Extensible Markup Language. Bonn: Addison Wesley Longman, [June 1998]. Extent: 328 pages, CDROM. ISBN: 3-8273-1330-9. Authors' affiliation: Henning Behme: iX (Zeitschrift editor), WWW http://www.heise.de/ix/editors/hb.html; Stefan Mintert: Universität Dortmund, WWW http://www.mintert.com/.

The authors have set up a Web page for XML in der Praxis with general information, examples, and errata; see the online Table of Contents, the listing of Errata, and the volume bibliography. This page also contains links for the online volume Introduction and for a German translation of the XML 1.0 specification. For examples, see :DSSSL: XML-Dokumente fürs Web formatieren and CML & CSS Examples for Mozilla.



Bellcore Information Technology Group. The Telecommunications Electronic Document Delivery (TEDD) Package. Bellcore Special Report SR-3031, Issue 1. Bellcore [Bell Communications Research], 1995. viii + 32 pages.

The document describes the TEDD package and the proposed DocID DTD. Available via FTP: ftp://info.bellcore.com/pub/TCIF/ipi_misc/sr3031draft.ps.Z. See the IPI/TCIF entry for further details on TEDD and TIM (Telecommunications Industry Markup, an application of SGML).



[CR: 19951226]

Bergh, Steven Van den; Stevens, Lauwrie. "A typesetter's tale on SGML." In Proceedings of the First SGML BeLux Users' Conference. SGML BeLux '94, Brussels. March 22, 1994. Edited by Hans C. Arents. Leuven, Belgium: Katholieke Universiteit Leuven, 1994. Authors' affiliation: Fotek Grafische Bedrijven, Entrepotstraat 3, 9100 Sint-Niklaas, Belgium.

"Abstract: This is a tale of a typesetter who has taken the SGML route for producing pages of text for its customers. Fotek Grafische Bedrijven is situated in Sint Niklaas, Belgium. A sister company has been set up in Hitchin, near London, in the United Kingdom. Another sister company is in the process of being set up near Namur. We are typesetters, some of you may regard us as dinosaurs or something from the dark ages... We actually get paid for putting pages on paper!... Isn't that a quaint, old fashioned idea!!! The fact is, of course, that many of the organizations and companies that you, the reader, work for also derive their income directly or indirectly from producing final output, be it on paper or in some other media. We have chosen to relate our experiences of SGML from our early contacts with the standard, our examination of available tools and software (and the problems encountered), to the questions we asked ourselves and finally, our reasons for arriving at our final decision to develop our own solution aimed specifically at what we will call the 'back-end' (side) of the SGML environment."

The document is available online in HTML format: "A typesetter's tale on SGML" [mirror copy, December 1995]. For further details on the Conference and BeLux, see the contact information for SGML BeLux.



Berglund, Anders. "SGML - What is It?" Pages 1:187-194 in Proceedings of SEAS Anniversary Meeting 1985: User Friendly Computing (Zurich 23-27 September 1985). Nijmegen: SHARE Eur. Assoc, 1985.

Abstract: SGML is intended to be the Standard Generic Markup Language providing the framework for marking up a document in a way that should be processable by products from different vendors and for different output devices.



[CR: 19950716]

Bergström, Peter. "Latest News from Swedish [SGML Users' Group] Chapter. Minutes from the meeting of the workgroup on HyTime, September 19, 1994." SGML Users' Group Newsletter 29 (November 1994) 7-8. ISSN: 0952-8008.

Report on the results of the HyTime SIG meeting.



[CR: 19950716]

Bergström, Peter. "Report from the Swedish [SGML Users' Group] Chapter." SGML Users' Group Newsletter 28 (August 1994) 19. ISSN: 0952-8008.

A report on the August 1993 meeting of the Swedish SGML Users' Group Chapter, with 130 members in attendance.



[CR: 19971227 MD: 19971229]

Bergström, Peter. "STEP and SGML Update." Pages 201-204 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Peter Bergström]: EuroSTEP AB, Drottninggatan 71D, S-111 36 Stockholm, Sweden; Phone: +46-708 111 966; FAX: +46-708 111 965; Email: peter.bergstrom@eurostep.se; WWW: http://www.eurostep.se/, and http://www.admin.kth.se/SGML/.

Abstract: "This presentation aims at giving an understanding of the work with STEP and SGML/XML integration, the reasons for it and current status of work. It begins with a presentation of the STEP standard and its parts, and discusses what the relations between STEP and SGML/XML are, and what inter-operability between the two might provide.

"A few current initiatives or projects will also be covered, with the Hägglunds LOTS project presented in somewhat more detail, being one of the more advanced STEP and SGML projects so far. Finally, a status report of the current standardization efforts within the STEP and SGML/XML communities will be given."

"STEP is an international standard, ISO 10303 'Product Data Representation and Exchange'. The former name was 'Standard for the Exchange of Product Model Data', thereby the acronym STEP. The objective with the series of standards that together are called STEP is to define a common way to describe product model information for the product's complete life-cycle, independently from the software used. [...] Today, efforts are spent to make STEP more open, i.e., to permit STEP to cooperate with other standards. An example of that is the efforts to achieve inter-operability between STEP and SGML."

This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.

For more information on STEP and SGML, see the main database entry SGML and STEP (ISO 10303 Standard for the Exchange of Product Data), the STEP/SGML Resource Page.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19950716]

Bergström, Peter [attribution?]. "Structured Information / Standards for Document Architectures." SGML Users' Group Newsletter 30 (March 1995) 9. ISSN: 0952-8008.

A call for papers on SGML and related topics for a 1996 special issue of Journal of the American Society for Information Science (JASIS). Suggested topics are: HTML, CALS, ICADD, TEI. Contact: Elizabeth Logan, logan@lis.fsu.edu



[CR: 19950716]

Bergström, Peter. "Report from Sweden '95." SGML Users' Group Newsletter 30 (March 1995) 8. ISSN: 0952-8008.

Report on a meeting of the Swedish SGML Users' Group, February 22-23, 1995, organized by KTH. Contact: Peter Bergström, L.P.H.Bergstrom@telub.se, or sgml@sunet.se.



[CR: 19971123]

Bergström, Peter; Lilja, Frank [alias]. "Business Benefits of an SGML and STEP Integration: A Drama in One Act." Page(s) 69-71 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Authors' affiliation: [Bergström]: Senior Consultant, EuroSTEP AB Sweden; [Lilja]: "President, Enator Information Management AB, Sweden".

"Enator Information Management has many years of experience in system solutions based on Enator Information Management has many years of experience in system solutions based on SGML and related standards, and carries out information analyses, information structuring, DTD design and construction as well as SGML-based system implementation.

"The necessity of using standards when trying to preserve the value of information in a changing business environment is quite well-known today, but the use of several standards simultaneously and the integration of them has not been discussed too much, even within the CALS initiative. This drama in one act will put emphasis on the benefits of integrating standards rather than choosing one of them, which in several cases is essential for success. The differences between the product model standard STEP (ISO 10303) and SGML, and thereby the strengths of each, will be illustrated by a fictive business scenario that focuses on the reasons why an integration of standards is essential for success."

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19971227]

Best, Karl F. "Designing a Structured Authoring System." Pages 41-46 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Karl F. Best]: Manager, Frame Developer Support, Adobe Systems, San Jose, CA; Email: kbest@adobe.com; Phone: +1 408-536-6531.

Abstract: "This presentation examines the benefits of structured data files, compares different file formats (SGML, HTML, PDF, and XML) and their suitability for various deliveries, and discusses criteria for selection of structured authoring tools from the perspective of the user of the tools, the technical writer. The presentation is intended for people new to structured authoring who may have become interested in the topic because of the popularity of the new XML standard, and would benefit from hearing about structured authoring environments in general and how SGML, HTML, PDF, and XML fit into the picture."

This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19961226]

Best, Karl F. "Just How Many DTDs Do You Need?" Pages 131-140 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Independent Consultant, Email: kbest@ix.netcom.com.

Abstract: "This presentation looks at using multiple DTDs for different stages in the life of a given piece of information, and examines the issues that should be taken into account when designing DTDs for a given application and deciding just how many DTDs are required.

A number of different models (e.g., a single DTD for the entire process; one DTD for authoring, another for storage, another for output, etc.) are examined, and the pros and cons for each are discussed. These considerations include the costs for each model (cost of maintaining multiple DTDs as well as the transform filters placed between them, versus the inefficiency of authoring with a single huge DTD), as well as the question of 'roll your own' versus using industry-standard DTDs."

Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19971229]

Bidoul, Stéphane. "Object Orientation and SGML: LINK Revealed." Pages 485-498 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: ACSE sa/nv, Boulevard General Wahis 29 B-1030, BRUSSELS, BELGIUM; Tel: +32 (2) 705 70 21; FAX: +32 (2) 705 81 01. Email: sbi@acse.be.

Abstract: "Several studies have tried to address the topic of Object Orientation around SGML.

The question asked was too simple and dichotomic; the answer given far too simple 'yes' or 'no'. The SGML application aspect, that is not covered by the standard, was not considered when searching for commonalities.

This paper intends to show that some application architectures coupled with an SGML parser offer an object mechanism with embedded SGML.

The relation between the parsed tokens and the application methods shows that application objects are connected to parsing objects in a simple and efficient paradigm which fully conforms to the LINK feature of the SGML language.

Adopting this view of an SGML application, makes all the facilities offered by the LINK feature suddenly self-evident and useful."

Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.

A version of the document is available online in HTML format: "Object Orientation and SGML: LINK Revealed"; [local archive copy]



[CR: 19971227 MD: 19980104]

Bidoul, Stéphane. "From Prototype to Production System. Managing the Growth." Pages 533-538 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Stéphane Bidoul]: Project Manager, SGML Technologies Group, Boulevard Général Wahis 29, B-1030 Brussels, Belgium; Email: sbi@acse.be; WWW: http://www.sgmltech.com; Phone: +32 (2) 705 70 21; FAX: +32 (2) 705 81 01.

Abstract: "SGML information systems usually come into being in the form of small-scale prototype systems supporting a few users and a relatively small set of representative documents. After a successful proof-of-concept phase comes the time of production on a larger scale where the problems encountered are of a totally different nature from those uncovered during the prototyping phase.

"This paper addresses scalability of SGML authoring and dissemination systems. An area highlighted is the need to have a set of detailed production procedures taking into account human as well as automated operations."

"SGML information systems usually come into being in the form of small-scale prototype systems supporting a few users and a relatively small set of representative documents. After a successful proof-of-concept phase comes the time of production on a larger scale where the problems encountered while growing to a full-scale production system are of a totally different nature from those uncovered during the prototyping phase. For example there are the different and sometimes contradictory constraints of the authoring and dissemination systems, which often show up only in high-volume/high-update rate conditions. This paper addresses scalability. Neglecting the more obvious aspects of scalability it highlights some issues, which are not always considered when designing complex document management systems. One aspect highlighted is the need to have a set of detailed production procedures which are adhered to in order to avoid cascading effects of incorrectly entered data, among other potential problems."

This paper was delivered as part of the "Business Management" track in the SGML/XML '97 Conference.

A version of the document is available online in HTML format: "From Prototype to Production System. Managing the Growth"; [local archive copy]. Note: The SGML Technologies Group has published a number of other interesting papers online: see http://www.sgmltech.com/papers/index.htm.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM were produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19971229]

Bidoul, Stéphane. "Reusability in SGML With Focus on Software Engineering." International SGML Users' Group Newsletter 3/3 (July 1997) 16-19. ISSN: 0952-8008. Author's affiliation: SGML Technologies Group.

The author discusses 'code reuse' (SGML application code) via SGML architectures and (SGML) LINK. Some of the ideas are elaborated similarly in the author's SGML '96 presentation, "Object Orientation and SGML: LINK Revealed."

"[Summary:] It has been shown that SGML lends itself to reuse in many ways, through which use of an SGML system can lead to great efficiency. Careful initial design, where modularity is a keyword, can affect reuse of many ingredients which go to make up an SGML application. These include the reuse of the standard method, reuse of DTDs and parts thereof as well as element names, and reuse by the design of generic applications as opposed to specific ones. SGML architectures have been shown to be an enabling technology for writing reusable applications. The reusability benefits of markup-independent applications, however, can be achieved with alternative techniques. Such a technique was presented, using the SGML LINK feature to bind an application language to a DTD."

A version of the document is available online from the SGML Technologies Group server, in HTML format; [local archive copy]



[CR: 19971123]

Biezunski, Michel. "A Topic Map for SGML 97 Proceedings: A New SGML Animal." Page(s) 335-338 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Author's affiliation: Director, High Text, Paris, France; Email: michel@hightext.com.

Abstract: "This paper explains what is a Topic Map and describes how we have made it for the current CD-ROM.

"Topic Maps are a standard representation of navigational information that is intended to be used for interchanging such devices as indexes, thesauri, glossaries, on sets of heterogeneous documents (structured or not structured). It can be thought of as the equivalent of a neutral database scheme, that should allow its users to preserve the value added on their information repositories with semantic navigation.

"Typical users of Topic Maps include SGML users who need to maintain links accross living documents, while avoiding the overhead caused by maintenance of huge amount of data as systems evolve. Note that Topic Maps can also be used if source documents are not in SGML.

"The conceptual basis of the Topic Map architecture is based on the possibility standardized by HyTime to separate the semantic information of a link from the address of the (possibly multiple) anchors. The architecture that has been designed will be updated to take into account new standard formalism being defined for links. An XML representation is planned as well.

"The Topic Navigation Maps Project is a work done under the auspices of ISO WG8, the group responsible for SGML and related standards (Convenor: James Mason). The co-editors of this project are Martin Bryan (UK) and Michel Biezunski (France).

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry. See also the dedicated section on Topic Navigation Maps in the SGML/XML Web Page, and the Topic Maps draft document(s).



[CR: 19951228]

Bingham, Harvey (editor). CALS Table Model Document Type Definition. SGML Open Technical Memorandum TM 9502:1995. Coraopolis, PA: SGML Open, October 19 1995. Extent: approximately 41 pages. Author's affiliation [Bingham]: Chair, Table Interchange Subcommittee, SGML Open; also: Interleaf.

"Abstract: This SGML Open Technical Memorandum consists of a recommendation for an update to the CALS table model DTD model that will be submitted to the appropriate CALS authority with the expectation that it will be accepted as the next revision of the official CALS table model and that a Formal Public Identifier will be assigned to facilitate referencing of this model.

"Note that the set of element and attribute declarations in the markup declaration module section of this document partially defines the CALS table model. However, the model is not well-defined without the accompanying natural language description of the semantics (meanings) of these various elements, attributes, and attribute values. The semantic writeup, in the section following that containing the markup declaration module, must be used in conjunction with the element and attribute declarations."

Available from the FTP server at Omnimark Corporation in compressed Postscript format ftp://ftp.exoterica.com/sgmlopen/9502/9502.ps.Z, [mirror copy] or in other formats (files: 9502.tar.Z, 9502pack.zip, 9502ps.zip). Also available in HTML format: SGML Open - TM 9502:1995 - "CALS Table Model DTD" [mirror copy, December 28, 1995]. Document revisions: Technical Memorandum 9502:1995; Committee Draft: 1995 August 2; Committee Draft: 1995 August 14; Final Technical Memorandum: 1995 October 19.



[CR: 19961105]

Bingham, Harvey (editor). Exchange Table Model Document Type Definition. SGML Open Technical Resolution TR 9503:1995. Coraopolis, PA: SGML Open, May 8 1996. Extent: approximately 20 pages. Author's affiliation: Chair, Table Interchange Subcommittee, SGML Open; also: Interleaf.

Abstract: "This SGML Open Technical Resolution defines an Exchange subset of the full CALS table model DTD described in SGML Open Technical Memorandum TM 9502:1995. This Exchange subset has been chosen as being a useful subset of the complete CALS table model such that, if an application's tables are tagged according to this subset, there is a high probability that the table will be interoperable among the great majority of SGML Open vendor products. See also the SGML Open Technical Research Paper TRP 9501:1995 on Table Interoperability: Issues for the CALS table model."

Available online: Exchange table model Document Type Definition. SGML Open Technical Resolution TR 9503:1995, by Harvey Bingham (Chair, Table Interchange Subcommittee). 1996 May 8. [mirror copy]. Also: Postscript version via OmniMark, [local mirror copy]; source package in tar/zip format. The declarations: set of declarations defining the Exchange Table Model; [mirror copy]



[CR: 19960904]

Bingham, Harvey W. SGML Syntax Summary [Hypertext Version]. Cambridge, MA: Bingham Associates, May 18, 1996. Extent: hypertext document of approximately 273K, in eleven HTML files.

This enhanced SGML Syntax Summary is an immensely useful tool providing indexed and linked access to SGML grammar productions. Separate listings are given for: (1) SGML Syntactic Variables; (2) SGML Keyword Syntactic Literals; (3) SGML Terminal Variables; (4) SGML Terminal Constants; (5) SGML Reference Delimiter Roles. The document will assist in the "study [of] the syntax of ISO 8879-1986 Standard Generalized Markup Language, aided by hypertext links for the syntax productions, their names, objects in their definitions, where used and where defined, and cross-references to containing clause and page:line pairs in 'The SGML Handbook', by Charles Goldfarb." The document is available online in HTML format from the canonical Web site, www.tiac.net. An authorized mirror copy of the SGML Syntax Summary is provided on the SGML/XML Web Page [authorized]. Note also the DSSSL Syntax Summary, also by Harvey Bingham. See the grammar section of the SGML/XML Web Page database for other resources.



Bingham, Harvey W. SGML Syntax Summary. Cambridge, MA: Interleaf, 2 June 1988. Extent: 46 pages.

The document [now mostly superseded by the author's enhanced SGML Syntax Summaries] supplies cross-reference information which is not given or optimally accessible in the ISO 8879 standard itself. The syntax summary covers the primary ISO document (8879), Amendment 1 (Fall 1987) and Amendment 1, Corrections (May 1988). Copies of the syntax summary were mailed to subscribers of <TAG> with issue 1/4 (1988). Copies and updates are available (originally) from Interleaf.



[CR: 19971017]

Birnbaum, David J. "In Defense of Invalid SGML." Page 14 in ACH-ALLC '97. The 1997 Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. Conference Abstracts. ACH-ALLC '97. Queen's University at Kingston, Ontario, Canada. June 3 - 7, 1997. Compiled by Greg Lessard and Michael Levison. Ontario, Canada: Queen's University, 1997. ISBN: 0-88911-760-8. Author's affiliation: Associate Professor and Chairman, Department of Slavic Languages and Literatures, University of Pittsburgh. Email: djbpitt+@pitt.edu; WWW: David J. Birnbaum Home Page.

[Excerpt]: "The requirement that SGML be valid seems in most contexts so obvious that it would never be questioned, but if document analysis of existing documents reveals violations of structure, the most appropriate model of this information in SGML terms involves invalid SGML. If the creation of invalid SGML is foreclosed for practical reasons, our most honest alternative is to enrich whatever solution we do adopt with annotations in markup that tell the truth: what we are encoding are the equivalent of parser error messages, and the fact that our document violates its basic structure in specific places is informational."

Abstract available online in HTML format: "In Defense of Invalid SGML", by David J. Birnbaum; [archive copy].

Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server.



[CR: 19961019]

Birnbaum, David J. "Standardizing Characters, Glyphs, and SGML Entities for Encoding Early Cyrillic Writing." Computer Standards & Interfaces 18/3 (June 1996) 201-252 (with 47 references). ISSN: 0920-5489 [North-Holland]. Author's affiliation: Associate Professor, Department of Slavic Languages and Literatures, 1417 Cathedral of Learning, University of Pittsburgh, Pittsburgh, PA 15260. WWW Home Page; standards page.

Abstract: "The present study discusses the differences among characters, glyphs, and SGML entities, evaluates how these distinctions might be applied to electronic text projects involving early Cyrillic materials, and proposes basic inventories of the characters, glyphs, and entities needed for computer processing of early Cyrillic written materials. None of the issues examined in the study is unique to early Cyrillic writing, and the principles elucidated here can be generalized to problems affecting the standardized encoding of other complex writing systems."

Note also Birnbaum's fine collection of WWW links to current work on character sets and standardization, accessible from his home page.



[CR: 19961019]

Birnbaum, David J.; Bojadzhiev, Andrej T.; Dobreva, Milena P.; Miltenova, Anisava L. (eds.). Computer Processing of Medieval Slavic Manuscripts. Proceedings. The First International Conference on Computer Processing of Medieval Slavic Manuscripts. Blagoevgrad, Bulgaria. July 24-28, 1995. Sofia: "Prof. Marin Drinov" Academic Publishing House, 1995. Extent: 336 pages. ISBN: 954-430-417-7.

The volume contains abstracts of the conference; several presentations focused upon encoding, and SGML-TEI encoding in particular. "The first International Conference on Computer Processing of Medieval Slavic Manuscripts, held on July 24-29, 1995 at the American University in Blagoevgrad, Bulgaria, offered an invaluable education in the state of the art and application of computer tools and software for encoding and analyzing both the text and the structure of early Slavic manuscripts, and provided Slavists with the opportunity to meet with computer science specialists to discuss ways of developing standardized tools and methods specifically for computational processing and analysis of early Slavic texts."

"The conference, which included papers by thirty-five Slavists and information science specialists from ten countries, was organized and co-directed by David Birnbaum (Department of Slavic Languages and Literatures, University of Pittsburgh), Anisava Miltenova (Institute of Literature, BAN), Milena Dobreva (Institute of Mathematics, BAN) and Andrej Bojadzhiev (Ivan Dujchev Center for Slavo-Byzantine Studies, University of Sofia), and was sponsored by grants from IREX, the ACLS Joint Committee on Eastern Europe, and the Open Society Fund." [from the online report]

Preliminary reports from the conference are available via the WWW; [mirror copy]. See the conference entry for further information. Note the SGML/TEI theme (emerging from the conference workshop): "A general consensus emerged among many participants at the conference that SGML, particularly in its TEI implementation, provides a framework that is potentially capable of satisfying the varying needs of different researchers, while at the same time affording a common, standardized model for document architecture and character and glyph encoding. Several conference participants engaged in significant text-encoding projects agreed to combine their efforts to adapt the TEI guidelines to the specific needs of Slavic philological research."



[CR: 19971126]

Birnbaum, David J.; Cournane, Mavis. "Clothing the Emperor: Using the TEI Writing System Declaration (WSD)." Pages 27-32 in TEI 10: A Conference in Celebration of the Tenth Anniversary of the Text Encoding Initiative. Abstracts. TEI 10: Text Encoding Initiative, Tenth Anniversary User Conference, Brown University, Providence, Rhode Island. November 14-16, 1997. Sponsored by Martin Hensel Corporation, Kluwer Academic Publishers, and MIT Press. Hosted by Brown University Library, and Computing and Information Services. Providence, RI: Brown University, 1997. Authors' affiliation: [Birnbaum]: Department of Slavic Languages, University of Pittsburgh; Email: djb@clover.slavic.pitt.edu; WWW: http://clover.slavic.pitt.edu/~djb/; [Cournane]: University College Cork, Email: mavis@www.ucc.ie; WWW: http://www.ucc.ie/.

Summary: "The goal of the present paper is to illustrate how the authors have used WSDs to support not only the documentation, but also the transformation of TEI-conformant documents. Mavis Cournane will discuss the encoding of hellenized Hebrew and latinized Greek in a Latin context; David J. Birnbaum will discuss the encoding of the principal manuscript witnesses to the early East Slavic Rus' Primary Chronicle."

A very useful collection of materials pertaining to the TEI WSD (Writing System Declaration) is available (provisionally) at: http://imbolc.ucc.ie/~pflynn/wsd. As of November 1997, it contains all the files referenced in the Research Note Test of TEI WSD Interpretation, which accompanies the paper "Clothing the Emperor: Implementing Writing System Declarations in TEI Transformations," by Cournane [and Birnbaum] (TEI.10, Providence, RI, 1997). The research note documents a test and demonstration of the processing of Writing System Declarations (WSDs) in the Text Encoding Initiative (TEI) DTD. [research note mirror copy, November 26, 1997]

See the main database entry for additional information about the conference, or the Brown University web site.



[CR: 19961202]

Björklind, Andreas. "An Architecture for Organising Dynamic Information About Space and Time." Pages 358-365 in Knowledge Organization and Quality Management. Proceedings of the Third ISKO Conference, 1994. Conference of the International Society for Knowledge Organization, Copenhagen, Denmark, 20-24 June 1994. Frankfurt, Germany: INDEKS Verlag, 1994. ISBN: 3-88672-023-3. Author's affiliation: Library and Information Science Laboratory (LIBLAB), Department of Computer & Information Science, Linköping University, Sweden. Email: abj@ida.liu.se; or WWW: Andreas Björklind Home Page.

The article describes an architecture for handling heterogeneous information about space and time. The proposed system consists of a meta database which is handled by a metadata engine. The engine functions as the filter between the crisis management system (based on a geographical information system) and the large heterogeneous raw database. The metadata handler is a HyTime engine, based on the international standard ISO/IEC 10744. The internal query language is the HyQ [HyTime Query Language] notation from the HyTime standard.



[CR: 19961120]

Blake, G. E.; Consens, M. P.; Kilpeläinen, P.; Larson, P. A.; Snider, T.; Tompa, Frank W. "Text/Relational Database Management Systems: Harmonizing SQL and SGML." Pages 267-280 in ADB-94: Applications of Databases. Proceedings of the First International Conference. 1994 International Conference on Applications of Databases, ADB 94, Vadstena, Sweden, June 21 - 23, 1994. Edited by W. Litwin and T. Risch. Lecture Notes in Computer Science, 819. Berlin: Springer Verlag, 1994. ISBN: 3-540-58183-9. Authors' affiliation [?]: University of Waterloo Centre for the New OED and Text Research, Waterloo University, Ontario, Canada.

Abstract: "Combined text and relational database support is increasingly recognized as an emerging need of industry, spanning applications requiring text fields as parts of their data (e.g., for customer support) to those augmenting primary text resources by conventional relational data (e.g., for publication control). We propose extensions to SQL that provide flexible and efficient access to structured text described by SGML. We also propose an architecture to support a text/relational database management system as a federated database environment, where component databases are accessed via 'agents': SQL agents that translate standard or extended SQL queries into vendor specific dialects, and text agents that process text sub queries on full text search engines."

Available online via the UWaterloo server; [mirror copy]. This document is superseded by a UWaterloo CS Technical Report.



[CR: 19951012]

Blake, G. E.; Consens, M. P.; Davis, I. J.; Kilpeläinen, P.; Larson, P. A.; Snider, T.; Tompa, Frank W. "Text/Relational Database Management Systems: Overview and Proposed SQL Extensions." Department of Computer Science, University of Waterloo, Technical Report CS-95-25. Waterloo, Ontario: University of Waterloo Centre for the New OED and Text Research, Waterloo University, June 1995. Extent: 28 pages, 19 references.

Abstract: "Combined text and relational database support is increasingly recognized as an emerging need of industry, spanning applications requiring text fields as parts of their data (e.g., for customer support) to those augmenting primary text resources by conventional relational data (e.g., for publication control). In this paper, we propose extensions to SQL2 that provide flexible and efficient access to structured text described by SGML or other encodings. We also propose an architecture to support a text/relational database management system as a federated database environment, where component databases are accessed via 'agents': SQL agents that translate standard or extended SQL2 queries into vendor-specific dialects, and text agents that process text sub-queries on full-text search engines."

Available in HTML format: Text / Relational Database Management Systems: Overview and Proposed SQL Extensions. Also available online in Postscript: http://bluebox.uwaterloo.ca/OED/trdbms1.ps; or ftp://cs-archive.uwaterloo.ca/cs-archive/CS-95-25/CS-95-25.ps.Z, [mirror copy]. Or: http://www.cssc.ca/public/trdbms1.ps. Supersedes a document previously published in Applications of Databases. Proceedings of the First International Conference, 1994 International Conference on Applications of Databases, ADB 94. See the bibliographic entry. See also the main NOED entry.



[CR: 19960312]

Blake, Joy; Elledge, Marion. "Thoughts from GCA [Tribute to Yuri Rubinsky]." <TAG> 9/2 (February 1996) 3. ISSN: 1067-9197.

This tribute is printed in a special issue of <TAG> dedicated to the memory of Yuri Rubinsky. See also the main eulogy collection.



[CR: 19950828]

Bodarky, Scott; Paisley, Scott W. An SGML DTD for the STEP Integrated Resource Parts. NISTIR [Technical Report] 5224. National PDES testbed report series. Sponsored by: U.S. Department of Defense, CALS Evaluation and Integration Office, the Pentagon. Gaithersburg, MD and Springfield, VA: U.S. Department of Commerce, National Institute of Standards and Technology, July 08, 1993. Extent: iv + 31 pages, bibliography.



[CR: 19950716]

Böhm, Klemens; Aberer, Karl; Neuhold, Erich. "Administering Structured Documents in Digital Libraries." Pages 91-117 (with 29 references [this version?]) in Advances in Digital Libraries, edited by Adam, N. R.; Bhargava, B. K.; Yesha, Y. Lecture Notes in Computer Science. Berlin/New York: Springer Verlag, 1995. Authors' affiliation: GMD-IPSI, Darmstadt, Germany.

"Abstract: In this chapter we argue that hyperdocuments administered by digital libraries have to be structured according to standardized storage and exchange formats in order to allow for the manipulation functionality required in digital library construction, maintenance and use. We demonstrate how SGML and its extension, HyTime, can play this structuring role, and how multimedia documents structured accordingly can be stored, changed and maintained in the object-oriented database system VODAK. Using the dynamic semantic extension facilities of VODAK it is illustrated how the document structuring dynamics offered by SGML and HyTime can be accommodated in the database. In addition, we discuss how this facility can be combined with other system components to provide a relevant portion of the functionality required for digital libraries.

[Another] "Abstract: The authors have observed that both a document's internal structure as well as the relations between documents should be properly reflected when documents are stored in digital libraries. The phenomenon that the dividing line between inter-document relationships and intra-document relationships is not clean-cut has been hinted at by means of examples, such as the WWW. Describing documents according to their internal structure is advantageous with regard to the various services offered by digital libraries. Within our database application framework documents of arbitrary types can be handled. Querying documents according to their structure is an issue currently attracting researchers' attention. The queries given there can also be formulated using the VODAK Query Language VQL. Some sample queries have been formulated to sketch the expressive power of VQL. We have briefly explained how to amalgamate the system with the extension toward HyTime semantics. Besides that, the coupling with other modules for enriched functionality has been explained. The systems that have been mentioned in this context are an information-retrieval system, knowledge bases and a DFR archive. An interface to the WWW has also been discussed."

The document is available in Postscript format as P-94-25.ps.Z from the GMD-IPSI FTP server. [Also in mirror copy, October 1995].

Apparently: Selected Papers from Digital Libraries Workshop DL '94. Newark, NJ, USA, 19-20 May 1994.



Böhm, Klemens; Aberer, Karl; Hüser, Christoph. "Extending the Scope of Document Handling: The Design of an OODBMS Application Framework for SGML Document Storage." GMD-IPSI Technical Report. GMD-IPSI [Integrated Publication & Information Systems Institute], 1994. 17 pages. Email contact: kboehm@darmstadt.gmd.de.



Böhm, Klemens; Aberer, Karl; Hüser, Christoph. "Introducing D-STREAT: The Impact of Advanced Database Technology on SGML Document Storage." <TAG>: The SGML Newsletter 7/2 (February 1994) 1-4. ISSN: 1067-9197. Authors' affiliation: Integrated Publication & Information Systems Institute, Darmstadt, Germany. Email contact: kboehm@darmstadt.gmd.de.

[Abstract from the public copy?]: "Based on our experience with a database application for SGML-document storage based on a relational DBMS, the weaknesses of this paradigm with regard to the storage of structured documents are discussed. In addition, we are aware of more sophisticated requirements on document storage. We are currently working on a new system based on object-oriented technology. On the one hand we try to illustrate how database technology can be applied to document handling. Settings in which document bases are imperative are described. On the other hand, an objective of this article is to point out that authoring and publishing are a provenance of requirements for next-generation DBMSs."

Abstract from online version: "

A draft version of the document [previously submitted to EPODD ?] is also available in Postscript format as P-94-06.ps.Z from the GMD-IPSI FTP server. [Draft version mirror copy, October 1995].



[CR: 19971113]

Böhm, Klemens; Aberer, Karl; Neuhold, Erich J; Yang, Xiaoya. Structured Document Storage and Refined Declarative and Navigational Access Mechanisms in HyperStorM. GMD-IPSI Technical Report. Darmstadt, Germany: GMD-IPSI, 1997. Extent: 52 pages, 32 references. Authors' affiliation: GMD-IPSI [Integrated Publication and Information Systems Institute], OASYS (Open Adaptive Information Management Systems); WWW: http://www.darmstadt.gmd.de/oasys/locate.html/~kboehm; Email: kboehm@darmstadt.gmd.de.

Abstract: "The combination of SGML and database technology allows to refine both declarative and navigational access mechanisms for structured document collection: with regard to declarative access, the user can formulate complex information needs without knowing a query language, the respective document-type definition or the underlying modelling. Navigational access is eased by hyperlink-rendition mechanisms going beyond plain link-integrity checking. With our approach, the database-internal representation of documents is configurable. It allows for an efficient implementation of operations, because DTD knowledge is not needed for document structure recognition. We show how the number of method invocations and the cost of parsing can be significantly reduced."

The document is available online in Postscript format: ftp://ftp.darmstadt.gmd.de/pub/dimsys/reports/P-97-12.ps.Z; [local archive copy]. It was also accepted for publication in VLDB Journal, Volume 6, Issue 4, 1997.



Böhm, Klemens; Aberer, Karl. An Object-Oriented Database Application for HyTime Document Structure [or: Storing HyTime Documents in an Object-Oriented Database]. GMD-IPSI Technical Report. Sankt Augustin: GMD [Gesellschaft für Mathematik und Datenverarbeitung]-IPSI, 1994. 8 pages, with 28 references.

Abstract: An open hypermedia document storage system has to deal with requirements that are not satisfied by existing systems. It has to support non-generic hypermedia documents, which are enriched with application-specific semantics. It has to provide the typical hypermedia document access methods. And it has to allow exchange of hypermedia documents with other systems. We use on a technical level an object-oriented database-management system and on a logical level a well established ISO standard, namely HyTime, in order to satisfy the requirements mentioned. At the example of documents which incorporate hypertext structures we discuss in this paper the impact of taking such an approach on representation and processing within the database system.

Available in Postscript format as P-94-09.ps.Z from the GMD-IPSI FTP server.



[CR: 19971113]

Böhm, Klemens; Aberer, Karl; Klas, Wolfgang. "Building a Hybrid Database Application for Structured Documents." GMD-IPSI Technical Report. Darmstadt, Germany: GMD-IPSI [Integrated Publication & Information Systems Institute], 1997. 26 pages. Authors' affiliation: [Böhm, Aberer]: GMD-IPSI; [Klas]: Universität Ulm; Email contact: kboehm@darmstadt.gmd.de.

Abstract: "In this article, we propose a database-internal representation for SGML-/HyTime-documents based on object-oriented database technology with the following features: documents of arbitrary type can be administered. The semantics of architectural forms is reflected by means of methods that are part of the database schema and by the database-internal representation of HyTime-specific characteristics. The framework includes mechanisms to ensure conformance of documents to the HyTime standard. Measures for improved performance of HyTime operations are also described. The database-internal representation of documents is a hybrid between a completely structured and a flat representation. Namely, the structured representation is better to support the HyTime semantics, and modifications of document components. On the other hand, most operations are faster for the flat representation, as will be shown."

The document is available online in Postscript format: ftp://ftp.darmstadt.gmd.de/pub/dimsys/reports/P-97-13.ps.Z; [local archive copy]. A version of this paper was published in Multimedia Tools and Applications 5 (1997) 275-300 [Kluwer Academic Publications].



Böhm, Klemens; Rakow, T. C. "Metadata for Multimedia Documents." SIGMOD Record 23/4 (December 1994) 21-26. 19 references. Authors' affiliation: Integrated Publication & Information Systems Institute, Darmstadt, Germany. Email contact: kboehm@darmstadt.gmd.de.

Abstract: Metadata for multimedia documents are classified in conformity with their nature, and the different kinds of metadata are brought into relation with the different purposes intended. We describe how metadata may be organized in accordance with the ISO standards: SGML, which facilitates the handling of structured documents, and DFR, which supports the storage of collections of documents. Finally, the authors outline the impact of their observations on future developments.

Available in Postscript format as P-94-24.ps.Z from the GMD-IPSI FTP server.



Böhm, Klemens; Müller, Adrian; Neuhold, Erich. "Structured Document Handling - a Case for Integrating Databases and Information Retrieval." Pages xxx-xxx (ca. 10 pages) in Proceedings of the 3rd ACM International Conference on Information and Knowledge Management (CIKM '94). Maryland, November 1994. Berlin/New York: Springer Verlag [or ACM Press?], forthcoming [1995].

"Abstract: In this paper we discuss the structured multimedia documents that will be, or already are, to some degree the communication backbone of the so-called superhighways. It will be shown that storage and retrieval of such documents will best be handled by an integration of database and information retrieval technologies. We assume documents to be structured with the help of standards like SGML/HyTime and represented by the multitude of formats currently used for multimedia data.

Starting with an approach based on object-oriented database technology we extend both their functionality on the cost models for query evaluation on one side with multimedia features and on the other with logic-based models of information retrieval to truly combine structure and content information about the documents in question."

Available in Postscript format as P-94-26.ps.Z from the GMD-IPSI FTP server. [Mirror copy available, October 1995].



[CR: 19960826]

Boeri, Robert J.; Hensel, Martin. "SGML Refineries: Distilling 'Docubases' for CD-ROM and Online Delivery." CD-ROM Professional 9/8 (August 1996) 52-53. Authors' affiliation: [Boeri:] Information Services Division, Factory Mutual Engineering; [Hensel:] Founder, Martin Hensel Corporation.

"Abstract: The information explosion and larger CD-ROM storage capacity make it more difficult for information systems managers and other data professionals to distill data, publish on a variety of media and direct it to a narrow market. There is no ideal solution to docubase management, but there are more solutions than ever. A new type of SGML-capable docubase system, such as Inforium LivePage, offers several advantages."



[CR: 19950716]

Boeri, Robert J.; Hensel, Martin. "What Good is SGML?" CD-ROM Professional 8/4 (April 1995) 108-110. Authors' affiliation: [Boeri:] Advanced Systems Specialist in the Information Services Division, Factory Mutual Engineering, Norwood, Massachusetts; [Hensel:] Founder, Martin Hensel Corporation.

"Abstract: It took over a decade for the Standard Generalized Markup Language (SGML) to be developed before it was ratified as an ISO standard in 1986. Nearly another decade later, it is still not widely used. SGML has been held back because of lack of understanding, but it could be very useful to CD-ROM publishers. SGML can provide significant short-term paybacks, if it is approached correctly. Much of the profit from early CD-ROM titles was siphoned off in redundant operations, where the ad hoc nature of many of these operations resulted in high costs for small changes. Even though service bureaus generally did excellent work, using them added an extra link to the publishing chain. CD-ROM production was completely separate from print production and therefore was not economical. SGML provides a single source from which many products can be derived. This consolidation reduces or eliminates conversion costs, speeds time-to-market for new products, and improves editorial consistency. Most importantly, SGML facilitates flexible repackaging of data to serve different market segments."

[According to Jim Marchand] The article "makes a strong case for the use of SGML in CD-ROM publishing." Set against the backdrop of answering the joke: "SGML: Sounds Good, Maybe Later."



[CR: 19951113]

Boivin, Laurent. Étude de réalisation d'un gestionnaire de versions de documents structurés [Study for the implementation of a version manager for structured documents]. IMAG Technical Report [ENSIMAG U.F.R. Informatique & Mathématiques Appliquées I.M.A.G. DEA D'INFORMATIQUE. Carried out at the laboratory: Unité mixte Bull Imag/Systèmes, projet Opéra]. Grenoble: INRIA/Bull-IMAG, June 20, 1994. xii + 89 pages.

Abstract [translated from the French]: "This report analyzes and proposes solutions for providing a version-management tool for structured documents.

In the field of electronic publishing, structured documents are becoming increasingly important. These documents are represented by their logical components (titles, sections, paragraphs, notes, etc.) and by the relations (inclusion, order, reference, etc.) among these components. In addition, the available component types and their possible relations are defined by a 'structure schema' for each class of documents.

Version management for structured documents consists of managing the relations among the various instances of a document. These instances are intermediate stages that trace the document's evolution throughout its life cycle. Another aspect of version management, called configuration management or context management, concerns the relations between documents and, above all, between instances of different documents (hypertext links).

The main operations involved in version management are the identification, comparison, and merging of documents; the creation of difference files and the reconstruction of a version; configuration management; and management of the evolution of structure schemas.

The first part (chapter II) presents an analysis of the problems posed by version management of structured documents. This analysis is accompanied by a survey of the state of the art on the subject in other fields (software engineering, hypertext, and databases).

The second part (chapter III) proposes solutions for developing a version-management environment that is as complete as possible. These proposals, independent of any implementation, are then adapted (chapter IV) to the environment of the structured document editor Grif, with the aim of putting an operational version-management system in place."
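As a rough illustration of two of the operations the abstract lists (creating a difference file and reconstructing a version from it), the sketch below uses Python's standard difflib module. It is only a flat, line-based approximation: the report concerns tree-structured documents edited in Grif, and nothing here reproduces its method. The helper names and the sample document lines are hypothetical.

```python
import difflib

def make_delta(old_lines, new_lines):
    """Build a line-based 'difference file' between two versions.

    Assumption: each version is a flat list of lines; real structured
    documents would need a tree-aware comparison instead.
    """
    return list(difflib.ndiff(old_lines, new_lines))

def rebuild(delta, which):
    """Reconstruct version 1 (the old one) or version 2 (the new one) from the delta."""
    return list(difflib.restore(delta, which))

# Hypothetical sample versions of one document.
v1 = ["Title", "Section 1", "A paragraph."]
v2 = ["Title", "Section 1", "A revised paragraph.", "A note."]

delta = make_delta(v1, v2)
old_again = rebuild(delta, 1)
new_again = rebuild(delta, 2)
```

Storing only the delta alongside one full version is enough to recover the other, which is the basic economy behind difference files.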

Available in electronic format via the WWW: ftp://ftp.imag.fr/pub/OPERA/doc/GestionVersions.ps.Z [mirrored copy, November 1995].



[CR: 19971227 MD: 19971229]

Bonhomme, Patrice; Cruz-Lara, Samuel; Romary, Laurent. "The SILFIDE Network: An Interactive Service for Using, Studying, Distributing and Sharing Natural Language Resources." Pages 161-169 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Authors' affiliation: [Patrice Bonhomme]: Computer & Linguistic Expert Engineer, CRIN-CNRS / INRIA Lorraine, France; Email: Patrice.Bonhomme@loria.fr; [Samuel Cruz-Lara]: Assistant Professor at Nancy 2 University's Institut of Technologie, Computer Science Department. CRIN-CNRS / INRIA Lorraine, France; Email: Samuel.Cruz-Lara@loria.fr; [Laurent Romary]: Computational Linguistic Researcher at CNRS CRIN-CNRS / INRIA Lorraine, France; Email: Laurent.Romary@loria.fr.

Abstract: "The purpose of this paper is to present some of the issues involved in taking advantage of the current advances in Web new technologies, in the aim of distribute linguistic resources in an opened client/server environment. The paper is organized as follows: First, we describe our experiment within the first season of the SILFIDE (Serveur Intéractif pour la Langue Française, son Identité, sa Diffusion et son Étude) Server Project currently under development at the CRIN (Centre de Recherche en Informatique de Nancy) a laboratory associated with the CNRS (Centre National de la Recherche Scientifique) and INRIA Lorraine (Institut National de Recherche en Informatique et en Automatique). We developed a first SILFIDE Server prototype implementing the TEI guidelines, the CGI (Common Gateway Interface) and Java technologies. Next, we sketch the new directions within a second season of the SILFIDE Server Project concerning new related topics: (1) managing linguistic resources encoding in XML/TEI, (2) distributing linguistic resources over a SILFIDE network using the possibilities given by new technologies for a Web information server and (3) integrating and standardizing linguistic tools in a distributed environment."

This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.

See the main database entry for Project Silfide (Serveur Interactif pour la Langue Française, son Identité, sa Diffusion et son Étude)

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19960907]

Bonhomme, Stéphane; Roisin, Cécile. "Interactively Restructuring HTML Documents." Computer Networks and ISDN Systems 38/7 (May 1996) 1075-1084 (with 19 references). Authors' affiliation: INRIA -- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE (The French National Institute for Research in Computer Science and Control). Postal Address: Unité de Recherche INRIA Rhône-Alpes, 655 avenue de l'Europe F-38330, Montbonnot Saint Martin, France. Email [Stéphane Bonhomme (Inria)] Stephane.Bonhomme@inrialpes.fr; [Cécile Roisin (Grenoble University)] Cecile.Roisin@inrialpes.fr.

"Abstract: When editing World Wide Web pages, a user may desire to transform the documents as freely as with a word processor, but because Web documents must conform to a rigorous structure, defined by the grammar of the HTML (HyperText Markup Language) document type definition (DTD), not all such transformations are allowed: the editing system must perform some work to obtain valid HTML documents. This paper presents a solution to the problem of transforming the document structure in a HTML editor. A tool based on a transformation language is described. Techniques that have been designed for general structured documents have been adapted to take into account the specific structure of the HTML DTD."

Apparently a version of (or being) a paper delivered at the Fifth International World Wide Web Conference, Paris, France, 6-10 May 1996. Available online: http://www5conf.inria.fr/fich_html/papers/P16/Overview.html; mirror copy, text only. Or: http://opera.inrialpes.fr/OPERA/Papers/WWW96/WWW96.html. Or see the list of slides from the presentation: http://www5conf.inria.fr/fich_html/slides/papers/PS6/P16/overview.htm; alternately, the slide sequence as a single document: http://www5conf.inria.fr/fich_html/slides/papers/PS6/P16/all.htm. See: other INRIA papers.



[CR: 20000229, 19970128]

Borgida, Alexander. "Features of Languages for the Development of Information Systems at the Conceptual Level." IEEE Software 2/1 (January 1985) 63-72 (with 17 references). ISSN: 0740-7459. Author's affiliation: Department of Computer Science, Rutgers University, Piscataway, NJ 08855, USA. Phone: +1 (908) 445-4744; Fax: +1 (908) 445-0537; Email: borgida@cs.rutgers.edu; WWW: Home Page.

Abstract: "A computer system which stores, retrieves and manipulates information about some portion of the real world can be viewed as a model of that domain of discourse. There has been considerable research recently on languages which allow one to capture more of the semantics of the real world in these computerized Information Systems -- research which has variously been labelled as Semantic Data Modeling, Semantic Modeling or Conceptual Modeling. This review paper presents a list of the features which appear to distinguish these languages from those traditionally used to describe and develop database-intensive applications, and considers the motivation for these features as well as the potential advantages to be gained through their use. The paper, which is intended for those familiar with current data processing practices, also compares in greater detail four programming languages which incorporate semantic modeling facilities, and discusses some of the methodologies and tools for Information System development based on these languages." [Note: Alex Borgida's work in recent years [1999/2000] focuses on description logics - also worth thinking about.]

Summary: "In this paper we wish to survey several languages which purport to allow the description of an IS in a manner which models the real-world enterprise more naturally and directly than has been the case traditionally. The goal of this approach is to facilitate: (a) the design and maintenance of the IS, by adopting a vocabulary which is more appropriate for the problem domain, and by structuring the IS description as well as the description process; (b) the use of the IS, by making it easier for the user to interpret the data stored, and thus obtain information. The remainder of the paper is structured as follows: We first summarize some problems with traditional IS development languages in Section 2. Then, in Sections 3 and 4, we present some of the facilities for modeling the static and dynamic aspects of an enterprise which distinguish Conceptual Modeling Languages (CMLs henceforth). Finally, we consider some aspects of novel methodologies for IS development which are based on the use of CMLs, as well as computer tools supporting them." See the extended excerpt from sections 1 and 7, and the reference document "Conceptual Modeling and Markup Languages." See also cache, PDF format.

[Note to SGML readers: Historically, critics of SGML have pointed out that SGML's notion of "attribute" is so weak as to be worthless for all but the most trivial kinds of databases -- even if it is judged adequate for some processing tasks related to book production. SGML attributes cannot directly model the complexity of object-attribute information in the real world because SGML's attributes store essentially flat strings -- not complex information structures. Of course, complex attributes can be modeled in SGML using SGML element markup (e.g., by employing a reserved name convention like "<ATT-xxxx>" for attributes masquerading as elements) -- at the expense of losing what minimum semantic validation SGML's attribute mechanism does offer. Arbitrarily complex information can be stored as CDATA within SGML's attributes, using private syntaxes, but the encoding cannot be validated for integrity by SGML. The present article by A. Borgida highlights the alternative, as a key insight from the object-oriented database world: the benefit of modeling complex relationships as attributes, where objects and attributes are fundamentally different notions, each with structural complexity.]
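The contrast drawn in the note above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses XML and Python's xml.etree.ElementTree rather than an SGML parser, and the element names (book, ATT-author, name, affiliation) and the semicolon-separated private syntax are hypothetical, chosen only to mirror the "<ATT-xxxx>" naming convention mentioned in the note.

```python
import xml.etree.ElementTree as ET

# Case 1: a complex value packed into an attribute using a private syntax.
# The parser hands back one flat string; it cannot check the inner structure.
flat = ET.fromstring('<book author="Alexander Borgida; Rutgers University"/>')
author_flat = flat.get("author")
# The application must split the string itself (hypothetical "; " convention).
flat_name, flat_affiliation = author_flat.split("; ")

# Case 2: the same information modeled as an element "masquerading" as an
# attribute, following an ATT- naming convention. The structure is now
# visible to the parser and could be constrained by a DTD or schema.
structured = ET.fromstring(
    "<book>"
    "<ATT-author>"
    "<name>Alexander Borgida</name>"
    "<affiliation>Rutgers University</affiliation>"
    "</ATT-author>"
    "</book>"
)
author = structured.find("ATT-author")
name = author.findtext("name")
affiliation = author.findtext("affiliation")
```

In the first case any error in the private syntax surfaces only in application code; in the second, the markup itself carries the structure.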

A slightly earlier version of the article is available online: ftp://cs.rutgers.edu/pub/borgida/CML-features.ps.gz; [mirror copy]. The article "is based upon an unpublished paper presented at the 1st Colombian AUC Conference, Medellin, Colombia, September 1982, and the work has been partially supported by the National Science Foundation under Grant No. MCS-82-10193." See also other publications by Alexander Borgida.



[CR: 19961030]

Bos, Bert. Toelichting bij de beschrijvingen van SGML-software [Comment about the descriptions of SGML-software]. PREMIUM Project Report. Groningen, Netherlands: PREMIUM Project, 1995. Extent: approximately 9 pages.

Available [in Dutch and English] on the Internet in HTML format [March 1996]: Toelichting bij de beschrijvingen van SGML-software [mirror]. In English: http://www.nic.surfnet.nl/surfnet/projects/premium/premium.eng/comment.html; [mirror copy]. See also [in Dutch] the more detailed product description from PREMIUM.



[CR: 19971107]

Bos, Bert. "XML. From Bytes to Characters." Pages 165-176 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Author's affiliation: INRIA, W3C (Internationalization).

Abstract: "XML is a syntax for storing hierarchically organized data such as directories, catalogues, user manuals, etc. It can store only textual data, but that is not a severe restriction. This article defines, in some detail, how text is stored in an XML file. It also describes how an XML file is encoded for transportation over the Internet, and upon arrival, decoded again. Under the Internet model for transport of text files, the encoding/decoding may result in a 'different' file (i.e., a different sequence of bytes), but retains exactly the same text and structure."
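The abstract's point that re-encoding may yield a different sequence of bytes while preserving the text and structure can be illustrated with a small sketch. It assumes Python's xml.etree.ElementTree as the parser; the serialize helper, the element name, and the sample title are hypothetical and are not taken from the article.

```python
import xml.etree.ElementTree as ET

def serialize(title, encoding):
    """Serialize a one-element document in the given character encoding.

    Assumption: the encoding declaration names the same encoding that is
    used to turn the characters into bytes.
    """
    doc = f'<?xml version="1.0" encoding="{encoding}"?><title>{title}</title>'
    return doc.encode(encoding)

title = "Étude"
as_utf8 = serialize(title, "utf-8")
as_latin1 = serialize(title, "iso-8859-1")

# On the wire the two files are different byte sequences...
bytes_differ = as_utf8 != as_latin1

# ...but a parser recovers the same element and the same characters.
parsed_utf8 = ET.fromstring(as_utf8)
parsed_latin1 = ET.fromstring(as_latin1)
```

Here the accented capital occupies two bytes in the UTF-8 file and one byte in the ISO-8859-1 file, yet both parse to the same five-character string.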



[CR: 19961226]

Bosak, Jon. "The Case for DSSSL Online (dsssl-o)." Pages 665-666 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Online Information Technology Architect, SunSoft, 2550 Garcia Ave., MPK17-101, Mountain View, CA 94043, USA. Email: bosak@atlantic-83.Eng.Sun.COM.

Session Abstract: "The DSSSL (Document Style Semantics and Specification Language) Online session will consist of a 45 minute orientation session followed by two or more hours of interactive discussion and a demonstration of Jade, a DSSSL engine. Since the basic motivation behind dsssl-o is the application of semantics to generic SGML documents served out over the Internet, some time will be spent reviewing the case for SGML on the Web and the need for semantic specification methods beyond those being currently developed for HTML before presenting the Application Profile itself.

It is assumed, but not required, that session participants will have already gained some familiarity with the DSSSL standard. The DSSSL tutorial on Sunday, November 17, is highly recommended for persons planning to attend the DSSSL Online workshop."

Further information on DSSSL Online may be found: (1) in the DSSSL entry of the SGML/XML Web Page, or (2) on the SGML Open Web site ("The Case for DSSSL Online," by Jon Bosak).

Note: The above presentation was part of the "And More..." track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19990129]

Bosak, Jon. "Media-Independent Publishing: Four Myths About XML. [The W3C's Working Group Chair Dispels Some Myths About XML.]." IEEE Computer 31/10 (October 1998) 120-122. Author's affiliation: Sun Microsystems; and Chair, W3C XML Coordination Group.

Introduction: "Called 'the emerging technology of the year' after it was endorsed by the World Wide Web Consortium (W3C), XML burst onto the scene in February [1998]. It was called the successor to HTML and, according to some, the future lingua franca for the exchange of structured data. As XML emerged from the obscurity of its W3C beginnings, it was perhaps inevitable that this new data format would begin generating misconceptions as fast as it attracted enthusiasts. In this column, I'd like to head off some myths about XML before they become permanent misunderstandings." The four myths: (1) XML is a Conspiracy Led by Microsoft. In fact, XML has the support of a broad base of corporate sponsors who have worked toward a "genuinely open standard" based upon a common perception of needs for markup extensibility, structured data, formal validation mechanisms, media independence, and vendor and platform independence. (2) XML is an Extension of HTML. In fact, XML belongs to the SGML conceptual layer, being a metalanguage, rather than to the HTML layer, since HTML is a particular markup language governed by its own DTD. (3) XML Can Drive Web Browsers by Itself. In fact, XML specifies syntax, not processing semantics. In order for a browser to do its work with XML (rendering, hypertext linking, etc.), "you will have to supply both the content of the document (expressed in XML) and its treatment, which you must specify either programmatically (with scripts) or declaratively (with style sheets)." (4) XML is Just for Data. In fact, "the first wave of XML applications is based on what it can do on its own: convey structured data."

In addition to the online version, see the PDF source for the official publication of the article in IEEE Computer. For other introductory articles on XML, see "Introducing the Extensible Markup Language." [local archive copy]



[CR: 19960818]

Bosak, Jon. An SGML-Based Web Server. Presentation at the Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France. Domaine de Voluceau - Rocquencourt: INRIA, May 1996. Author's affiliation: SunSoft.

[Section extract:] "The case for generic SGML on the Web:

  • SGML-to-HTML servers are complex and CPU-intensive; only large corporations can afford them. By shifting more of the processing load to the client, generic SGML browsers can deliver many of the advantages of structured documents at a much lower cost.
  • Structured SGML provides the basis for presentational controls far beyond what can be accomplished with any current form of HTML.
  • Generic SGML allows for the transmission of much richer data to client-side applications (especially Java applets); "SGML gives Java something to do."
  • Generic SGML is rich enough to support distributed document processing (not just distributed document rendering); HTML is not.
  • Generic SGML is required whenever structured data from a database system must be processed at the client before transmission to some other database system.
  • The interchange format must be capable of capturing all of the information in the source and conveying it to the target. HTML cannot do this."

Available on the Internet: http://www5conf.inria.fr/fich_html/slides/dday/sgml/all.htm [mirror copy, text only], or the presentation slides from the conference.



[CR: 19970710]

Bosak, Jon. Overview: XML, HTML, and All That. Paper presented at WWW6 (WWW '97). Mountain View, CA: Sun Microsystems, April 11, 1997. Extent: approximately 30 pages, 47 slides. Author's affiliation: Sun Microsystems.

The presentation compares and contrasts SGML, HTML, and XML; DSSSL and/versus CSS. On the style language comparison: see another contribution by the author, registered in the stylesheet section of the SGML/XML Web Page.

Available online: "Overview: XML, HTML, and all that". [Archive copy .ZIP, or overview document text only].



[CR: 19980306]

Bosak, Jon. "XML, Java, and the Future of the Web." Pages 219-227 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Author's affiliation: Sun Microsystems.

[Introduction]: "The extraordinary growth of the World Wide Web has been fueled by the ability it gives authors to easily and cheaply distribute electronic documents to an international audience. As Web documents have become larger and more complex, however, Web content providers have begun to experience the limitations of a medium that does not provide the extensibility, structure, and data checking needed for large-scale commercial publishing. The ability of Java applets to embed powerful data manipulation capabilities in Web clients makes even clearer the limitations of current methods for the transmittal of document data."

"To address the requirements of commercial Web publishing and enable the further expansion of Web technology into new domains of distributed document processing, the World Wide Web Consortium has developed an Extensible Markup Language (XML) for applications that require functionality beyond the current Hypertext Markup Language (HTML). This paper describes the XML effort and discusses new kinds of Java-based Web applications made possible by XML."

A version of this document is available online in HTML and several other formats: HTML, or [directory] http://sunsite.unc.edu/pub/sun-info/standards/xml/why/.



[CR: 19980306]

Bosak, Jon. XML, Java, and the Future of the Web. Technical Paper. Mountain View, California: Sun Microsystems, November 1996. Extent: approximately 12 pages; 6 references. Author's affiliation: Online Information Technology Architect, SunSoft, 2550 Garcia Ave., MPK17-101, Mountain View, CA 94043, USA. Email: bosak@atlantic-83.Eng.Sun.COM.

"The extraordinary growth of the World Wide Web has been fueled by the ability it gives authors to easily and cheaply distribute electronic documents to an international audience. As Web documents have become larger and more complex, however, Web content providers have begun to experience the limitations of a medium that does not provide the extensibility, structure, and data checking needed for large-scale commercial publishing. The ability of Java applets to embed powerful data manipulation capabilities in Web clients makes even clearer the limitations of current methods for the transmittal of document data."

"To address the requirements of commercial Web publishing and enable the further expansion of Web technology into new domains of distributed document processing, the World Wide Web Consortium has developed an Extensible Markup Language (XML) for applications that require functionality beyond the current Hypertext Markup Language (HTML). This paper describes the XML effort and discusses new kinds of Java-based Web applications made possible by XML." [from the Introduction]

Available online: "XML, Java, and the future of the Web" in HTML format; [mirror copy]. A PostScript version is also available. Also: Murata Makoto of Fuji Xerox Information Systems prepared a Japanese translation of the paper.

Note: "The paper was written in HTML 3.2 and formatted by the Jade DSSSL engine for printout. The section numbers, headers, footers, and Table of Contents seen in the printed version are not part of the HTML source but were generated automatically as specified by a DSSSL stylesheet."



[CR: 19971202]

Bosak, Jon. "XML Ubiquity and the Scholarly Community [Closing Keynote Address]." Pages [?] in TEI 10: A Conference in Celebration of the Tenth Anniversary of the Text Encoding Initiative. Abstracts. TEI 10: Text Encoding Initiative, Tenth Anniversary User Conference, Brown University, Providence, Rhode Island. November 14-16, 1997. Sponsored by Martin Hensel Corporation, Kluwer Academic Publishers, and MIT Press. Hosted by Brown University Library, and Computing and Information Services. Providence, RI: Brown University, 1997. Author's affiliation: Sun Microsystems, Inc.; Chair, W3C XML Work Group.

See the main database entry for additional information about the conference, or the Brown University web site.



[CR: 19980416]

Boualem, Malek; Harié, Stéphane. "MtScript: A Multilingual Text Editor." Computers and the Humanities (CHUM) 31/ (1997) 135-151 (with 18 references). ISSN: 0010-4817. Authors' affiliation: [Boualem:] Université de Provence, CNRS URA 261, Laboratoire Parole et Langage; 29, Avenue Robert Schuman, 13621 Aix-en-Provence, France; Tel: (33) 04 42 95 36 27 - Fax: (33) 04 42 59 50 96 - Mobile: (33) 06 11 57 86 63; Email: malek@lpl.univ-aix.fr and mtscript@lpl.univ-aix.fr; WWW: Home Page; [Harié:] CNRS.

"This paper describes the multilingual text editor MtScript, developed in the framework of the MULTEXT project. MtScript enables the use of many different writing systems in the same document (Latin, Arabic, Cyrillic, Greek, Hebrew, Chinese, Japanese, Korean, etc.). Future versions will use SGML and TEI norms, which offer ways of encoding multilingual texts and are to a large extent meant for interchange."

"MtScript provides typical editing functions such as insertion and deletion, even for text containing portions of writing in opposite directions. In addition, MtScript allows the user to explicitly associate portions of the text with a particular language, and to associate keyboarding rules with any language. Different types of character sets (single byte, multiple-byte) can also be handled, including the co-mingling of one-byte and multiple-byte character sets (ISO 8859 series, GB_2312_80, BIG_5, JISX0208, KSC5601). MtScript has been developed on Sun Workstations using X-Windows, Unix, C, and Tcl/Tk. A compiled alpha version (v1.1) is available for Sun Sparc stations under Solaris 1.x or 2.x, and for Linux (Intel)."

A project description for MtScript is available online.



[CR: 19981007]

Boyce, Peter. It's Not Your Father's Journal. Links, Permanence and Process: Three Secrets to Electronic Publishing. Presented at the Symposium "Beyond Print: Scholarly Publishing and Communication in the Electronic Environment." September 26-27, 1997. University of Toronto at Scarborough. Washington, D.C.: American Astronomical Society, 1997. Author's affiliation: American Astronomical Society; Email: pboyce@aas.org; WWW: http://www.aas.org/~pboyce.

Abstract: "Links are important. No other scientific field has linked its electronic information as closely as has Astronomy. Electronic journals are part of this distributed, linked resource. Permanence is important. The American Astronomical Society (AAS) journals use a new publishing process which ensures the ability to maintain permanent access to electronic astronomical journals. Process is important. With the right process, we have been able to add links automatically to our journal; links to references, links to citations (where the electronic material exists) and internal links for ease of navigation within the journal. Readers like this. Links are important."

As for permanence: "How do we do this? The answer lies in preparing the original material in an open, standard format which incorporates logical markup. At present publishers have generally agreed to use the Standard Generalized Markup Language (SGML). The ApJ archival database, which the public never sees, is composed of manuscripts coded in SGML. From this database, we can automatically derive both HTML screen versions and the PDF version. Updating the public versions now becomes an automatic, almost trivial, process. . ."

The document is available online. [local archive copy]

See more in the database entry "American Astronomical Society."



[CR: 19960203]

Boyd, Barbara. "DEC's Technical Authoring Environment." <TAG>: The SGML Newsletter 10 (July 1989) 5-7. ISSN: 1067-9197. Author's affiliation: Digital Equipment Corporation.

The author summarizes key features of DEC's technical writing environment, which uses the X windowing system and DECwindows. The MIL-STD-1840A standard has been adopted for technical documentation, and it, along with other SGML-commensurable standards, forms the basis for successful interchange of the digital information at DEC sites.



[CR: 19961226]

Bradley, Neil. "Anatomy of an SGML Document." Pages 81-86 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Pindar Plc Solutions Team, Ryedale Building, 60 Piccadilly, York, Y01 1NX, England; Tel: 00 44 (0)1904 613040; FAX: 00 44 (0)1904 613110; Email: n.bradley@pindar.co.uk; WWW: http://www.pindar.co.uk; WWW: http://www.bradley.co.uk.

Abstract: "There are three major components to an SGML Document - the SGML Declaration, Prolog and Document Instance. An understanding of their roles, their inter-dependencies, and their arrangement within a practical working environment is essential for all users of SGML based systems. As well as describing the purpose and content of each major component of an SGML Document, this paper explains how they are managed by an entity manager, and how they integrate with a parser."

Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19981015]

Bradley, Neil. The Concise SGML Companion. Harlow, Essex: Addison-Wesley Longman, [December] 1996. Extent: 336 pages. ISBN: 0-201-41999-8 [$29.95 US, $41.00 Canada]. Author's affiliation: Consultant, Pindar. Email: neil@bradley.co.uk.

"About the book: The Concise SGML Companion is a compact book aimed at 'hands-on' users of SGML. It explains all the important features of the language and includes several currently popular applications of its use, such as CALS Tables and HTML. It does NOT cover obsolete and little used features such as Link, Rank, Datatag and the system declaration, and it does not theorise at length about the benefits of using SGML. The book is split into three sections: the main body, which describes the standard and various uses, the Road Map, which contains charts that illustrate the standard, and the Glossary, which contains almost 1,000 entries, making it a mini-dictionary of SGML and related terms. . . In the future [the corresponding Web page] will support the book by supplying updates, corrections and additional material - in some cases derived from reader's feedback."

The book's features include: "(1) a detailed study of the most important and most widely-used SGML features; (2) an exposition of several popular SGML applications, including CALS tables and HTML 3.2, the latest version of the World Wide Web publishing language; (3) a series of inter-linked 'Road Map' charts that reveal the structure of the SGML standard. This section alone will guarantee the book remains open at your elbow at all times; (4) an extensive glossary of SGML and related terminology." [from the AW online description]

An online description of the book's contents, together with links to SGML resources on other sites: http://www.bradley.co.uk/ [mirror copy, September 11, 1996]. See the description on the Addison-Wesley Longman, Developers Press server, [mirror copy, or CALS Tables section, local archive copy]. Contact: Ellen Wohl, Addison-Wesley Longman, Developers Press, ellenwo@aw.com or Fax: +1 617-944-8243. See also the reference to the book on a Pindar page, including the online SGML glossary/dictionary.

See also Neil Bradley, The XML Companion. Harlow, Essex: Addison Wesley Longman, 1998. Extent: 464 pages. ISBN: 0-201-41999-8.



[CR: 19951208]

Bradley, Neil. "SGML Concepts." ASLIB Proceedings 44/7-8 (1992) 271-274.



[CR: 19980905]

Bradley, Neil. The XML Companion. Harlow, Essex: Addison Wesley Longman, 1998. Extent: 465 pages. ISBN: 0-201-41999-8. Author's affiliation: [SGML Consultant].

"The XML Companion provides all users and potential users of XML with an accessible, in-depth reference to the release 1.0 of the standard. The book provides an accessible, comprehensive description of each XML feature; fully covers the proposed Linking and Stylesheet standards; does not assume experience of either HTML or SGML; features a series of 'Road Map' charts that reveal the structure of the XML standard; contains an extensive glossary of XML and related technology; includes coverage of DTD design, DOM, SAX, Architectural Forms, name spaces, whitespace rules, HTML (3.2 & 4), CSS (1 & 2), XSL, SGML differences."

See the accompanying Web site for the book: http://www.bradley.co.uk/.

Review by Wendell Piez.



Brandel, William. "SGML Technology Gains Popularity with Vendors." LAN Times 11/8 (April 25, 1994) 53-54.

"Abstract: Standard Generalized Markup Language (SGML) is quickly becoming the document-formatting technology of choice in the software industry, and may soon replace ASCII as the standard file format for documents due to its ability to handle compound documents containing graphics, audio and video as well as text. Already such major vendors as Novell Inc. are converting their documentation and programming references to SGML and putting them online in this format. Conversion to SGML allows vendors to reuse the same information in user manuals, training materials and technical references. They need only update a piece of information once after conversion, and SGML will take care of updating all documents that contain it. Vendors also find SGML's non-proprietary nature attractive. Lotus Development Corporation, WordPerfect Corporation and Microsoft Corporation are all adding SGML features to the next generation of their word-processing programs."



[CR: 19970808]

Brandhorst, Hans; van Huisstede, Peter; Loeffen, Arjan; Wiering, Frans. Toward a Uniform Registration of Textual Sources in the Humanities. Two Dutch Encoding Projects Using the TEI Guidelines. Department Computer & Letteren Technical Report. Utrecht, The Netherlands: Department of Computer & Letteren, Utrecht University, [July] 1997. Extent: approximately 30 pages. Authors' affiliation: Department of Computer & Arts, Utrecht University.

[Summary] "In order to make textual sources in the humanities accessible for structure-and text-oriented research software, and to record such information in a stable, exchangeable form, the choice of the Standard Generalized Markup Language . . . As to the format, we have chosen for SGML. SGML itself is explained elsewhere in this article in more detail, so we will limit ourselves here to an account of the reasoning behind the choice. SGML encoded electronic texts are independent of existing hard- and software. This means that the exchange of electronic documents between computer platforms is possible without loss of information. To take a very simple example. Within a SGML encoded document or within the DTD associated with that document one formally declares what conventions were used to encode the Greek characters, or the mathematical characters, or the e with an acute accent (&eacute;), in short all those characters that are implemented differently on different platforms, or in different software programs. SGML handles these characters, but also references to external graphics, etc. in a uniform and standardized way. . . SGML produces device independent documents that explicitly record textual information, without depending on a particular presentational form. SGML compliant tools currently available cover all aspects of regular document production, including editors, browsers, converters, and document management systems."

The article was submitted to VGI cahier, Fall 1997.

Available online in HTML format: http://CandL.let.ruu.nl/preprint/vgi/vgi.htm; [archive copy, text only]



[CR: 19980414]

Bray, Tim. "Authoring in Crisis - Where Next?" The Gilbane Report on Open Information & Document Systems 5/6 (November/December 1997) 1-16. ISSN: 1067-8719. Author's affiliation: Textuality, and CAPV Ventures, The Gilbane Report [editor].

[Executive Summary]: "Authoring, the process of creating documents, is a strategically important application, and one of the primary drivers of the population of every desktop with a computer. Much of the history of authoring has followed the WYSIWYG dream of emulating a piece of paper on the screen of your computer. But the advent of the Web technologies has brought a crisis to the authoring domain. Repurposing -- the ability to deliver the same document in multiple media, for example on paper and on screen -- has become a basic requirement, and one that is not well addressed by the current repertoire of tools. Some of the capabilities and characteristics of next-generation authoring systems are becoming apparent. This sector bids fair to be a major source of end-user satisfaction in the short-term, and thus, a major source of vendor opportunities shortly thereafter."

The major section headings in the article: "History: The Power To Be Your Best;   Why We Author;   WP and DTP and All That;   The Print Shop;   Then Came WYSIWYG;   Then the Internet Happened;   The Agony of the WP/DTP Desktop;   The Web Authoring Systems to the Rescue?;   The Dilemma Summarized;   Modularity, Reusability, SGML, and XML;   What's the Solution?;   WYSIWYG is Dead;   Repurposing;   Reuse and Modularity;   Stylesheets;   Retrieval and Metadata;   Internationalization;   Hypertext;   How Do We Build the New Authoring Systems?;   Risks and Costs;   Conclusion."



[CR: 19970909]

Bray, Tim. "The Browser Platform: Its Problems and its Future." The Gilbane Report on Open Information & Document Systems 5/3 (May - June 1997) 1-19. ISSN: 1067-8719. Author's affiliation: Textuality, and CAPV Ventures, The Gilbane Report [editor].

Abstract: "Web browsers have redefined the whole idea of a computing 'platform'. Simple, universal access to applications and information is now conceivable, albeit not always achievable in ways that we would like. There are some problems. The same simplicity that makes browsers so useful can get in the way when solutions require a bit more sophistication. Complaints about the quality of presentation and the difficulty of programmability are common. Some vendors have used these problems to delay the process of rebuilding and repositioning their product lines to support intranet infrastructures. These problems, real as they are, are going to get fixed pretty quickly. In spite of limitations, vendors have no choice but to retrofit browser clients and technology into their "client/server" applications (We use the term in its popular sense. In our view, client/server has never been more than a pricing model.)" [From http://www.capv.com/dss/gilbane/report.html, August 30, 1997]

The article discusses CSS and DSSSL stylesheets (9), as well as the role of XML in the future of Internet Web browsers (15-16).



[CR: 19980415]

Bray, Tim. "Document Computing - Is This Our Business?" The Gilbane Report on Open Information & Document Systems 6/1 (January/February 1998) 1-16. ISSN: 1067-8719. Author's affiliation: Textuality, and CAPV Ventures, The Gilbane Report [editor].

What is a "document" in the context of a networked world where pieces of information resident on different computers are combined "just in time" and "on the fly" to generate highly interactive presentations of text and graphical content? What is a micro-document? What is document processing? In this feature article, Tim Bray interacts with the term "document computing" (encompassing "at least electronic publishing, word processing, document management, and information retrieval") as a means of getting to the bottom of these and related questions. Particularly "with the advent of XML," he says, "there are going to be a large number of electronic objects hurtling around the Net that are called documents but certainly don't 'feel like' documents."

The major section headings in the article: "So What's the Problem?;   Where We're Coming From;   At the User Interface;   What Makes a Document Anyhow?;   Documents and Particles and Waves;   Text and Language;   The Nature of Text;   Searching;   The Issue of Sequence;   Hierarchy;   Hypermedia;   Versioning;   Creation;   Maintenance;   Delivery;   Knowledge;   Summing Up."

Editorial introduction: "This is the fifth anniversary issue of The Gilbane Report. A lot has changed of course, but it is fascinating to look back and see how many of the topics we covered in those early issues are just as relevant, and just as unresolved. In this issue Tim [Bray] takes a fresh look at some of the same questions we looked at in Volume 1, Number 1. We think it is important to step back and look at all the technologies and practices that deal with documents and the information they contain and look for common threads and the connections with other information technologies. Why? Because it is always better (at least in business) to know your destination and what it looks like, and we need a vision to facilitate communication and progress. We think the term 'document computing' captures most, if not the essence, of what we do and where we are headed as an industry. In fact, The Gilbane Report on Open Information & Document Systems was almost called The Gilbane Report on Document Computing. We were dissuaded from the latter by the number of puzzled looks we got from colleagues we checked with. Even those who, whole-heartedly agreed with the concept felt the world was not ready to subscribe to a journal covering such an unknown, even esoteric, topic. In 1993 it was difficult enough to convince the mainstream business world that managing bit-mapped images of pages was not all they needed to manage information in documents. It was also true then (as it still is) that the word 'document' carries too much of the wrong kind of paper-and-ink baggage for some people. We have to get beyond the notion that information technology and document technology are limited to a linear hand-off relationship. And we need a vocabulary that doesn't encourage such a backwards bifurcation. We think 'document computing' helps. Tell us what you think."



[CR: 19971211]

Bray, Tim. "An Introduction to XML Processing with Lark." Pages 177-186 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Author's affiliation: Textuality.

Abstract: "Lark is a non-validating XML processor implemented in the Java language; it attempts to achieve good trade-offs among compactness, completeness, and performance. This report gives an overview of the motivations for, facilities offered by, and usage of, the Lark processor. This article applies to version 0.92 of Lark, in use in early September 1997."

A later version of this document is available online in HTML format: http://www.textuality.com/Lark/.



[CR: 19971211]

Bray, Tim. "Metadata. What it is and Why We Need it on the Web." The Gilbane Report on Open Information & Document Systems 5/5 (September/October 1997) 1-15. ISSN: 1067-8719. Author's affiliation: Textuality, and CAPV Ventures, The Gilbane Report [editor].

Abstract: "Metadata, 'data about data', is not at all new. It is in fact, as any library scientist knows, a lot older than the computing technology we all use. Metadata is something many of you have heard of, but few of you have spent a lot of time agonizing over. If you have worked with metadata it has most likely been during the design of a database application, or with 'data about documents' when implementing a document management system. Metadata is what document management systems do - you need information about documents in order to know how to manage them. Most content (data or documents) on the Web is still not managed. This is not surprising when you think of the Internet, but is a very unfortunate situation on corporate Intranets. The main reason for this unhappy state of affairs is that there is no easy or common way to deal with metadata on the Web. The only real option you have today for serious metadata management is to tie in your Web documents to a document management system. This is not something everyone is likely to do. Fortunately, there is a lot of effort directed at plugging this hole. The W3C has collected proposals from people like Microsoft and Netscape, as well as ideas from the SGML and library science communities. They have an initiative called RDF (Resource Description Framework) to pull all these together into a common approach. Yes, this is yet another must-pay-attention-to area. In this issue, Tim explains how metadata has been used historically, why we need it, and brings you up to date on the challenges that remain." [From http://www.capv.com/dss/gilbane/report.html, December 11, 1997.]

For a collection of pointers to "metadata" Web sites, see the dedicated document.



[CR: 19970726]

Bray, Tim. "XML: Moving Toward Richer, Smarter Web Pages [Intranet Handbook.]." Network World 14/25 (June 23 1997) 10-12 [Supplement Section]. ISSN: 0887-7661. Author's affiliation: Textuality.

"Abstract: The Extensible Markup Language (XML) is less well-known than HTML. The year-old language is the result of the SGML activity, and is a creation of the World Wide Web Consortium. The first companies that have expressed an interest in XML are Microsoft and Netscape Communications. XML is a subset of SGML, and it provides a framework for making intranet applications run faster. XML provides two functions that are currently missing in Web technology. XML makes intranet data easier to manage because it is self-describing. It allows for the development of faster applications by transferring processing from the server to the desktop PC. XML does not use predefined tags, but instead allows users to define their own. This makes the intranet smarter and more customizable. Unlike SGML, XML is easy to understand, and it is easy to write programs that can read and extract data from an XML file." See: http://www.nwfusion.com/.



[CR: 19970916]

Bray, Tim. "XML Leaders Push Forward at Montreal Meeting. No Earth-shattering Surprises, but Solid Progress." Seybold Report on Internet Publishing 2/1 (September 1997) 3-4. ISSN: 1090-4808. Author's affiliation: Textuality.

The author summarizes the high points of the "XML Developers' Day", held on August 21, 1997, in Montreal, Canada. It was a meeting of approximately seventy-five (75) developers who came together after the 1997 International Conference on the Application of HyTime. The article discusses in particular: (1) Bitstream's NuDoc formatting facility, which now handles XML; (2) XML support by CommerceNet (electronic commerce consortium); and (3) progress in the development of authoring tools for XML (ArborText, Grif). An acronym list ("XML as an Acronym Factory") is provided as a sidebar. See the main XML entry for additional information on the Extensible Markup Language.



[CR: 19970909]

Bray, Tim. "Where We Stand in Early '97: The View from Documation." The Gilbane Report on Open Information & Document Systems 5/2 (March - April 1997) 1-16. ISSN: 1067-8719. Author's affiliation: Textuality, and CAPV Ventures, The Gilbane Report [editor].

The author reports on the Documation '97 Conference in Santa Clara, California (February 26 - 28, 1997). The report includes a section on SGML/XML and on (SGML) databases (pages 12-14). "One trend is...the advent of document management systems that try to take advantage of SGML's support for hierarchies and reusable modular components. Obviously, there are problems getting this to work with conventional relational databases. Every one of these vendors is using some flavor of object database..."



[CR: 19960811]

Bray, Tim; Blake, G. Elizabeth; Tompa, Frank Wm. "Shortening the OED: Experience with a Grammar Defined Database." ACM Transactions on Office Information Systems 10/3 (July 1992) 213-232. Author's affiliation: Open Text Corporation; [in 1996: Textuality].

For other publications on SGML-related research on lexical databases, see the entry for the University of Waterloo Centre for the New OED and Text Research.



[CR: 19971106]

Bray, Tim; DeRose, Steve. "Extensible Markup Language (XML) Part 2: Linking." Pages 67-82 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Authors' affiliation: [Bray]: Textuality; [DeRose]: Inso.

Abstract: "This document specifies a simple set of constructs that may be inserted into XML documents to describe links between objects and to support addressing into the internal structures of XML documents. It is a goal to use the power of XML to create a structure that can describe the simple unidirectional hyperlinks of today's HTML as well as more sophisticated multi-ended, typed, self-describing links."

A version of this document is available online in HTML format: http://www.w3.org/TR/WD-xml-link-970731; [local archive copy].



[CR: 19971106]

Bray, Tim; Paoli, Jean; Sperberg-McQueen, C. M. "Extensible Markup Language (XML)." Pages 67-82 in XML: Principles, Tools, and Techniques. Guest Edited by Dan Connolly. World Wide Web Journal [edited by Rohit Khare] Volume 2, Issue 4. Sebastopol, CA: O'Reilly & Associates, Fall 1997. Extent: xxii + 248 pages. ISBN: 1-56592-349-9. ISSN: 1085-2301. Authors' affiliation: [Bray]: Textuality; [Paoli]: Microsoft; [Sperberg-McQueen]: University of Illinois at Chicago.

Abstract: "Extensible Markup Language (XML) is an extremely simple dialect of SGML which is completely described in this document. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML."

A version of this document is available online in HTML format: http://www.w3.org/TR/WD-xml-970807; [local archive copy].



[CR: 19961226]

Bray, Tim; Sperberg-McQueen, C. Michael. "Extensible Markup Language." Pages 399-404 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Authors' affiliation: [Bray]: Textuality; [Sperberg-McQueen]: University of Illinois at Chicago.

Abstract: "Extensible Markup Language (XML for short) is being designed under the auspices of the World Wide Web Consortium; the larger goal of this effort is 'to enable future Web user agents to receive and process generic SGML in the way that they are now able to receive and process HTML. As in the case of HTML, the implementation of SGML on the Web will require attention not just to structure and content (the domain of SGML per se) but also to link semantics and display semantics.' [from the W3C 'Activity' Page] As a subgoal, we are creating an SGML application profile, XML, that is designed to provide many of the benefits of SGML in a lightweight, easy-to-use, easy-to-implement dialect that omits many of the difficult or problematic features of the full standard. This paper is an interim report on the progress of the work on creating an XML specification. This work is proceeding rapidly and we anticipate a draft of the specification being available at the time of SGML '96."

Further information on XML is available in the main XML entry of the SGML/XML Web Page.

Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19980719]

Brennan, Elaine M. "On Weighty Tomes." <TAG>: The SGML Newsletter 11/7 (July 1998) 10-12. ISSN: 1067-9197. Author's affiliation: Senior Consultant, Information Architects, Englewood, CO.

Summary: Brennan reviews three books on XML, all of which [she says] to some extent exhibit the "early adopter syndrome." Viz., some details that are irrelevant or invalid in light of XML 1.0 as published in the February 1998 specification. The books reviewed are: 1) Michael Leventhal, David Lewis, and Matthew Fuchs; with contributions from Stuart Culshaw and Gene Kan, Designing XML Internet Applications; 2) Simon St. Laurent, XML: A Primer; 3) Steven Holzner, XML Complete.



[CR: 19970401]

Brewin, Paul. "SGML and Patent Document Processing. WIPO standard ST.32." World Patent Information 18/4 (December 1996) 183-192 (with 20 references). Author's affiliation: Information Systems, European Patent Office, Rijswijk, Netherlands.

"Abstract: A description of SGML (Standard Generalized Markup Language) is given, together with a detailed description of WIPO Standard ST.32. The benefits of the use of SGML are highlighted-its system independence and flexibility in building publication systems and full text databases. The use of SGML for patent document processing and how it might be beneficial for patent departments and representatives to use SGML in their own document systems, as well as for electronic filing of applications, is discussed. Reference is made to its use in the European Patent Office."

See also: "Nightmare on SGML street: marking up patents," presented by Paul Brewin at SGML Europe '96. For other papers on the conversion of EPO documents into SGML, see: (1) Susanne Richter-Wills, "High-volume, High-accuracy, SGML Document Capture: A Case Study"; and (2) Hugh R Stabler, "Experiences with High-Volume, High-Accuracy Document Capture."



Brooks, Kenneth P. "Lilac: A Two-View Document Editor." IEEE Computer 24/6 (June 1991) 7-19. ISSN: 0018-9162. Author's affiliation: Digital Equipment Corporation.

Published summary: "By offering both WYSIWYG editing and language-based document description side by side, the Lilac document preparation system gives users the best of both worlds." For details, see the author's dissertation.

Abstract: A description is given of Lilac, an experimental document preparation system designed to provide the best of both the WYSIWYG (what you see is what you get) and the document compiler approaches. Lilac does this by offering both WYSIWYG editing and language-based document description as two views side by side on the screen. The page view is a WYSIWYG editor showing a close approximation to the printed output. The source view shows a program-like description of the document in a special-purpose language. This language supports subroutines, variables, and conditional execution, and is designed to encourage the use of subroutines to embody structure. Both views are editable, but Lilac is designed with the expectation that most editing will occur in the page view.



Brooks, Kenneth P. A Two-view Document Editor with User-definable Document Structure. Systems Research Center Technical Report, 33. Palo Alto, CA: Digital Equipment Corporation, 1 November 1988. vi + 193 pages.

The report, with minor changes, represents the author's PhD dissertation presented to the Department of Computer Science, Stanford University, May 1988. Abstract: "Lilac is an experimental document preparation system which combines the best features of batch-style document formatters and WYSIWYG editors. To do this, it offers the user two views of the document: a WYSIWYG view and a formatter-like source view. Changes in either view are rapidly propagated to the other. This report describes both the user interface and design and the implementation mechanisms used to build Lilac." See also the description of Lilac in IEEE Computer.



[CR: 19961018]

Brown, Allen; Brüggemann-Klein, Anne; Feng, An (volume editors). EP '96. Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography. [ = Journal Special Issue: Electronic Publishing - Origination, Dissemination and Design (EPODD), June & September 1995, Volume 8, Issues 2-3.] Sixth International Conference on Electronic Publishing, Document Manipulation and Typography, Palo Alto, California. September 24-26, 1996. Sponsored by Adobe Systems Incorporated; School of Information Management and Systems, University of California at Berkeley; Xerox Corporation. [Journal] Editors David F. Brailsford and Richard K. Furuta. Chichester / New York: John Wiley & Sons, 1996. ISSN: 0894-3982.

The proceedings volume contains 240 pages, indexes, and 16 major articles; many of the articles treat SGML and structured documents. It also has an introductory article by the volume editors (Brown, Brüggemann-Klein, Feng) describing EP '96 in its historical context. Some of the more relevant feature articles include the following: Eila Kuikka and Airi Salminen, "Filtering Structured Documents in the SYNDOC Environment" [compare: Eila Kuikka, Jouni Mykkänen, Arto Ryynänen, and Airi Salminen, "Implementation of Two-dimensional Filters for Structured Documents in SYNDOC Environment"]; Jacco van Ossenbruggen, Anton Eliëns, and Bastiaan Schönhage, "Web Applications and SGML"; Patricia François, Philippe Futtersack, and Christophe Espert, "SGML/HyTime Repositories and Object Paradigms"; Philip N. Smith and David F. Brailsford, "Towards Structured, Block-Based PDF"; Ethan V. Munson, "A New Presentation Language for Structured Documents"; Xinxin Wang and Derick Wood, "XTABLE - A Tabular Editor and Formatter"; Helena Ahonen, "Automatic Generation of SGML Content Models"; Hélène Richy and Jacques André, "Typographic Sheets and Structured Documents"; P. R. King, "Modelling Multimedia Documents"; Anne Brüggemann-Klein, Rolf Klein, and Stefan Wohlfeil, "Pagination Reconsidered"; William S. Lovegrove and David F. Brailsford, "Document Analysis of PDF Files: Methods, Results and Implications"; Vijay Kumar, Richard Furuta, and Robert B. Allen, "Interactive Interfaces for Knowledge-Rich Domains".

For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986.



Brown, Allen L., Jr.; Wakayama, Toshiro; Blair, Howard A. "A Reconstruction of Context-Dependent Document Processing in SGML." Pages 1-25 (with 10 references) in EP [Electronic Publishing] 92: Proceedings of Electronic Publishing, 1992 International Conference on Electronic Publishing, Document Manipulation, and Typography. Swiss Federal Institute of Technology, Lausanne, Switzerland. April 7-10, 1992. Sponsored by the Swiss Federal Institute of Technology and the Swiss National Science Foundation. Edited by Christine Vanoirbeek and Giovanni Coray [EPF, Lausanne, Switzerland]. The Cambridge Series on Electronic Publishing. Cambridge: Cambridge University Press, 1992. ISBN: 0-521-43277-4. Authors' affiliation: Xerox Corporation; Syracuse University.

Abstract: SGML achieves a certain degree of context-dependent document processing through attributes and linking. These mechanisms are insufficient in several respects. To address these shortcomings we propose augmenting SGML's !LINK and !ATTLIST constructs with two new mechanisms, coordination and (rule-based) attribution. These mechanisms can be used to specify the result of context-dependent processing in a uniform fashion while considerably increasing SGML's expressive power. We illustrate this enhanced power by sketching a specification of (the result of) document layout that can be encoded in SGML augmented with coordination and attribution.



[CR: 19970331]

Brown, Betsy; Collier, Karen; Farr, Chuck; Littrell, Betty; Stagle, Sharon; Stratton, Deborah. "From Hardcopy to Online: Changes to the Editor's Role and Processes." Pages 131-138 in Conference Proceedings, SIGDOC '96. The 14th Annual International Conference on Computer Documentation. ["Marshalling New Technological Forces: Building a Corporate, Academic, and User-Oriented Triangle"]. SIGDOC '96: 14th Annual International Conference. Research Triangle Park, North Carolina, US. October 20-23, 1996. Sponsored by the Association for Computing Machinery Special Interest Group on Documentation (SIGDOC). New York, NY: Association for Computing Machinery, 1996. ISBN: 0-89-791-799-5. Authors' affiliation: Tandem Computers Incorporated.

Summary: The paper describes the editing group's experiences in migrating from production processes aimed at print manuals to a production system for online documentation. The test case included production of a CDROM with thirty-five technical manuals. The article describes the use of SGML encoding with the DynaText viewer from EBT (Inso), the use of Framemaker, and conversion pathways.

Several other articles in this proceedings volume are germane to SGML: Tom Banfalvi, et al., "Manufacturing Documentation in the Virtual Warehouse"; Paul Beam and Peter Goldsworthy, "Technical Writing on the Web-Distributed SGML-Based Learning"; Stephanie Copp, "Working with Academe"; Cindy Roposh, et al., "Developing Single-Source Documentation for Multiple Formats"; Paul Prescod, "Multiple Media Publishing in SGML"; Lin-Ju Yeh, et al., "SSQL: a Semi-Structured Query Language for SGML Document Retrievals"; Dee Stribling, et al., "A Real World Conversion to SGML".



[CR: 19971123]

Brown, Bruce Eric; McNeill, James W. "Bottoms-Up, A Paradigm Shift." Pages 141-144 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Authors' affiliation: Datalogics Incorporated.

Abstract: "A new data modeling approach to producing SGML documents has been developed. Documents are assembled from content models, or information units, which are created and edited using common tools. These information units are collections of SGML elements, raw text, and processes, but less than whole documents. For this work, when an assembly of these objects or information units is made, then the DTD and FOSI are created for use with the output document. If the information objects conform to a given DTD (say the ATM 2100), then the assembled document will also conform. We start by describing some of the real issues that SGML systems face, then some of the approaches others have taken. Finally we detail our solution and the research that is ongoing."

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.



[CR: 19951113]

Brown, Heather; Cole, Fred; Oxborrow, Elizabeth. "An Object-Oriented Toolkit for Structured Documents." Pages 95-111 (with 18 references) in EP [Electronic Publishing] 92: Proceedings of Electronic Publishing, 1992 International Conference on Electronic Publishing, Document Manipulation, and Typography. Swiss Federal Institute of Technology, Lausanne, Switzerland. April 7-10, 1992. Sponsored by the Swiss Federal Institute of Technology and the Swiss National Science Foundation. Edited by Christine Vanoirbeek and Giovanni Coray [EPF, Lausanne, Switzerland]. The Cambridge Series on Electronic Publishing. Cambridge: Cambridge University Press, 1992. ISBN: 0-521-43277-4. Author affiliation: University of Kent, UK.

"Abstract: There is an increasing interest in both structured documents and in the use of objectbases for storing structured multimedia documents and hypertext. This paper describes a simple objectbase and object-oriented toolkit designed primarily to support structured ODA documents, and evaluates the strengths and weaknesses of an object-oriented approach towards toolkits for structured multimedia documents and hypertext."



[CR: 19980907]

Brown, Heather; Harding, Robert; Lay, Steven; Robinson, Peter; Sheppard, Dan; Watts, Richard. "Active Alice: Using Real Paper to Interact with Electronic Text." Pages 407-419 (with 16 references) in Electronic Publishing, Artistic Imaging, and Digital Typography. Proceedings of the 7th International Conference on Electronic Publishing (EP '98), Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography (RIDT '98). EP '98 and RIDT '98, Saint Malo, France. March 30 - April 3, 1998. Edited by Roger D. Hersch, Jacques André, and Heather Brown. Lecture Notes in Computer Science Series, Number 1375. New York/Berlin/Heidelberg: Springer-Verlag, 1998. ISBN: 3-540-64298-6. Authors' affiliation: [Brown] Kent University, Canterbury, UK.

Abstract: "Many documents exist in both paper and electronic forms. Paper has many well known advantages, but electronic texts often contain useful information that is not easily accessible from printed paper versions. SGML texts, in particular, are rich sources of additional information. The Active Alice project shows how a reader can use a paper document to access information from its corresponding electronic version without having to manipulate the electronic version via a separate computer interface. The project makes use of a DigitalDesk. This is an ordinary desk augmented with a video camera and computer driven projector. The camera captures images of the pages on the desk and detects simple user actions such as pointing to specific words on a page. The images are used to associate the pages with their SGML counterparts. Information from the SGML versions can then be conveyed directly to the reader via information projected onto the page or onto other areas of the desk. The project takes its name from the example text used -- a version of Alice's Adventures in Wonderland."

See the slides from the presentation; online abstract; also full text PDF, [local archive copy]. Slides also in Postscript.



[CR: 19961202]

Brown, Peter J.; Brown, Heather. "Embedded or Separate Hypertext Mark-up: Is It an Issue?" Electronic Publishing -- Origination, Dissemination and Design (EPODD) 8/1 (March 1995) 1-13 (with 9 references). ISSN: 0894-3982. Authors' affiliation: Computing Laboratory, The University, Canterbury, Kent (Email: [Peter Brown] P.J.Brown@ukc.ac.uk).

"Abstract: Most hypertext systems used in the field embed some form of mark-up in each hyperdocument in order to represent the hypertext structure. Indeed, more generally, most document preparation systems use this approach, Hypertext researchers, on the other hand, say that the structure of a hyperdocument should be separate from its content. The paper investigates whether the two approaches, embedded versus separate, are really at odds with one another, and describes a technology for combining some of the benefits of both."

According to the authors, markup levels are categorized under these names: embedded positional markup, separate positional mark-up, (separate) calculated mark-up, generated mark-up, and dynamic generated mark-up. As to separate versus embedded mark-up (markup embedded within the document), the authors assert: "Mark-up of any document can either be embedded within the document or stored separately. In the hypertext field, all the expert researchers will say that separate mark-up is the only respectable approach -- the structure should be separate from the content, whereas virtually all the hypertext systems that are widely used in the field are based on embedded mark-up. This paper examines whether the issue of embedded versus separate mark-up really is an important one. . ."

The authors conclude: "The big issue is not the nature of the mark-up but the facilities for integrating software components. In any real-world situation the hypertext system needs to work with other systems to provide a solution to a problem. The closer these systems fit together the better the solution. Our Joiner/Splitter technology requires a relatively small degree of integration. The real challenge is to proceed to a degree of integration where we never need to distinguish between documents that are hyperdocuments and those that are not, but where components work together to make hypertext ubiquitous, with any underlying tools to change the formats of the information being invisible to users."

See also Heather Brown's home page, with links to research and publications on ODA.



[CR: 19961019]

Brüggemann-Klein, Anne. Compiler-Construction Tools and Techniques for SGML Parsers: Difficulties and Solutions. Technical Report. Freiburg: Institut für Informatik, Universität Freiburg, May 29, 1994. Author's address: Anne Brueggemann-Klein, Institut für Informatik, Universität Freiburg, Rheinstrasse 10-12, 79104 Freiburg, Germany, email: brueggem@informatik.uni-freiburg.de.

Abstract: The Standard Generalized Markup Language (SGML) is used to represent documents in an application-independent manner. In a recent paper, Nordin et al. analyze concisely which properties of the SGML language are hindering its more widespread use and acceptance. In particular, they identify a number of features in the SGML standard that make it difficult to apply commonly used implementation tools and techniques to build an SGML parser. One feature, however, or rather one combination of two features, escapes their notice. Unambiguity and the & operator were both intended to make SGML document grammars easier to read by humans. It is questionable, though, whether this goal is really achieved. At least, the combination of unambiguity and the & operator raises unforeseen problems in validating the grammars and in parsing the documents by machines. I am describing these problems here in detail. On the basis of this analysis, the standards committees that are currently revising the standard can make an informed decision on the future of the two features.

To appear in Electronic Publishing: Origination, Dissemination and Design. For a Postscript version, use anonymous FTP (ftp.informatik.uni-freiburg.de/documents/papers/brueggem/standardEPODD.ps) from Freiburg; [mirror copy].
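
Illustrative note: the SGML & connector (the "and-group") requires every listed item to occur, in any order. One way to get a feel for why it strains conventional parser-construction tools is to rewrite an and-group using only sequence and alternation: the rewrite needs one alternative per ordering, so it grows factorially with the number of items. The Python sketch below illustrates only that growth; the function name expand_and_group is hypothetical, and the sketch is not taken from the report.

```python
from itertools import permutations
from math import factorial


def expand_and_group(names):
    """Rewrite an and-group such as (a & b & c) as a list of plain sequences.

    Each returned string is one ordering, written with the SGML sequence
    connector ",". The alternation of all of them accepts the same element
    orders as the and-group.
    """
    return [", ".join(order) for order in permutations(names)]


if __name__ == "__main__":
    alternatives = expand_and_group(["a", "b", "c"])
    print(" | ".join(f"({alt})" for alt in alternatives))
    # The number of alternatives is n! for n items.
    for n in range(1, 7):
        print(n, factorial(n))
```

For three items the rewrite already needs six alternatives; for six items it needs 720.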



Brüggemann-Klein, Anne. Formal Models in Document Processing. Habilitationsschrift, vorgelegt zur Erlangung der venia legendi für Informatik an der Mathematischen Fakultät der Albert-Ludwigs-Universität zu Freiburg i.Br. Freiburg, 1993. 110 pages, bibliography, index.

Summary: Part I of the dissertation (pages 1-64) treats the formal properties of documents, and particularly, marked-up documents in terms of unambiguous expressions and unambiguous languages. Part II of the dissertation is concerned with design specifications for layout of a structured document as implemented in Designer: Chapter 4 discusses "The constituents of design specifications," and Chapter 5 discusses "The design specification language." The presentation focuses upon aspects of document, formatter, and style-sheet models that are applicable within the general context of logically marked-up documents. The approach to document design taken in Designer "emulates, in the context of electronic publishing, the separation of concerns between authoring, editing, designing, and typesetting that is well-established in the traditional publishing industry." (p. 4)

For a Postscript version, try FTP from Freiburg. Author's address: Anne Brueggemann-Klein, Institut für Informatik, Universitaet Freiburg, Rheinstrasse 10-12, 79104 Freiburg, Germany; Email: brueggem@informatik.uni-freiburg.de. [local archive copy]





Brüggemann-Klein, Anne. "Regular Expressions into Finite Automata." Pages 97-98 in Latin '92. Edited by I. Simon. Lecture Notes in Computer Science, 583. Berlin: Springer Verlag, 1992.

See full article in Theoretical Computer Science and the related technical report.



Brüggemann-Klein, Anne. Regular Expressions into Finite Automata. Bericht 33. Freiburg: Universität Freiburg, Institut für Informatik, Juli 1991. 22 pages.

Abstract: It is a well-established fact that each regular expression can be transformed into a non-deterministic automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an ε-free NFA due to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e., expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the expression, and that this is worst case optimal. For deterministic expressions, our algorithm has even linear run time. This improves on the cubic time methods suggested in the literature [BEGO71, ASU86, BS86]. A major step of the algorithm consists in bringing the expression into what we call star normal form. This concept is also useful for characterizing the relationship between two types of unambiguity that have been studied in the literature. Namely, we show that, modulo a technical condition, an expression is strongly unambiguous [SS88] if and only if it is weakly unambiguous [BEGO71] and in star normal form. This leads to our third result, a quadratic time decision algorithm for weak unambiguity, that improves on the bi-quadratic method introduced by Book et al. [BEGO71]. (A version of this TR is also to appear in the conference proceedings of Latin '92.)
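
Illustrative note: the abstract defines a deterministic regular expression as one whose Glushkov automaton is deterministic. The Python sketch below shows that test using the textbook first/last/follow construction of the Glushkov (position) automaton. It is not the quadratic star-normal-form algorithm of the report, and it makes no attempt to match its running time. The class names (Sym, Cat, Alt, Star) and function names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


# A tiny regular-expression syntax tree (hypothetical names).
@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Cat:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Alt:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Star:
    inner: "Node"


Node = Union[Sym, Cat, Alt, Star]


def glushkov(expr):
    """Return (first, last, follow, nullable, symbol_of) for expr.

    Every symbol occurrence gets its own position number; the positions are
    the states of the Glushkov automaton (plus one initial state).
    """
    symbol_of = {}  # position -> symbol name
    follow = {}     # position -> set of positions that may come next

    def walk(node):
        # Returns (nullable, first positions, last positions).
        if isinstance(node, Sym):
            pos = len(symbol_of) + 1
            symbol_of[pos] = node.name
            follow[pos] = set()
            return False, {pos}, {pos}
        if isinstance(node, Alt):
            n1, f1, l1 = walk(node.left)
            n2, f2, l2 = walk(node.right)
            return n1 or n2, f1 | f2, l1 | l2
        if isinstance(node, Cat):
            n1, f1, l1 = walk(node.left)
            n2, f2, l2 = walk(node.right)
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if isinstance(node, Star):
            _, f, l = walk(node.inner)
            for p in l:
                follow[p] |= f
            return True, f, l
        raise TypeError(f"unknown node: {node!r}")

    nullable, first, last = walk(expr)
    return first, last, follow, nullable, symbol_of


def is_deterministic(expr):
    """True if no state of the Glushkov automaton has two outgoing
    transitions on the same symbol to different positions."""
    first, _last, follow, _nullable, symbol_of = glushkov(expr)
    for targets in [first, *follow.values()]:
        seen = set()
        for pos in targets:
            if symbol_of[pos] in seen:
                return False
            seen.add(symbol_of[pos])
    return True


if __name__ == "__main__":
    a, b = Sym("a"), Sym("b")
    # (a|b)*a : two different "a" positions can start a match.
    print(is_deterministic(Cat(Star(Alt(a, b)), a)))
    # b*a(b*a)* : same language, written deterministically.
    print(is_deterministic(Cat(Cat(Star(b), a), Star(Cat(Star(b), a)))))
```

The two expressions in the usage example denote the same language, yet only the second passes the check, which shows that determinism in this sense is a property of the expression and not only of the language.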



[CR: 19961120]

Brüggemann-Klein, Anne. "Unambiguity of Extended Regular Expressions in SGML Document Grammars." Pages 73-84 (with [9] references) in Algorithms -- ESA '93: Proceedings of the First Annual European Symposium (Bad Honnef, Germany. September 30 - October 2, 1993). Edited by Th. Lengauer. [Series title:] Lecture notes in computer science, 726. Berlin /Heidelberg /New York: Springer Verlag, 1993. ISBN: 3-540-57273-2. ISSN: 0302-9743.



[CR: 19951113]

Brüggemann-Klein, Anne; Dolland, P.; Heinz, A. "How to Please Authors and Publishers: A Versatile Document Preparation System at Karlsruhe." Pages 9-31 in TeX for Scientific Documentation. Proceedings of the Second European Conference. (The Second European Conference on TeX for Scientific Documentation, Strasbourg, France, June 19-21, 1986. Sponsored by: CNRS (Centre National de la Recherche Scientifique), SMF (Société Mathématique de France), Université Louis-Pasteur de Strasbourg). Edited by Jacques Désarménien. Lecture Notes in Computer Science, Number 236. Berlin/New York: Springer-Verlag, 1986. ISBN: 0387168079 (New York); ISBN: 3540168079 (Berlin). Authors' affiliation: Institut für Angewandte Inf. und Formale Beschreibungsverfahren, Karlsruhe, Germany.

Abstract: "The article introduces a document preparation environment which supports authors in the production and publication of documents of high typographic quality. The document model is compatible with the SGML-model standardized by ISO, and formatting can be done with TeX."



[CR: 19961018]

Brüggemann-Klein, Anne; Klein, Rolf; Wohlfeil, Stefan. "Pagination Reconsidered." Pages 139-152 (with 10 references) in EP '96. Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography. [ = Journal Special Issue: Electronic Publishing - Origination, Dissemination and Design (EPODD), June & September 1995, Volume 8, Issues 2-3.] Sixth International Conference on Electronic Publishing, Document Manipulation and Typography, Palo Alto, California. September 24-26, 1996. Sponsored by Adobe Systems Incorporated; School of Information Management and Systems, University of California at Berkeley; Xerox Corporation. [Proceedings Volume] Edited by Allen Brown, Anne Brüggemann-Klein, and An Feng; [Journal] Editors David F. Brailsford and Richard K. Furuta. Chichester/ New York: John Wiley & Sons, 1996. ISSN: 0894-3982. Authors' affiliation: [Brüggemann-Klein]: Technische Universität München, Fachbereich Informatik, Arcisstrasse 21, 80290 München, Germany; Tel. +49.89.450552-30; Fax +49.89.450552-22. E-Mail brueggem@informatik.tu-muenchen.de; [Klein and Wohlfeil]: FernUniversität Hagen; Email: rolf.klein@fernuni-hagen.de, stefan.wohlfeil@fernuni-hagen.de.

Abstract: "We present a new algorithm for pagination that minimizes the number of page turns that are necessary while reading a formatted document. This approach keeps the total number of pages small and places figures close to their citations. Examples show that the resulting documents are superior in quality to what standard formatting systems achieve. Our algorithm is easy to implement, and runs in time proportional to the number of text objects times the number of floating objects."

The article addresses the problem of positioning "floating objects" in document layout: figures, tables, footnotes, (etc.), where proximity to (first) reference is a key concern in terms of readability.

For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.



[CR: 20000803]

Brüggemann-Klein, Anne; Wood, Derick. "Caterpillars: A Context Specification Technique." [ARTICLE] Markup Languages: Theory & Practice 2/1 (Winter 2000) 81-106 (with 36 references). ISSN: 1099-6622 [MIT Press]. Authors' affiliation: [Brüggemann-Klein:] Institut für Informatik, München; Email: brueggem@informatik.tu-muenchen.de; [Wood:] Department of Computer Science, Hong Kong University; Email: dwood@cs.ust.hk.

Abstract: "We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are primarily applying this technique in the specification of context-dependent style sheets for HTML, SGML and XML documents, it can also be used for query specification for structured documents, as we shall demonstrate, and for the specification of computer program transformations. From a conceptual point of view, structured documents are trees, and one of the oldest and best-established techniques to process trees and, hence, structured documents are tree automata. We present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions and caterpillar automata, their companions, to the expressive power of tree automata. In particular, we demonstrate that each caterpillar expression describes a regular tree language that is, hence, recognizable by a tree automaton. Finally, we employ caterpillar expressions for tree pattern matching. We demonstrate that caterpillar automata are able to solve tree-pattern-matching problems for some, but not all, types of tree inclusion that Kilpeläinen investigated in his Ph.D. thesis. [Tree Matching Problems and Applications to Structured Text Databases.] In simulating tree pattern matching with caterpillar automata, we reprove some of Kilpeläinen's results in a uniform framework."

"The contextual technique we introduce is also applicable to the compilation of computer programs, but has little appeal since compiler designers and writers do not usually allow users to modify a compiler according to new context dependencies. Our techniques may, however, be used when developing code optimizers or other program transformation tools since, in both cases, there may be a number of individuals collaborating on the development. Once we have isolated the specification of contexts from the more general specification of style sheets, we are able to provide naive users with better support for this aspect of style specification. Indeed, it also frees us to consider different techniques (different from attribution, for example) for context specification. Since regular expressions are understood by many people who are not programmers per se, and they are a simple specification technique, we decided to use them for context specification. There is a body of somewhat related work, which we discuss below ('Caterpillars and context'), in which a similar decision was made. We make the well-accepted assumption that a set of similar documents are modeled by syntax trees or abstract syntax trees of a given grammar (an SGML document grammar, an XML document grammar or HTML) that generates the set of all such documents. From now on we will no longer mention SGML and HTML but restrict ourselves to XML and XML document grammars. Indeed, for this paper it is irrelevant which specific grammar mechanism is used to define classes of documents. We introduce and motivate ('Caterpillars and context' and 'Evaluating caterpillar expressions') the notions of caterpillars and context and establish a basic complexity result for the evaluation of caterpillar automata on document trees. In another section ('Caterpillar-regular and regular tree languages'), we investigate the expressive power of caterpillar automata in comparison with tree automata. 
In particular, we demonstrate that each caterpillar expression describes a regular tree language that is, hence, recognizable by a tree automaton. Finally, in the following section ('Caterpillars and tree pattern matching'), we demonstrate that caterpillar automata can be used to solve tree-pattern-matching problems... We have mentioned our interest in document classes that are constrained by some type of grammars, for example by XML DTDs. The results in this paper hold for the classes of all documents that are given as trees over a fixed alphabet. Applications, particularly applications based on document databases, often come with non-trivial document grammars. It is therefore pertinent to generalize our results to document classes defined by non-trivial document grammars and to address the open questions in this light. Children, and even adults, were able to draw complex figures very quickly in the Logo language using the turtle as metaphor and guide. We hope that our use of caterpillars will garner a similar response from graphics designers."

[Received 7 January 2000; Accepted 15 February 2000]

For related presentations, see (1) Technical Report HKUST-TCSC-2000-02 = "Caterpillars, Context, Tree Automata and Tree Pattern Matching", by Anne Brüggemann-Klein and Derick Wood. "We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are primarily applying this technique in the specification of context-dependent style sheets for HTML, SGML and XML documents, it can also be used for query specification for structured documents, as we shall demonstrate, and for the specification of computer program transformations. From a conceptual point of view, structured documents are trees, and one of the oldest and best-established techniques to process trees and, hence, structured documents are tree automata. We present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions and caterpillar automata, their companions, to the expressive power of tree automata. In particular, we demonstrate that each caterpillar expression describes a regular tree language that is, hence, recognizable by a tree automaton. . ." [cache] and (2) Technical Report HKUST-TCSC-1998-04, by Anne Brüggemann-Klein, Stefan Hermann, and Derick Wood: "Context and Caterpillars and Structured Documents." - "We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are applying this technique in the specification of context-dependent style sheets for HTML, XML, and SGML documents, it is clear that it can be used in other environments such as query specification for structured documents and for computer program transformations. In addition, we present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions to that of tree automata. . ." [cache]



[CR: 20000803]

Brüggemann-Klein, Anne; Wood, Derick. Caterpillars, Context, Tree Automata and Tree Pattern Matching. Technical Report HKUST-TCSC-2000-02. Clear Water Bay, Kowloon, Hong Kong: The Hong Kong University of Science and Technology, February 2000. 16 pages, 36 references.

Abstract: "We present a novel, yet simple, technique for the specification of context in structured documents that we call caterpillar expressions. Although we are primarily applying this technique in the specification of context-dependent style sheets for HTML, SGML and XML documents, it can also be used for query specification for structured documents, as we shall demonstrate, and for the specification of computer program transformations. From a conceptual point of view, structured documents are trees, and one of the oldest and best-established techniques to process trees and, hence, structured documents are tree automata. We present a number of theoretical results that allow us to compare the expressive power of caterpillar expressions and caterpillar automata, their companions, to the expressive power of tree automata. In particular, we demonstrate that each caterpillar expression describes a regular tree language that is, hence, recognizable by a tree automaton. Finally, we employ caterpillar expressions for tree pattern matching. We demonstrate that caterpillar automata are able to solve tree-pattern-matching problems for some, but not all, types of tree inclusion that Kilpeläinen investigated in his PhD thesis. In simulating tree pattern matching with caterpillar automata, we reprove some of Kilpeläinen's results in a uniform framework. This report will appear in the post-proceedings of the DLT '99 conference published by World Scientific Publishing Co., Singapore." [Submitted to World Scientific December 09, 1999.]

The document is available online. See also Anne Brüggemann-Klein and Derick Wood, "Caterpillars: A Context Specification Technique." Also: Technical Report HKUST-TCSC-1998-04, by Anne Brüggemann-Klein, Stefan Hermann, and Derick Wood: "Context and Caterpillars and Structured Documents." [cache]



Brüggemann-Klein, Anne; Wood, Derick. Deterministic Regular Languages. Bericht 38. Freiburg: Universität Freiburg, Institut für Informatik, Oktober 1991. 18 pages.

Abstract: The ISO Standard for Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard the right hand sides of productions are called content models and they are based on regular expressions. The allowable regular expressions are those that are "unambiguous" as defined by the standard. Unfortunately, the standard's use of the term "unambiguous" does not correspond to the two well known notions, since not all regular languages are denoted by "unambiguous" expressions. Furthermore, the standard's definition of "unambiguous" is somewhat vague. Therefore, we provide a precise definition of "unambiguous expressions" and rename them deterministic regular expressions to avoid any confusion. A regular expression E is deterministic if the canonical ε-free finite automaton M_E recognizing L(E) is deterministic. A regular language is deterministic if there is a deterministic expression that denotes it. We give a Kleene-like theorem for deterministic regular languages and we characterize them in terms of the structural properties of the minimal deterministic automata recognizing them. The latter result enables us to decide if a given regular expression denotes a deterministic regular language and, if so, to construct an equivalent deterministic expression.
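
[Illustration, not from the report:] The definition in the abstract can be made concrete with a small sketch. It assumes that the "canonical ε-free finite automaton" is the position (Glushkov) automaton, and it uses a hypothetical tuple encoding of expressions chosen only for this example. Under those assumptions, an expression is deterministic when no two distinct positions carrying the same symbol compete, either at the start or after any given position.

```python
from itertools import count

# Hypothetical AST encoding (an assumption of this sketch, not the authors'):
#   ("sym", "a")      a single symbol
#   ("cat", e1, e2)   concatenation
#   ("alt", e1, e2)   alternation
#   ("star", e)       zero or more repetitions
#   ("opt", e)        optional


def mark(node, counter):
    """Copy the AST, giving every symbol occurrence a unique position number."""
    kind = node[0]
    if kind == "sym":
        return ("sym", node[1], next(counter))
    return (kind,) + tuple(mark(child, counter) for child in node[1:])


def analyse(node, follow):
    """Return (nullable, first, last) for node and fill follow[position]."""
    kind = node[0]
    if kind == "sym":
        pos = (node[1], node[2])
        follow.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "cat":
        n1, f1, l1 = analyse(node[1], follow)
        n2, f2, l2 = analyse(node[2], follow)
        # Every last position of the left part may be followed by the
        # first positions of the right part.
        for p in l1:
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "alt":
        n1, f1, l1 = analyse(node[1], follow)
        n2, f2, l2 = analyse(node[2], follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind in ("star", "opt"):
        _, f, l = analyse(node[1], follow)
        if kind == "star":
            # Repetition: a last position may loop back to a first position.
            for p in l:
                follow[p] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")


def has_clash(positions):
    """True if two distinct positions in the set carry the same symbol."""
    symbols = [symbol for symbol, _ in positions]
    return len(symbols) != len(set(symbols))


def is_deterministic(expr):
    """Check whether the position automaton of expr is deterministic."""
    follow = {}
    _, first, _ = analyse(mark(expr, count(1)), follow)
    return not has_clash(first) and not any(
        has_clash(targets) for targets in follow.values()
    )


a, b = ("sym", "a"), ("sym", "b")
# (a|b)*a : two occurrences of "a" compete at the start.
ambiguous = ("cat", ("star", ("alt", a, b)), a)
# b*a(b*a)* : an expression for the same language without such competition.
unit = ("cat", ("star", b), a)
unambiguous = ("cat", unit, ("star", unit))
```

With these definitions, is_deterministic(ambiguous) evaluates to False and is_deterministic(unambiguous) to True, which illustrates that determinism is a property of the expression, while a language counts as deterministic as soon as some deterministic expression denotes it.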



[CR: 19961019]

Brüggemann-Klein, Anne; Wood, Derick. "Deterministic Regular Languages." Pages 173-184 in STACS '92: Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science (Cachan, France, 13-15 February 1992.) Edited by A. Finkel and M. Jantzen. Lecture Notes in Computer Science, 577. Berlin: Springer Verlag, 1992. ISBN: 3-540-55210-3.

For a Postscript version, try FTP from Freiburg; [mirror copy]. Author's address: Anne Brueggemann-Klein, Institut fuer Informatik, Universitaet Freiburg, Rheinstrasse 10-12, 79104 Freiburg, Germany, email: brueggem@informatik.uni-freiburg.de. Abstract: (see the TR version).



Brüggemann-Klein, Anne; Wood, Derick. Electronic Style Sheets. Technical Report [UWO] 350. London, Ontario: Department of Computer Science, University of Western Ontario, London, Ontario, 2 March 1993. 12 pages, bibliography.

Abstract: Document processing systems must provide formatted versions of documents, where the specification of formats is the task of the document designer. To match the stylistic quality expected in the traditional publishing process, electronic style sheets need to support the design mechanisms that have evolved over the centuries. The designer's craft should not depend on the formatter, in particular it should not involve programming the formatter. We propose four basic mechanisms called transcription types that are sufficient to express a wide range of layouts. Building on these four transcription types, we have defined a layout specification language, Designer, that is declarative and formatter-independent.

Supported under NSERC and ITRC grants of Derick Wood. The paper is available via FTP to UWO; ftp://ftp.csd.uwo.ca/pub/csd-technical-reports/350/. Authors' addresses: Anne Brueggemann-Klein, Institut fuer Informatik, Universitaet Freiburg, Rheinstrasse 10-12, D-7800 Freiburg, Germany, email: brueggemann@informatik.uni-freiburg.de; Derick Wood, Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada. Email: dwood@csd.uwo.ca.



[CR: 19980410]

Brüggemann-Klein, Anne; Wood, Derick. "One-Unambiguous Regular Languages." Information and Computation [Academic Press, Inc.] 140/2 (February 1 1998) 229-253 (with 29 references). Author's affiliation: Institut für Informatik, Technische Universität München, Germany.

Abstract: "The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book et al. (1971, IEEE Trans. Comput. C-20(2), 149-53), each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regular languages in terms of structural properties of the minimal deterministic automata that recognize them. As a result we are able to prove the decidability of whether a given regular expression denotes a 1-unambiguous language; if it does, then we can construct an equivalent 1-unambiguous regular expression in worst-case optimal time."

See also: Gerard Berry and Ravi Sethi. From regular expressions to deterministic automata. Theoretical Computer Science, 48 (1):117-126, 1986.; Chia-Hsiang Chang and Robert Paige. From regular expressions to DFA's using compressed NFA's. Theoretical Computer Science, 178 (1-2):1-36, 30 May 1997. Fundamental Study; Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM, 11 (4):481-494, October 1964; Djelloul Ziadi. Regular expression for a language without empty word. Theoretical Computer Science, 163 (1-2):309-315, 30 August 1996.



Brüggemann-Klein, Anne; Wood, Derick. Unambiguous regular expressions and SGML document grammars. Technical Report # 337. London, Ontario: Department of Computer Science, University of Western Ontario, London, Ontario, 12 November 1992. 21 pages, bibliography. ISBN: 0771414544.

Abstract: The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions; although only expressions that denote words unambiguously are allowed. In general, the fact that a word is denoted by an expression is witnessed by a sequence of occurrences of symbols in the expression that matches the word. In an unambiguous expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness can be computed incrementally from the word with a one-symbol lookahead; we call such expressions 1-unambiguous. A regular language is 1-unambiguous if it is denoted by some 1-unambiguous expression. We give a Kleene theorem for 1-unambiguous languages and characterize them in terms of structural properties of the minimal deterministic automata that recognize them. This result enables us to decide whether a given regular expression denotes a 1-unambiguous language; if it does, then we can construct an equivalent 1-unambiguous expression in worst-case optimal time.

The paper is available via FTP to UWO; ftp://ftp.csd.uwo.ca/pub/csd-technical-reports/337/. Authors' address: Anne Brueggemann-Klein, Institut fuer Informatik, Universitaet Freiburg, Rheinstrasse 10-12, D-7800 Freiburg, Germany; Derick Wood, Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada.



[CR: 19970531]

Brüggemann-Klein, Anne; Wood, Derick. "The Validation of SGML Content Models." Mathematical and Computer Modelling 25/4 (February 1997) 73-84 (with 27 references). Authors' affiliation: Institut für Informatik, Technische Universität München, Germany.

Abstract: "The Standard Generalized Markup Language (SGML) is an ISO standard that provides a syntactic metalanguage for the definition of textual markup systems, which are used to indicate the structure of documents so that they can be electronically typeset, searched, and communicated. We address only one problem raised by the standard, namely: in SGML, the right hand sides of context free productions are regular expressions, called content models, that are restricted to be what the standard calls "unambiguous," but what is more appropriately called deterministic. We solve the problem of how to define determinism precisely, how to recognize deterministic regular expressions efficiently, and how to recognize deterministic regular languages. Any SGML parser must check that a given document grammar conforms to the standard; that is, it must validate it. Hence, our results are an important step in the clarification of the standard and in the efficient implementation of an SGML parser for SGML document grammars."

See the related entry, with links to a Postscript version of the document.



Brüggemann-Klein, Anne; Wood, Derick. The validation of SGML content models. Technical Report # 355. London, Ontario: Department of Computer Science, University of Western Ontario, London, Ontario, 21 March 1993. 15 pages, 13 references. ISBN: 0771415028.

Abstract: "The Standard Generalized Markup Language (SGML) is an ISO standard that provides a syntactic meta-language for the definition of textual markup systems, which are used to indicate the structure of documents so that they can be electronically typeset, searched, and communicated. We address only one problem raised by the standard, namely: In SGML, the right-hand sides of context-free productions are regular expressions, called content models, that are restricted to be what the standard calls ``unambiguous,'' but what is more appropriately called deterministic. We solve the problem of how to define determinism precisely, how to recognize deterministic regular expressions efficiently, and how to recognize deterministic regular languages. Any SGML parser must check that a given document grammar conforms to the standard; that is, it must validate it. Hence, our results are an important step in the clarification of the standard and in the efficient implementation of an SGML parser for SGML document grammars."

To appear in Mathematical and Computer Modelling, 1996. The paper is available via FTP to UWO: ftp://ftp.csd.uwo.ca/pub/csd-technical-reports/355/; [mirror copy]. Or try FTP from Freiburg; [mirror copy]. Authors' address: Anne Brueggemann-Klein, Institut für Informatik, Universitaet Freiburg, Rheinstrasse 10-12, D-7800 Freiburg, Germany; Derick Wood, Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada.



Brüggemann-Klein, Anne; Wood, Derick. "On the Expressive Power of SGML Document Grammars." "In preparation". 1991.

Brüggemann-Klein, Anne; Wood, Derick. "Parser Generators for Document Grammars." "Submitted for publication." 1991.



[CR: 19980603]

Brüggemann-Klein, Anne; Wood, Derick; Murata, Makoto. Regular tree languages over non-ranked alphabets. Technical Report, Version 0.3, April 19, 1998. München: Institut für Informatik, Technische Universität München, April 19, 1998. Extent: 21 pages, 10 references. Authors' affiliation: [Brüggemann-Klein]: Institut für Informatik, Technische Universität München, Arcisstrasse 21, 80290 München, Germany; Email: brueggem@informatik.tu-muenchen.de; WWW: http://www11.informatik.tu-muenchen.de; [Wood]: Department of Computer Science, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong; Email: dwood@cs.ust.hk; WWW: http://www.cs.ust.hk/~dwood; [Murata]: Fuji Xerox Information Systems; Email: murata@apsdc.ksp.fujixerox.co.jp; WWW: http://www.geocities.com/ResearchTriangle/Lab/6259/ .

Summary: The April version is a draft: "There is a preliminary version of the tree automata paper available at ftp://ftp11.informatik.tu-muenchen.de/pub/misc/caterpillars/. It is a mere skeleton yet, mostly definitions, theorems, and proofs. The paper has one more author, namely Makoto Murata, who is not mentioned yet on the title page. We are going to add the obligatory introduction and other text plus a new section on forest automata." [May 18, 1998, A.B-K.]

Available in Postscript format; archive copy, 980529. A TeX version is also available in the FTP directory. The final section on "Previous work" surveys literature used by the authors: "Work on tree automata and tree-regular languages can be divided into two categories, one dealing with ranked and the other with non-ranked alphabets. . ." An accompanying bibliography (in BiBTeX format) provides a larger collection of references pertaining to tree/forest automata; see the FTP directory cited, or the .ZIP archive file. See the database entry: SGML/XML and Forest Automata Theory.



[CR: 19980907]

Brugger, Rolf; Bapst, Frédéric; Ingold, Rolf. "A DTD Extension for Document Structure Recognition." Pages 343-354 (with 14 references) in Electronic Publishing, Artistic Imaging, and Digital Typography. Proceedings of the 7th International Conference on Electronic Publishing (EP '98), Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography (RIDT '98). EP '98 and RIDT '98, Saint Malo, France. March 30 - April 3, 1998. Edited by Roger D. Hersch, Jacques André, and Heather Brown. Lecture Notes in Computer Science Series, Number 1375. New York/Berlin/Heidelberg: Springer-Verlag, 1998. ISBN: 3-540-64298-6. Authors' affiliation: Informatics Institute of the University of Fribourg, Chemin du Musée 3, Fribourg, Switzerland; Email: [Rolf Brugger] rolf.brugger@unifr.ch, WWW: http://www-iiuf.unifr.ch/~brugger/.

Abstract: "The paper deals with the representation of document models used in the field of document recognition. A novel formalism called generalized n-gram is presented, which is shown to be accurate for the recognition task and well adapted to automatic learning by example. The paper addresses also the thorny problem of integrating models for document analysis with existing standards used for document manipulation and production. . . The benefits of using high level descriptions to handle structured documents have been widely recognized by the scientific community dealing with electronic document production. The SGML language providing the DTD mechanism has become increasingly important during the last five years. The document analysis and recognition community has been dealing with structure recognition for several years too. In this context, the use of general knowledge, often referred to as document models, describing the document class to be processed is essential; it is clear that these document models share many common features with DTDs, at least the generic logical structure. Very little work has been done so far in order to combine both kinds of formalism. This paper aims to make a significant contribution to reduce the gap between document models used for recognition purposes and document type definitions commonly used for structured document handling. . ."

[Conclusion:] "This paper has dealt with document models used in the field of document analysis and recognition. A new generic model based on generalized n-grams has been presented: it has been shown to be accurate enough to recognize logical structures and suitable for easy incremental learning. The problem of integration of this model with the SGML/DSSSL world has also been addressed. In fact, it has been shown that the generalized n-grams model can be used to generate DTDs in SGML and document style definitions in DSSSL automatically. The problem of translating the statistical knowledge remains partially open and needs further investigations. We are convinced that the use of common tools for document recognition and document production can notably increase the capacity of both domains. First DTDs and style descriptions bring necessary knowledge to the recognition process; second document models learned from examples and translated to SGML may reduce the cumbersome task of defining DTDs and DSSSL descriptions by hand."

The document is online in HTML format. See also the online abstract and the full text in PDF, [local archive copy]. Also the document in Postscript format, [local archive copy].

The CIDRE project (Cooperative & Interactive Document Reverse Engineering) for the "Development of a cooperative and interactive document recognition environment" has produced a number of publications relative to document structure and DTD generation. See also, for example, "Modeling Documents for Structure Recognition Using Generalized N-Grams," by Rolf Brugger, Abdelwahab Zramdini, and Rolf Ingold (IEEE, ICDAR'97); Postscript version, [local archive copy, postscript]. For a list of other related publications and papers, see Brugger's home page, or a documents snapshot listing.



[CR: 19971017]

Bruneseaux, Florence; Romary, Laurent. "Codage des références et coréférences dans les dialogues homme-machine." Pages 15 - 17 in ACH-ALLC '97. The 1997 Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. Conference Abstracts. ACH-ALLC '97. Queen's University at Kingston, Ontario, Canada. June 3 - 7, 1997. Compiled by Greg Lessard and Michael Levison. Ontario, Canada: Queen's University, 1997. ISBN: 0-88911-760-8. Authors' affiliation: CRIN-CNRS&INRIA Lorraine; Email: brunesea@loria.fr and romary@loria.fr, WWW: L. Romary Home Page.

[Extract, translated from the French:] "The advantages of standardizing electronic textual resources through the use of the TEI (Text Encoding Initiative) have already been presented in numerous articles. This application of the SGML standard offers guidelines for encoding texts, providing more than 500 elements (and as many attributes) for describing a document. Here we wish to focus on one particular type of document: multimodal (speech and gesture) human-machine dialogues. A basic encoding for all dialogues, which can generally be produced automatically from a correct initial transcription, must bring out a certain amount of information, including the speaker of each utterance, changes of speaking turn, and pauses. . . Starting from this encoding, which is stable and independent of the type of study one wishes to carry out, it would be desirable to bring out more specific phenomena at the level of content. Among these, one may consider the problem of reference and, more generally, the problem of the relation that may exist between different types of phrases (nominal and verbal). Indeed, when analyzing a dialogue between two individuals, it is important to be able to say whether a discourse segment refers to a particular object, and whether it can be interpreted directly or whether its interpretation depends on another segment. . ."

Abstract available online in HTML format: "Codage des références et coréférences dans les dialogues homme-machine", by Florence Bruneseaux, Laurent Romary; [archive copy]

Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server.



[CR: 19971123]

Bryan, Martin. "CD 13250: SGML Applications - Topic Navigation Maps." Page(s) 263-265 in SGML '97 Conference Proceedings. SGML Europe '97. "The Next Decade - Pushing the Envelope." Princesa Sofia Intercontinental, Barcelona, Spain. 11-15 May, 1997. Sponsored by Graphic Communications Association (GCA) and SGML Open. Conference Chair: Pamela L. Gennusa (Director, Database Publishing Systems Ltd). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 342 pages, CDROM. Author's affiliation: Consultant, The SGML Centre; Email: mtbryan@sgml.u-net.com.

Summary: The Topic Navigation Maps standard "provides a mechanism, based on techniques defined in ISO/IEC 10744:1992, for identifying information objects that share a common topic. It can also be used to define the relationships between sets of related topics. This standard can, for example, be used to define: 1) tables of contents and subject indexes for individual documents, or related sets of documents; 2) glossaries that can be shared by more than one document; 3) the relationship between topics within a thesaurus; 4) the relationships between multilingual thesauri, glossaries, etc."

Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry. See also the dedicated section on Topic Navigation Maps in the SGML/XML Web Page, and the Topic Maps draft document(s).



Bryan, Martin. "Creating Informative Document Models." SGML Users' Group Newsletter 20 (September 1991) 12-17.



[CR: 19951228]

Bryan, Martin. "From WWW to WW to W?" OII Spectrum ?? (November 1995) ??. Author's affiliation: Martin Bryan is an Information Management Consultant based in Churchdown, Gloucestershire. The SGML Centre, 29 Oldbury Orchard, Churchdown, Glos. GL3 2PU, UK. Email: mtbryan@sgml-cen.demon.co.uk.

"This article explains why HTML, SGML and HyTime form a natural progression in terms of information management, and how the three techniques can be used to compliment one another."

Available online in HTML format (draft version): [mirror copy, December 1995].



[CR: 19970717]

Bryan, Martin. SGML and HTML Explained. Second Edition. White Plains, NY: Addison-Wesley Developers Press, 1997. Extent: 352 pages, CDROM disc. ISBN: 0-201-40394-3 [$39.95]. Author's affiliation: Martin Bryan, The SGML Centre, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK. Phone/Fax: +44 1452 714029; Email: mtbryan@sgml.u-net.com. WWW: http://www.u-net.com/~sgml/.

"Fully updated to cover the latest features of SGML and HTML, this new edition of SGML: An Author's Guide now includes a detailed description of how the concepts of SGML are used in HTML. Building on the strengths of the previous edition, SGML and HTML Explained provides an accessible explanation of all the features provided by both languages through numerous practical examples. [The book features:] (1) Real world examples taken from the world of electronic publishing used throughout; (2) A detailed description of the new standard HTML3 DTD including forms, Tables and Multimedia; (3) An exploration of the SGML and HTML techniques that can be applied in document analysis and information modelling; (4) A CD-ROM containing an interactive HTML example to enable readers to explore the real effects of Hypertext Links and other active HTML features." [from the publisher's blurb]

The most novel feature of this book is that it's also available (substantially) online in HTML format; since the release of the book earlier in 1997, at least three chapters have been revised (online) to reflect changes based upon the HTML 4.0 DTD.



[CR: 19970314]

Bryan, Martin. "The Missing Link." SGML Users' Group Bulletin 3/1 (1988) 1-7. ISSN: 0269-2538. Author's affiliation: Quorum Technical Services, Ltd.

The author endeavors to explain the SGML 'link statement' and its role in text formatting processes. "The link statement provides a natural route for formatting straightforward documents. Its power is such that all but the most complex of formatting tasks can be defined in this way."



Bryan, Martin. SGML: An Author's Guide to the Standard Generalized Markup Language. Wokingham/Reading/New York: Addison-Wesley, 1988. 380 pages. ISBN: 0-201-17535-5 (pbk); LC CALL NO: QA76.73.S44 B79 1988.

A highly detailed manual explaining and illustrating features of ISO 8879. According to the publisher, the book: (1) shows how to analyse the inherent structure of a document; (2) illustrates a wide variety of markup tags; (3) shows how to design your own tag set; (4) is copiously illustrated with practical examples; (5) covers the full range of SGML features. Technical and non-technical authors, publishers, typesetters and users of desktop publishing systems will find this book a valuable tutorial on the use of SGML and a comprehensive reference to the standard. It assumes no prior knowledge of computing or typography on the part of its readers. See further description in a publisher's blurb.



Bryan, Martin. "Document Markup for Open Information Interchange." Pages 3/1-3/3 in Proceedings of the IEE Colloquium on 'Adding Value to Documents with Markup Languages' London/Stevenage, UK, 6 June 1994. IEE Colloquium Digest No.1994/142. London, UK: IEE, 1994.

Abstract: The principal goal of Open Information Interchange is to allow data generated by one application to be displayed, referenced, and/or copied to another application. It is not postulated on a need for access to the application that originally generated the data. Instead the data itself contains, or is associated with, all the information necessary for the processing of the data. Interchanging documents that make reference to other documents raises problems such as how to ensure that all files referred to from one document are made available whenever the document is transmitted to a new location. The techniques discussed indicate the trend for future text processing systems.



[CR: 19950716]

Bryan, Martin. "International Standards Activity Related to SGML, February 1995." SGML Users' Group Newsletter 30 (March 1995) 4. ISSN: 0952-8008.

Report on the February meeting of ISO's JTC1/SC18/WG8. Topics included: DSSSL, SPDL, SDQL (SGML Document Query Language), SDIF, MIME, SMSL (Standard Multimedia Scripting Language), font interchange (ISO 9541).



[CR: 19950716]

Bryan, Martin. "Report on ISO/IEC/ITU Conference on International Standards for Multimedia/Hypermedia Systems." SGML Users' Group Newsletter 29 (November 1994) 8-16. ISSN: 0952-8008. Author's affiliation: The SGML Centre, Tel. +44-1452-714029

Report on the ISO/IEC/ITU conference on September 13, 1994.



Bryan, Martin. "Standards for Text and Hypermedia Processing." Information Services and Use [Workshop on Hypermedia and Hypertext Standards, Amsterdam, Netherlands, 22-23 April 1993. Sponsored by CEC.] 13/2 (1993) 93-102. 11 references.

Abstract: Working Group 8 of ISO/IEC Joint Technical Committee 1 Subcommittee 18 (JTC1/SC18/WG8) is tasked with developing information technology (IT) standards for use in text and office systems. The Standard Generalized Markup Language (SGML) introduced by WG8 in 1986 is one of the key standards in developing systems for open information interchange (OII). In November 1992 WG8 published an important new standard, based on SGML, for the interchange of multimedia and hypermedia data. The Hypermedia/Time-based Structuring Language (HyTime) provides hypermedia and multimedia systems developers with a standardized way of representing their data sets when interchanging information with other systems. Since the publication of the HyTime standard, WG8 have started work, in conjunction with SC29/WG12, on the development of a standard multimedia scripting language (SMSL). Based on the concepts behind HyTime, SMSL will enable system developers to interchange compiled forms of information flow scripts, probably using the UK-developed architecture neutral distribution format (ANDF).



Bryan, Martin. "A TeX User's Guide to ISO's Document Style Semantics and Specification Language." TUGboat [Proceedings of the 1993 Annual Meeting] 14/3 (1993) 223-226.



[CR: 19971206]

Bryan, Martin. "Topic Navigation Maps - An Overview." International SGML Users' Group Newsletter 3/4 (October 1997) 8-11. ISSN: 0952-8008. Author's affiliation: The SGML Centre, Churchdown, Gloucestershire; Email: mtbryan@sgml.u-net.com.

Summary: "ISO's Topic Navigation Map standard (ISO 13250) provides facilities for creating, maintaining and interchanging topic-based navigational aids to large corpora of documents containing inter-related information. The standard makes a distinction between the highly concentrated and independent topic navigation maps - sets of relations between the topics covered in a given corpus - defined within this standard and the addresses of relevant information within the corpora themselves, which are typically defined using facilities provided by ISO/IEC 10744, which defines the Hypermedia/Time-based Structuring Language known as HyTime. Topic navigation maps can improve the accessibility of information by facilitating, and to some extent automating, the task of providing navigational resources. Topic navigation maps are designed to simplify groupware-supported production of data for which navigational aids such as indexes, glossaries, tables of contents, lists and catalogs need to be generated. Topic navigation maps can also be used to enhance the navigability of very large information bases by providing in-depth sub-categorization of terminology bases."

For further information, see Topic Navigation Maps.



[CR: 19960715]

Buford, John F. "Evaluating HyTime: An Examination and Implementation Experience." Pages 105-15 (with 27 references) in Proceedings of Hypertext '96. Hypertext '96. Seventh ACM Conference on Hypertext. Washington, DC, USA. March 16-20, 1996. Sponsored by ACM SIGLINK and SIGOIS. New York: Association for Computing Machinery, 1996. ISBN: 0897917782. Author's affiliation: Distributed Multimedia Systems Laboratory, Department of Computer Science, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854 USA. URL: http://dmsl.cs.uml.edu. E-mail: buford@cs.uml.edu.

Abstract: "HyTime defines an extensive meta-language for hypermedia documents, including general representations for links and anchors, a framework for positioning and projecting arbitrary objects in time and space, and a structured document query language. We propose a set of criteria for evaluating the HyTime model. We then review the model with respect to these criteria and describe our implementation experience. Our review indicates both the benefits and limitations of HyTime. These results are relevant to systems and applications designers who are considering HyTime, and also to possible future revisions of the standard."

An online version of the document is available in HTML format: http://www.cs.unc.edu/~barman/HT96/P69/buford.htm [mirror copy, text only].



[CR: 19950716]

Buford, John F.; Rutledge, L.; Rutledge, J. L. "Integrating Object-oriented Scripting Languages with HyTime." Pages 425-434 in Proceedings of the International Conference on Multimedia Computing and Systems [IEEE International Conference on Multimedia Computing and Systems, Boston, MA, USA, 15-19 May, 1994. Sponsored by IEEE Computer Society Task Force on Multimedia Computing.] Los Alamitos, CA: IEEE Computer Society Press, 1994. Authors' affiliation: Department of Computer Science, Massachusetts University, Lowell, MA, USA.

"Abstract: HyTime provides a comprehensive set of primitives for composing hypermedia documents, but does not provide facilities for representing interaction or dynamic behavior, areas which are required in commercial multimedia authoring environments. In previous work we have developed and implemented a prototype HyTime engine called HyOctane in which HyTime interactive multimedia documents can be stored and retrieved. We extend this work to include scripting language facilities, an area currently not dealt with by the standard but which is crucial in order to represent interactive multimedia documents. We compare different approaches to integrating HyTime document type definitions with scripts and describe the extensions to our engine architecture and implementation status."



Bullard, Len (editor). Metafile for Interactive Documents (MID); A Draft Specification for the Encoding of Interactive Documents. With Eric L. Jorgensen (CDNSWC Code 192, Project Director) and other members of the MID development team. Bethesda, MD: Naval Surface Warfare Center, Carderock Division, November 1994. 119 pages.

Summary: "This draft of the MID Specification has been prepared for purposes of review and comment by the general National and International technical community interested in standards for Interactive Electronic Documents which require a mechanism (i.e., script) for controlling the presentation of text, graphics, and other multimedia information developed for electronic display. It is written as an application of ISO 8879 SGML and utilizes portions of the ISO 10744 HYTIME extensions to SGML. While it was initiated by the Navy for purposes of developing a run-time standard for DoD Interactive Electronic Technical Manuals (IETMs), the MID Standard has been intentionally developed to be suitable for inclusion in an International-Level standard and to be applicable to generic scripted interactive documents of any nature and for any application. The Navy point of contact is Eric Jorgensen, CDNSWC Code 182, email: jorgense@oasys.dt.navy.mil."

The document is available in several formats via anonymous FTP: to NavySGML, or via HTTP connection to the NAVY DTD/FOSI Repository. See the full text of the announcement for other details.



[CR: 19960408]

Burger, Franz; Reich, Sigfried. "Design and Implementation of an Abstract SGML Interface in Smalltalk." Computer Standards & Interfaces 18/1 (January 1996) 71-78 (with 16 references). ISSN: 0920-5489. Authors' affiliation: [Burger]FAW [Forschungsinstitut für Anwendungsorientierte Wissensverarbeitung] - Research Institute for Applied Knowledge Processing, A 4232 Hagenberg, Austria. Email address: franz@faw.uni-linz.ac.at; [Reich] University of Linz, Department of Information Science, A 4040 Linz, Austria.

Abstract: "SGML is a key standard for the publishing industry. However, tools being able to deal with SGML documents, for instance editors or document management systems, still are not a great many. This paper describes the design and prototypical implementation of an abstract SGML interface in Smalltalk which is dedicated to be used for the development of SGML-aware applications. Motivation and background for the work presented here are given. The structure of SGML is analyzed and the corresponding Smalltalk class hierarchy is derived. Two example applications demonstrate the usability and power of the interface."

This article was published in an SGML special issue of Computer Standards & Interfaces [The International Journal on the Development and Application of Standards for Computers, Data Communications and Interfaces], under the issue title SGML Into the Nineties. It was edited by Ian A. Macleod, of Queen's University. A draft version is available on the Internet in Postscript format: ftp://ftp.ifs.uni-linz.ac.at/pub/publications/1995/0395.ps.gz [mirror copy, March 1996]. Franz Burger's home page contains links to papers of similar nature.



[CR: 19960818]



Burkowski, Forbes J. "The Use of Retrieval Filters to Localize Information in a Hierarchically Tagged Text-Dominated Database." Pages 264-284 (with 11 references) in Intelligent Text and Image Handling: Proceedings of a Conference on Intelligent Text and Image Handling, "RIAO91" Barcelona, Spain, 2-5 April, 1991 [Conference organized by the Centre de Hautes Etudes Internationales d'Informatique Documentaire (CID), Center for the Advanced Study of Information Systems, Inc. (CASIS). Sponsored by the Commission of the European Communities, Minister of Education and Sciences, Spain; Minister of "Industrie en Aménagement du Territoire", France; et al.] Edited by André Lichnerowicz [Collège de France, Académie des Sciences de Paris]. Amsterdam/London/New York/Tokyo: Elsevier, 1991. xiii + 999 pages. ISBN: 0-444-89361-X. Author's affiliation: Department of Computer Science, University of Waterloo, Waterloo, Canada.



[CR: 19980413]

Burman, Linda. "[Taking the Pulse at] Microsoft's Web Tech Ed Conference [January 26-28, 1998. Letter from XMLland]." XML Files: The XML Magazine Issue 04 (March 17, 1998) 14-19. Author's affiliation: President, L. A. Burman Associates.

Summary: "Gathered in Palm Springs were 3500 Web developers and tool vendors eagerly awaiting the keynote guaranteed to deliver hot news of new Microsoft technologies and strategic direction. . . Overall there was no question that XML captured the imagination and interest of the mainstream Web community. As a result of the conference, Web site developers know where to get more information and how to get started. They understand that tools to support this technology are developing rapidly. . . For companies like Microsoft, the important applications will be data driven because that is an area of huge opportunity. The idea of a 'universal data format' is actually doable with XML."

Available online.



[CR: 19961226]

Burman, Linda. "SGML Editors and Authoring Systems." Page 51 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: L. A. Burman Associates, 23 Hambly Ave., Toronto, ON M4E 2R5, CANADA; Tel: 416 699-7198; FAX: 416 699-1178; Email: linda@interlog.com.

Abstract: "A variety of SGML authoring and editing tools exist on the market today and new ones are being added all the time. Initially, there seemed to be the need for only one type of tool but as a result of market need there are now a number of different 'flavors' each best suited for a particular SGML application.

This session will discuss the role of SGML authoring within a total publishing system. It will also describe the various types of tools available today for editing and authoring and what broad category each fits into in terms of its 'flavor'. A list of all known authoring and editing tools will be provided."

Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19950823]

Burnage, Gavin; Dunlop, Dominic. "Encoding the British National Corpus." Pages 79-95 in English Language Corpora: Design, Analysis and Exploitation. Ed. Jan Aarts, Pieter de Haan and Nelleke Oostdijk. Amsterdam and Atlanta: Editions Rodopi, 1993.

See the main BNC entry for other details on the use of SGML encoding in this project.



[CR: 19960922]

Burnard, Lou (guest editor). Electronic Texts and the Text Encoding Initiative. A Special Issue [5.3] of TEXT Technology: The Journal of Computer Text Processing. Madison, SD: College of Liberal Arts, Dakota State University, [George M. and Merrill D. Hunter Electronic Publishing Center], Autumn 1995. ISSN: 1053-900X. Author's affiliation: Oxford University Computing Services, Oxford; TEI editor.

This special issue (Autumn 1995) of TEXT Technology has contributions by: Eric Johnson; Lou Burnard [this entry, and the "Introduction"]; Jeffery Triggs; John Price-Wilkin; Susan E. Kruse; Laurent Romary, Nathalie Mehl, and David Woolls; James K. Tauber; Syd Bauman; C. M. Sperberg-McQueen. Lou Burnard introduces articles in the special issue with the "Introduction" (pages 176-178). See fuller description: Something new under the Sun: Electronic Texts and the Text Encoding Initiative [mirror copy w/ text only]. For background and TEI summary on the same server: The Text Encoding Initiative (by Eric Johnson) [mirror copy w/ text only]. Contact via email (Eric Johnson): text.technology@columbia.dsu.edu OR JohnsonE@columbia.dsu.edu; Surface address: TEXT Technology; 114 Beadle Hall; Dakota State University; Madison, SD 57042-1799 U.S.A.; Phone: (605) 256-5270.

For other journal special issues and monographs dedicated to the Text Encoding Initiative, see the relevant subentry for TEI.



Burnard, Lou. "An Introduction to the Text Encoding Initiative." Pages 81-91 in Modelling Historical Data: Towards a Standard for Encoding and Exchanging Machine-Readable Texts. Edited by Daniel Greenstein. Halbgraue Reihe zur Historischen Fachinformatik, Serie A, Historische Quellenkunden, edited by Manfred Thaller, Band (A) 11. St. Katharinen: [Published for the Max-Planck-Institut für Geschichte, Göttingen by] Scripta Mercaturae Verlag, 1991. iv + 223 pages. ISBN: 3-928134-45-0.

The article supplies an historical overview of the TEI, a survey of basic principles, and a discussion of the contents of the draft Guidelines. For other volume information (additional articles related to TEI-SGML), see sub the editor, Daniel Greenstein below.



[CR: 19950716]

Burnard, Lou. "Rolling Your Own with the TEI [Text Encoding Initiative]." Information Services and Use [Workshop on Hypermedia and Hypertext Standards, Amsterdam, Netherlands, 22-23 April 1993. Sponsored by CEC.] 13/2 (1993) 141-154. Author's affiliation: Computing Services, Oxford University, Oxford, UK.

"Abstract: An introduction is given to the scope and development of the Text Encoding Initiative recommendations, due for publication in July 1993. A brief technical overview of the scheme's modular architecture is given, paying particular attention to the proposals for representation of pointers and links, which may be regarded as a 'poor man's Hytime'."



[CR: 19970922]

Burnard, Lou. "SGML on the Web: Too Little Too Soon, or Too Much Too Late?" In: Proceedings of the 3rd Annual Conference on the Practical Use of SGML. "A Decade of Power." Third Annual [Belux] Conference on the Practical Use of SGML. Business Faculty, Sint-Lendriksborre 6, Brussels, Belgium. October 31, 1996. Sponsored by SGML Belux (Belgian-Luxembourg Chapter of the International SGML Users' Group). Leuven, Belgium: Belux, 1996. Author's affiliation: Oxford University Computing Services. Email: lou@vax.oxford.ac.uk.

Summary: [from Lou Burnard's conference report] "As far as I remember, I explained at some length why HTML was a Bad Thing for electronic publishers (this is what the Americans call preaching to the choir), and rather more briefly why SGML was a Bad Thing for the Websurfer in the Street (which is probably what the Americans call making waves). I also made a few incautious remarks about what XML might be when it finally hits the street next month, which provoked some interest." See L. Burnard's Report on "A decade of power": the annual meeting of the Belgian-Luxembourgian SGML Users Group.

Available online in HTML format: SGML on the Web: too little too soon, or too much too late?, by Lou Burnard; [mirror copy]. Or: available online from OUCS. For further information on the conference, see: (1) the description in the conference announcement and call for papers, and (2) the full program listing, or (3) the main conference entry in the SGML/XML Web Page.

Also published in an abridged format by CTI Textual Studies as: "SGML on the Web: Too Little Too Soon, or Too Much Too Late." Computers & Texts 15 (August 1997). URL: http://info.ox.ac.uk/ctitext/publish/comtxt/ct15/burnard.html. ("This paper is adapted from a presentation at 'A Decade of Power', the third annual conference of the Belgium-Luxembourg SGML Users Group, held in Brussels 30-31 October 1996.") [archive copy, text only]



[CR: 19951113]

Burnard, Lou. Text Encoding for Information Interchange: An Introduction to the Text Encoding Initiative. TEI [Text Encoding Initiative] Document no. TEI J31. Oxford: Oxford University Computing Services, July 1995. Extent: approximately 22 pages. Author's affiliation: Lou Burnard is European Editor for the Text Encoding Initiative, and is affiliated with Oxford University Computing Services.

The document is available in HTML format and as an SGML document. The HTML version "was derived automagically from the version prepared in TEI Lite format for presentation at the Second Language Engineering Conference, London, October 1995." An earlier version of the paper was published as "The Wider Relevance of the Text Encoding Initiative" in OII Spectrum, November 1994.

Table of contents

  • 1 Standardization and the TEI
  • 2 What is the TEI?
  • 3 Organization of the TEI scheme
  • 4 The TEI core

    • 4.1 Elements available to all bases
    • 4.2 The header

  • 5 The TEI base tag sets

    • 5.1 Textual Divisions
    • 5.2 The TEI Class System and Modification Mechanisms
    • 5.3 The global attributes

  • 6 The TEI additional tag sets
  • 7 From General to Specific
  • 8 Conclusions



[CR: 19961226]

Burnard, Lou. "Using SGML for Linguistic Analysis: The Case of the BNC." Pages 95-106 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Author's affiliation: Oxford University Computing Services, Humanities Computing Unit, 13 Banbury Road, Oxford, OX2 6NN, UK; Email: lou.burnard@oucs.ox.ac.uk; WWW: http://users.ox.ac.uk/~lou.

Abstract: "The British National Corpus (BNC) is a rather large SGML document, comprising some 4124 samples taken from a rich variety of contemporary British English texts of every kind, written and printed, famous and obscure, learned and ignorant, spoken and written. Each of its hundred million words and six and a quarter million sentences is tagged explicitly in SGML and carries an automatically-generated linguistic analysis. Each sample carries a TEI-conformant header, containing detailed contextual and descriptive information, as well as more conventional SGML mark-up.

The corpus was created over a four year period by a consortium of leading dictionary publishers and academic research centres in the UK, with substantial funding from the British Department of Trade and Industry, the Science and Engineering Research Council, and the British Library. It is currently available under licence within the European Union only, where it is increasingly used in linguistic research and lexicography, in applications ranging from the construction of state of the art language-recognition systems, to the teaching of English as a second language.

This paper begins by describing how the corpus was constructed, and gives an overview of some of the SGML encoding issues raised during the process. A description of the special purpose SGML aware retrieval system developed to analyse the corpus is also provided."

See a longer abstract [mirror copy], and an online version of the SGML '96 presentation: Using SGML for Linguistic Analysis: the case of the BNC [mirror copy, pis aller, but see the canonical source if possible].

Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.



[CR: 19961216]

Burnard, Lou. "What is SGML and How Does It Help?" The Text Encoding Initiative: Background and Contents, Guest Editors Nancy Ide and Jean Véronis = Computers and the Humanities 29/1 (1995) 41-50.

Abstract: SGML is an abbreviation for "Standard Generalized Markup Language". This language, or rather metalanguage, was first defined by an International Standard in 1986. To complement the many detailed technical descriptions of SGML now available, this paper briefly describes the purpose and scope of the [SGML] standard, aiming to persuade non-technically minded readers that it has something to offer them.

Available online: "What is SGML and How Does It Help?", from the canonical source, or unofficial local mirror copy.



Burnard, Lou. "What is SGML and How Does it Help?" Pages 65-79 in Modelling Historical Data: Towards a Standard for Encoding and Exchanging Machine-Readable Texts. Edited by Daniel Greenstein. Halbgraue Reihe zur Historischen Fachinformatik, Serie A, Historische Quellenkunden, edited by Manfred Thaller, Band (A) 11. St. Katharinen: [Published for the Max-Planck-Institut für Geschichte, Göttingen by] Scripta Mercaturae Verlag, 1991. iv + 223 pages. ISBN: 3-928134-45-0.

See other volume information sub the editor, Daniel Greenstein below. A revised copy of Burnard's article in tagged electronic format is available from the UICVM (TEI-L) LISTSERVer (listserv@uicvm on BITNET) as EDW25 DOC, October 1, 1991. Send a command to the LISTSERVer: get edw25 doc tei-l. The document is/was available in text format from the OTA FTP server. Or read an HTML copy (dated November 1994) mirrored on the SIL server.



[CR: 19971126]

Burnard, Lou; Gartner, Richard; Kidd, Peter. The Cataloguing of Western Medieval Manuscripts in the Bodleian Library: a TEI approach with an appendix describing a TEI-conformant manuscript description. Paper/Report, and 'Chapter of a book'. Oxford: [Bodleian Library?], August 1997. Extent: 38 pages. Authors' affiliation: [Burnard]: Humanities Computing Unit at Oxford University Computing Services, and Oxford Text Archive, Oxford University, Email: lou.burnard@oucs.ox.ac.uk; [Gartner, Kidd]: Bodleian Library, Oxford University.

Summary: "In January 1996 the Bodleian Library began a four-year project, funded by the Higher Education Funding Council for England (HEFCE), under the Non-Formula Funding Specialised Research Collections initiative, the purpose of which is to make available descriptions of the medieval western manuscripts acquired by the Library since 1916, for which no full published catalogue yet exists. There are three main series of published catalogues of the western manuscripts at the Bodleian Library: the so-called `Quarto' catalogues, published between 1845 and 1900, in quarto format, which cover the major collections acquired (for the most part) in the seventeenth and eighteenth centuries; the Summary Catalogue, published between 1895 and 1953, which covers the manuscripts acquired from 1602 to 1915, except those already described in the Quarto catalogues; and the Summary Catalogue of Post-Medieval Western Manuscripts, published in 1991, which covers most post-medieval manuscripts acquired between 1916 and 1975."

"This paper reports on some of the problems addressed by the project, primarily from the bibliographic point of view, together with the technical approaches we have adopted for their resolution. Our basic approach has been to build on existing work as far as possible, while at the same time seeking to develop a system adequate to the Bodleian's arguably rather specialist needs. In that spirit, we have developed a set of extensions to the Text Encoding Initiative (TEI) proposals for general purpose text encoding (Sperberg-McQueen 1994), tailored to the needs of manuscript cataloguers. A detailed appendix to the paper documents this set of extensions, as they are currently formulated."

The document is available in HTML format, generated from the TEI-Lite SGML source; also in RTF format; [mirror copy, RTF version].

See also Gartner, Burnard, and Kidd, "A TEI Extension for the Description of Medieval Manuscripts," Pages 73-76 in TEI 10: A Conference in Celebration of the Tenth Anniversary of the Text Encoding Initiative. For other information, see Information on the Bodleian Library's TEI Extensions for Manuscript Cataloguing.

Similarly: "Medieval manuscripts and metadata: SGML approaches to cataloguing at the Bodleian Library," by Peter Kidd (Assistant Librarian, Dept. of Western Manuscripts, Bodleian Library, Oxford) = paper presented at DRH '97 (Digital Resources for the Humanities: A Conference at St Anne's College, Oxford, 14th - 17th September 1997). Extract: "The provision of descriptions of medieval manuscripts in electronic format presents numerous challenges, largely due to the fact that such physical objects present an almost bewildering variety of types, each of which has to be treated in a manner sympathetic to its own unique features. Any system capable of describing items which encompass such variety, will require both great flexibility, and a degree of discipline enforced by its structure. To meet this challenge, the Bodleian Library has been exploring the potential of two SGML DTDs, each of which appears to be capable of fulfilling the demands of the material, in different ways. [...] For a much more intensive level of encoding, allowing more subtle and powerful search potential, a set of extensions to the TEI DTD is being developed, to facilitate the incorporation of detailed manuscript metadata into the TEI Header. These extended TEI files can be linked to brief EAD-based descriptions to create a unified browsing and searching environment."



Burnard, Lou; Sperberg-McQueen, C. M. "Encoding for Interchange: An Introduction to the TEI." Tutorial. [Draft version], November 21, 1994. 36 pages.

Abstract: The purpose of this document is to provide a brief introduction to the recommendations of the Text Encoding Initiative (TEI). It shows how these recommendations may be used to encode a wide variety of commonly encountered textual features, in such a way as to maximize the usability of electronic transcriptions and to facilitate their interchange among scholars using different computer systems. This tutorial discusses the basic principles of encoding texts, and describes most of the TEI "core" tag set and most of the elements defined in the TEI "base tag set for prose". It does not address other more specialized tag sets. However, the elements and attributes described here should be adequate for the encoding of a wide variety of different kinds of material to a reasonable degree of detail. Some basic knowledge of SGML is assumed.

Various versions of this document are or have been available. It has carried the filename TEIU5 in a number of incarnations, but apparently began as 'TEI ED W21'. Look on the OTA FTP server or environs for the most recent version, but if nothing obvious is there, try the local WWW server for a copy dated November 21, 1994.



[CR: 19950716]

Burnard, Lou; Sperberg-McQueen, C. Michael. TEI Lite: An Introduction to Text Encoding for Interchange. TEI Tutorial, TEI document no. TEI U5. Oxford and Chicago: Text Encoding Initiative [ACH/ALLC/ACL], June 1995. Extent: approximately 200K (HTML format). Authors' affiliation: Burnard and Sperberg-McQueen are the TEI editors.

"This document provides an introduction to the recommendations of the Text Encoding Initiative (TEI), by describing a manageable subset of the full TEI encoding scheme. The scheme documented here can be used to encode a wide variety of commonly encountered textual features, in such a way as to maximize the usability of electronic transcriptions and to facilitate their interchange among scholars using different computer systems. It is also fully compatible with the full TEI scheme, as defined by TEI document P3, Guidelines for Electronic Text Encoding and Interchange, published in Chicago and Oxford in May 1994."

"Copies of the current version [July 1995] of this text may be found via the World Wide Web at http://www-tei.uic.edu/orgs/tei/intros/teiu5.tei and http://info.ox.ac.uk/ota/tei/doc/teiu5.tei, and at other sites mirroring these. The document is also available in HTML form at http://www-tei.uic.edu/orgs/tei/intros/teiu5.html and http://info.ox.ac.uk/ota/tei/doc/teiu5.html. An HTML version of this document in a single file (for easier printing) may be found at http://www.uic.edu/orgs/tei/intros/teiu5.html. Copies of the formal SGML document type definition for the tag set described here may be found at the same locations, under the file name teilite.dtd: file://www-tei.uic.edu/orgs/tei/p3/dtd/teilite.dtd, ftp://ftp-tei.uic.edu/pub/tei/lite/teilite.dtd and ftp://info.ox.ac.uk/ota/tei/p3/dtd/teilite.dtd."



[CR: 19970110]

Burrows, Toby. "Using DynaWeb to Deliver Large Full-Text Databases in the Humanities." Computers & Texts [Oxford: CTI Centre for Textual Studies] 13 (December 1996) [??]. ISSN: 0963-1763. Author's affiliation: The University of Western Australia Library.

Abstract: "How to provide access to large text databases, such as those published by Chadwyck-Healey Ltd, for use throughout an institution is now particularly significant. This article describes one solution and details some of the associated issues and problems."

Available online: http://info.ox.ac.uk/ctitext/publish/comtxt/ct13/burrows.html; [mirror copy]. See the CTI Textual Studies Home Page at Oxford for subscription information and links to related CTI resources.



[CR: 19970714]

Busch, Joseph A. SGML for Cultural Heritage Information. CIMI Project Paper. Nova Scotia: Consortium for Interchange of Museum Information, September 30, 1995. Author's affiliation: CIMI (Consortium for Interchange of Museum Information) / Getty Art History Information Program, Santa Monica, California, USA.

"Abstract: The Consortium for the Computer Interchange of Museum Information (known as CIMI) develops cultural heritage community standards to preserve digital museum information and facilitate its exchange. This paper discusses the CIMI Information Model, steps in the development of the CIMI document type definition (DTD) which uses Standard Generalized Markup Language (SGML) for content designation from a data model point of view, and issues related to linking SGML information objects. The paper was originally presented at the ASIS Mid-Year Meeting in Minneapolis, Minnesota on May 24, 1995."

Available online: Busch: SGML for Cultural Heritage Information [mirror copy]. See also the main entry for the Consortium for Interchange of Museum Information.

A version of the document is available now (apparently) in the volume Electronic Publishing: Applications and Implications, edited by Elisabeth Logan and Myke Gluck. ASIS Monograph Series. Medford, NJ: Information Today. ASIS, 1997. ISBN: 1-57387-036-6 (hardcopy).



[CR: 19971227]

Buswell, Stephen. "Mathematical Markup Language. An XML Application for Mathematics on the Web." Pages 377-384 in SGML/XML '97 Conference Proceedings. SGML/XML '97. "SGML is Alive, Growing, Evolving!" The Washington Sheraton Hotel, Washington, D.C., USA. December 7 - 12, 1997. Sponsored by the Graphic Communications Association (GCA) and Co-sponsored by SGML Open. Conference Chairs: Tommie Usdin (Chair, Mulberry Technologies), Debbie Lapeyre (Co-Chair, Mulberry Technologies); Michael Sperberg-McQueen (Co-Chair, University of Illinois). Alexandria, VA: Graphic Communications Association (GCA), 1997. Extent: 691 pages, CDROM; print volume contains author and title indexes, keyword and acronym lists. Author's affiliation: [Stephen Buswell]: Director of Research and Development, Stilo Technology Ltd, Empire House, Mount Stuart Square, Cardiff UK CF1 6DN; Phone: (+44) (0) 1222 483 530; Email: sb@stilo.com.

Abstract: "Mathematical Markup Language (MathML), designed by the W3C HTML Math working group, is an XML application for describing mathematical expression structure and content. The goal of MathML is to enable mathematics to be served, received and processed on the Web.

"This paper discusses the particular problems posed by the representation of mathematics on the web and outlines the XML-based solution proposed. This solution supports both presentation and semantic models of mathematics. The paper looks at the relationship between MathML and some existing mathematical representations which have contributed significantly to its development.

"Initially MathML will be processed and rendered by helper applications. An overview of the browser interface and techniques for embedding of MathML in HTML pages will be presented. The requirements on tools for the creation, editing and viewing of MathML are reviewed. An outline of MathML support in applications, under development or planned, will be given."

This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.

For more information on the proposed Mathematical Markup Language, see the dedicated database entry Mathematical Markup Language (XML), and the July 10, 1997 draft version [WD-math-970704] from the W3C server. The "HTML Math Overview" and the "HTML Math Activity Report" supply other details about MathML and the activities of the HTML Math working group.

Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).



[CR: 19961226]

Buswell, Stephen; Pike, Roy; Pike, Martin. "The ISO 12083 Mathematics Fragment." Pages 345-372 in SGML '96 Conference Proceedings. Celebrating a Decade of SGML. SGML '96 Conference, Boston, MA, November 18-21, 1996. Sponsored by The Graphic Communications Association (GCA). [Edited by] Conference Co-Chairs: B. Tommie Usdin and Deborah A. Lapeyre. Alexandria, VA: GCA, 1996. Extent: 711 pages. Authors' affiliation: [Buswell]: Director R & D, Stilo Technology Ltd, Empire House, Mount Stuart Square, Cardiff, UK CF1 6DN. Email: sb@stilo.demon.co.uk, or sb@stilo.com; WWW: http://www.demon.co.uk/stilo or http://www.stilo.com; [R. Pike]: Clerk Maxwell Professor of Theoretical Physics, King's College, London; King's College and Stilo Technology Ltd, The Strand, London, UK WC2R 2LS. Email: erp@maxwell.ph.kcl.ac.uk; [M. Pike]: Director Marketing, Stilo Technology Ltd, Empire House, Mount Stuart Square, Cardiff, UK CF1 6DN. Email: mp@stilo.demon.co.uk, or mp@stilo.com. WWW: http://www.demon.co.uk/stilo or http://www.stilo.com.

Abstract: "Currently, most mathematics DTDs in widespread use are presentation-based, that is the markup relates to the layout of the mathematics on the page or screen rather than to the mathematical content. Such an approach makes the interchange between different SGML applications, and between SGML applications and computational applications, very difficult. This paper proposes a semantics-based DTD for mathematics, and describes a mechanism for selection of the particular branch of maths in use and extension of the DTD to cover areas of maths not as yet covered. Issues related to presentation, and the implications for applications, are discussed. Examples of possible mappings between the DTD and notations used by a typical computational program are given.

The meeting of the ISO 12083 committee in Munich in May 1996 accepted the proposal as the basis for the Mathematics fragment of the coming revision of the 12083 Standard. The paper reviews the issues raised and the resulting implications for the Mathematics fragment.

Significant progress has been made since the Munich meeting. The DTD has evolved following comments and test cases sent to the authors. Contacts with other interested organisations, such as the OpenMath consortium and the W3 HTML mathematics group have been pursued."

Further information on SGML markup for maths may be found in the main SGML-Math entry of the SGML/XML Web Page; see also the entry for the EUROMATH Project.

Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.


Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Document URI: http://xml.coverpages.org/bib-ab.html
Robin Cover, Editor: robin@oasis-open.org