[Concatenated HTML files of the "Complete Conference Program" mirrored from the official GCA site (which alone should be regarded as authoritative): http://www.gca.org/conf/sgml96/]. Some links to online information have been added in this document.
November 18-21, 1996
Sheraton Boston Hotel and Towers
Boston, MA
Come and review the lessons and achievements of the last 10 years and prepare for the future success and growth of the SGML industry. SGML '96 will be an exciting mixture of presentations from technical theory for the experts to business cases for the managers. Even those new to SGML will find an entire track dedicated to their needs and interests. What as little as 5 years ago included only a few vendor demonstrations, is today an industry-wide exhibition where attendees can learn about the latest tools and products. Poster presentations will provide a forum for informal technical discussions and networking. SGML '96 will be an opportunity to celebrate the last 10 years and usher in the next...
Implementing SGML can be an enormous task. To be successful, an implementor must have a good technical background in SGML and a clear understanding of data flow and SGML system functionality. Gaining an understanding of the key components of an SGML system is critical. These presentations are designed to provide the SGML newcomer with an overview of the major classes of SGML tools and a brief review of the products commercially available today. Presenters for this session are independent SGML consultants who specialize in the design and implementation of SGML-based information systems.
The British National Corpus (BNC) is a large SGML document: 4124 samples from a rich variety of written and printed, famous and obscure, learned and ignorant, spoken and written contemporary British English texts. Each of its hundred million words and six and a quarter million sentences is tagged explicitly in SGML and carries an automatically-generated linguistic analysis. This presentation will focus on the technical aspects of creating, managing, encoding, and processing large linguistic corpora with SGML. It will assess the role which SGML played during the various stages of the BNC project: as an interchange medium between the various data-providers; as a target application-independent format; and as the vehicle for expression of metadata and linguistic interpretations encoded within the corpus.
See a longer abstract [mirror copy], and an online version of the presentation: Using SGML for Linguistic Analysis: the case of the BNC [mirror pis aller, but see the canonical source ]
FORMEX (FORMALIZED EXCHANGE) is one of the very first initiatives that adopted the SGML notation. Initially designed around the UNESCO CCF standard (COMMON COMMUNICATION FORMAT), the original FORMEX specification (1986) and its first revision FORMEX V2 supported both notations. This year, the Office for Official Publications of the European Communities EUR-OP released a new version, FORMEX V3, which incorporates more than ten years of experience in the SGML field. FORMEX V3 is based exclusively on the SGML notation and SDIF is the communication standard encapsulating the exchange of data. Though the FORMEX specification is able to support any kind of document, it has a specific target: Legal Publications. The set of tags exhibited in FORMEX V3 is highly semantic and can be combined into a wide variety of legal publications doctypes.
Free SGML software is important to the increasing number of publishers trying to demonstrate the viability of SGML on a small or nonexistent budget. The obviously attractive cost of such software is sometimes offset by limited documentation, debugging, support, and platform availability. This discussion of available free software will address aspects of developing and using SGML systems: text editing, DTD visualization, formatting and programming tools, and programming libraries for beginning and advanced programmers.
For links to Web pages where the free software may be downloaded, see: http://cs.nyu.edu/cs_alumni/duchar96/sgmlfree, or see the author's Home Page.
The features of several publicly available SGML transformation tools will be discussed: probably CoST, STIL, SGMLS.pm, TclYasp, SENG (or the newer DSSSL engine if it is available in time), and SGMLC. Their processing models (event-driven for many of these tools, tree-based for CoST, STIL, and SENG's replacement) will be compared. A simple example will be presented for several tools to illustrate their coding requirements, and an indication will be made of how well the tools would scale up to handling large, multi-file, batch-oriented tasks.
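As a rough illustration of the two processing models, the following Python sketch counts element types both ways. It is not taken from any of the tools named above: Python's standard XML libraries stand in for an SGML parser, and the sample document and the names `CountingHandler`, `count_events`, and `count_tree` are assumptions made for this example.

```python
import xml.sax
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; well-formed XML stands in for a parsed SGML instance.
SAMPLE = "<doc><sec><p>One</p><p>Two</p></sec><sec><p>Three</p></sec></doc>"


class CountingHandler(xml.sax.ContentHandler):
    """Event-driven model: the parser calls back once per start tag."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_events(text):
    """Count element types while the parser streams through the input."""
    handler = CountingHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return dict(handler.counts)


def count_tree(text):
    """Tree-based model: build the whole tree first, then walk it."""
    root = ET.fromstring(text)
    return dict(Counter(el.tag for el in root.iter()))
```

Both functions return the same counts for the sample. The event-driven version keeps only the handler's state in memory, which is one reason that model tends to suit large batch jobs; the tree version holds the whole document but allows free navigation.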
Generation of SGML-coded documents as a result of database query processes is a commonly used practice. In most cases, however, the contents of such documents are entirely built from scratch as an SGML-formatted image of the query results. The purpose of this presentation is to present an extension to this practice, in cases when documents are made of a combination of human-generated parts and database-originated parts. When such documents are updated, human-generated parts should remain untouched, while database-originated parts (text, tables, and graphics) should be regenerated.
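The update rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' system: the element name `generated`, its `query` attribute, and the `run_query` callable are all hypothetical, and XML stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing a hand-written part and a database-originated part.
DOC = (
    "<manual>"
    "<para>Written by hand; must survive updates.</para>"
    '<generated query="part_count">12 parts</generated>'
    "</manual>"
)


def refresh(text, run_query):
    """Regenerate only the database-originated parts of a document.

    run_query is an assumed callable that maps a query name to fresh text.
    """
    root = ET.fromstring(text)
    for node in root.iter("generated"):
        # Replace stale content; everything outside <generated> is untouched.
        node.text = run_query(node.get("query"))
    return ET.tostring(root, encoding="unicode")
```

For example, `refresh(DOC, lambda q: {"part_count": "15 parts"}[q])` replaces the stale count while leaving the hand-written paragraph exactly as it was.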
One of the most exciting applications of SGML in recent years is its use as a database format. The structural information embedded in an SGML document enables querying SGML documents to extract information automatically. However, this querying process is not standardized so SGML database implementations use their own query language syntax. An initial version of a DTD for the SQL query language, with extensions for use with hierarchically-structured documents is presented and compared with languages proposed and implemented, including the proposed Standard Document Query Language (SDQL) in DSSSL (ISO 10179).
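To show what a structure-aware query looks like in practice, here is a small Python sketch. It uses neither the proposed DTD nor SDQL; it relies on the limited path syntax of Python's standard `xml.etree.ElementTree`, and the document, the `type` attribute, and the function name `select_paras` are assumptions for illustration. The query is roughly the hierarchical analogue of "select every para whose enclosing section has a given type".

```python
import xml.etree.ElementTree as ET

# Hypothetical document; XML stands in for an SGML instance.
DOC = (
    "<manual>"
    "<section type='safety'><para>Wear gloves.</para></section>"
    "<section type='setup'><para>Unpack the unit.</para></section>"
    "<section type='safety'><para>Unplug before cleaning.</para></section>"
    "</manual>"
)


def select_paras(text, section_type):
    """Return the text of each <para> directly inside a matching <section>."""
    root = ET.fromstring(text)
    path = f".//section[@type='{section_type}']/para"
    return [p.text for p in root.findall(path)]
```

Calling `select_paras(DOC, "safety")` returns the two safety paragraphs in document order.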
The international standards committee responsible for the SGML standard, ISO/IEC JTC1/SC18/WG8, has been reviewing ISO 8879, which defines SGML. This talk summarizes the decisions that have been reached to date, giving detailed descriptions of the technical issues and describing the current status of the review and revision process and results of recent meetings of both WG8 and its U.S. counterpart, X3V1. In addition, speakers will identify topics expected to be addressed in forthcoming meetings and will provide a brief description of how interested conference attendees can participate in the revision process.
Currently, most mathematics DTDs in widespread use are presentation-based; that is, the markup relates to the layout of the mathematics on the page or screen rather than to the mathematical content. Such an approach makes the interchange between different SGML applications, and between SGML applications and computational applications, very difficult. This paper proposes a semantics-based DTD for mathematics, and describes a mechanism for selection of the particular branch of maths in use and extension of the DTD to cover areas of maths not as yet covered. Functions or functional areas newly created by the author and as yet not included in the DTD are described.
As the SGML community continues to grow, users are seeking new support structures, new sources of information, new technology, and new ways of applying SGML. The result is that a number of SGML interest groups have emerged, not just around the U.S., but around the world. The journey to a strong, productive users group is long and not without hurdles. However, the benefits are many for everyone involved, and the learning experiences are invaluable. This presentation will target SGML users at any level who are interested in starting a users group or looking for ways to improve existing groups.
The annual SGML Conference provides the opportunity to focus on technology, expand our level of knowledge, and exchange ideas and experiences with others in similar or related environments. Once the week is concluded, however, we are challenged with sustaining the momentum that's been attained. This does not mean that one should only wait for the next year's conference; much can be done in the interim to continue the pursuit of knowledge and exchange of information. Many communities have organized local forums, specifically designed to address these concerns. This talk will focus on some of the major issues in establishing and maintaining such an organization: organizational structure, attracting members, frequency of meetings, maintaining interest, programming, involving vendors and corporations, social events, and applying these concepts to internal and intra-company departments.
Success of the legacy conversion might be the single most important determinant of your organization's success in a move towards an SGML environment. It can also be the single most costly aspect of the project. This session's goal will be to dispel the myths. An overview of the key issues will be illustrated with real-life experience. Keying vs. OCR vs. software conversion, what can software really accomplish, what can you expect in quality and how do you measure it, what a ballpark quote does and doesn't include, and how to improve the probability of success will be discussed.
This paper identifies key learnings grasped from project management of the SGML Implementation Plan of the Learning Support organization at Bellcore. Key outcomes were: document analysis was critical to the success of the conversion project; the DTD writer's interpretation of the data and its structure required an iterative process with document developers and users; DTDs will change; management's buy-in was needed at all points in the process; and not everyone will be on board the train at the same time.
Too many people tend to say "tag" when they mean "element". While this might seem to be just semantic quibbling, the difference is actually important. The power of SGML-based processing lies precisely in the fact that an element is more than a tag. By examining three systems that exploit the power of SGML to allow sophisticated actions on content, this talk shows that understanding an element as more than just the tags that delimit it is a critical part of exploiting the full power of SGML.
Use of attributes to create complex tabular data is illustrated with a real-world example in which information in a 72 column tabular format spanning two printed pages and using three tabular legends was transformed to SGML tagged data. The SGML application used just seven content tags to support key identification data and over 70 attributes to carry the balance of the data. The attributes reduced the amount of data while strengthening the quality assurance by using fixed attribute values. Use of attributes made creation and maintenance of the data easier. The SGML application supported both paper output in tabular format and CD-ROM display in a more readable format.
Factors which play a significant role in the success of shared data and SGML database publishing will be discussed. Real life experience implementing an SGML database system that shares data has taught the speaker that it is important to pick products that work, maintain database quality, provide good training and support, and to have editorial buy-in.
An introduction to the main components of an SGML document, the SGML declaration, the prolog and the document instance, covering their configuration, interaction, and typical storage concerns. The management of external entities using SGML Open format catalog files leads to a discussion of pitfalls and system configuration issues. The audience should be left with a thorough understanding of the issues to consider when managing SGML data files. This talk is intended for the new DTD writer and others new to managing SGML components, including experienced end users.
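For readers who have not seen an SGML Open format catalog, the sketch below shows the basic idea of mapping a public identifier to a storage location. It is a deliberately simplified assumption-laden example: it handles only `PUBLIC` and `DOCTYPE` entries, expects one entry per line, and ignores comments and the other entry types the catalog format defines. The identifiers, file paths, and function names are hypothetical.

```python
import shlex

# Hypothetical catalog text with invented identifiers and paths.
CATALOG = '''
PUBLIC "-//Example//DTD Memo//EN" "dtds/memo.dtd"
PUBLIC "-//Example//ENTITIES Symbols//EN" "ents/symbols.ent"
DOCTYPE memo "dtds/memo.dtd"
'''


def parse_catalog(text):
    """Build lookup tables from a simplified, line-oriented catalog."""
    table = {"PUBLIC": {}, "DOCTYPE": {}}
    for line in text.splitlines():
        parts = shlex.split(line)  # honours the quoted identifiers
        if len(parts) == 3 and parts[0].upper() in table:
            keyword, key, system_id = parts
            table[keyword.upper()][key] = system_id
    return table


def resolve_public(table, public_id):
    """Return the system identifier for a public identifier, or None."""
    return table["PUBLIC"].get(public_id)
```

A lookup that returns `None` corresponds to one of the common pitfalls: a document that parses on the author's machine but fails elsewhere because the catalog there lacks the entry.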
In this showcase the presenters will: demonstrate successful products that can be used by people with disabilities to access publications, identify major problems in software design that deny access, and point to resources that assist developers in creating accessible products in the future.
Use of multiple DTDs, issues and considerations that should be taken into account when designing DTDs for a given application, and deciding just how many DTDs are required will be discussed. A number of models (e.g. a single DTD for the entire process; one DTD for authoring, another for storage, another for output, etc.) are examined and the pros and cons of each discussed. Considerations include the costs for each model (cost of maintaining multiple DTDs as well as the transform filters placed between them, versus the inefficiency of authoring with a huge DTD), as well as the question of "roll your own" versus use of industry standard DTDs.
During development of a first generation online documentation conversion and delivery system, a majority of the obvious problems and requirements were addressed and solved. Other needs are not so obvious and were discovered only after the first generation system had been in place and in use for a certain period of time. These less-than-obvious needs, which were discovered and subsequently implemented in a second generation system, are the topic of this presentation.
The Astrophysical Journal, published by the University of Chicago Press for the American Astronomical Society, is a large and complex scientific journal of over 25,000 pages per year. It has extensive mathematics, enormous tables, and tens of thousands of illustrations. We have successfully re-engineered the production system to be SGML based, including on-screen SGML copy editing, exporting SGML for conventional typesetting, and producing an online HTML edition from our SGML archive. One of the most difficult parts of the implementation was the use of SGML math (the AAP math DTD). I will describe in detail the problems encountered in mapping SGML math to and from LaTeX, TeX, and two different commercial typesetting systems.
See: University of Chicago Press, Journals Division.
Technical issues and production constraints hampered the full implementation of pure SGML production systems to serve the needs of the European Union for a decade. Pure SGML multiple-media production systems will be in place before the end of '96. Using these tools, SZ's (Saarbrücker Zeitung) services include: traditional text capture (11 languages); text capture for automated translation (11 languages), production of specific vocabularies for automated translation; media independent data management and storage; synoptic pagination; production of the FORMEX V3 for EU on-line data bases and multiple media; and production of CD-ROMs.
Book publishing is a conservative industry that relies on a tried-and-true process, characterized by a strong division between editorial functions (obtaining and preparing manuscript) and production functions (turning manuscript into printed books), a division commonly known as "the wall". SGML has been relegated to the production side in most implementations. While there is much to be gained here, this limited approach also involves a considerable sacrifice of potential benefit. This paper presents a blue-print for maximizing the benefits of SGML in a commercial book-publishing setting by showing how SGML can be leveraged on both sides of the wall, with consideration of practical implications for both process modification and the implementation of technology.
A new industry initiative for SGML in healthcare has been formed to create, maintain, and promote standards for clinical, administrative, and financial applications working in conjunction with the leading standard for medical information called HL7. This session will introduce the initiative, the coalition supporting the initiative, and its relationship to HL7; provide background on the special requirements of medical information systems and the need for structured information for clinic data; and describe the data modeling done by the HL7 group and how SGML fits into the HL7 picture.
See: SGML Initiative in Health Care (HL7 Health Level-7 and SGML), in the SGML Web Page database
Studies have shown that physicians spend upwards of 35% of their time on documentation, and the documents produced yearly number in the billions. Very little of this medical data is recorded in a format that is computer-readable. The result is a combination of high administrative costs and the inability of clinical decision-makers to use most of the data generated during the patient care process. Kurzweil Applied Intelligence has received a research grant from the National Institute of Standards and Technology (NIST) to build a prototype system which will use large-vocabulary voice-recognition technology to produce SGML-structured medical reports. The project raises some new and interesting problems, such as the dynamic creation of SGML documents from a voice-controlled application.
Most of the tools proposed to do SGML Transformation use the event-driven approach where each SGML event is associated with some action written in the application language. The tree manipulation approach consists in providing the programmer with a tree abstraction rather than an event abstraction for SGML instance manipulation. At first glance, it seems programmers must choose one or the other paradigm. These two approaches can and should be merged within a dual programming paradigm that benefits from both. Through concrete examples, we show that complex SGML transformations can be decomposed into pieces that can be handled through one or the other paradigm.
To construct an SGML-based document processing system, document structure and/or data representation must be transformed from a source document written in SGML to the application's format. In real-world problems the required translation often becomes very complex. To deal with this complexity, we designed a special purpose programming language (currently called AEsop) and implemented its processor. In this paper, we explain the design policy of AEsop, its major features, and provide a case study of its application.
SGML has been an ISO standard for ten years now. It was being adopted and implemented even before the final standard was published, and its user community is now very large, with thousands of applications. But is SGML a standard for all times? As users and proponents of SGML, we need to take a hard look at our requirements and define what we need from the standard and its implementers. More significantly, we need to understand what information is and what we expect it to do for us. Only with that understanding can we devise good SGML applications, make the right requests from vendors, make the right links between SGML data and other kinds of information, or design a good replacement for SGML.
SGML is the most fully developed specification of the use of descriptive markup languages for electronic documents. Descriptive markup is a simple and powerful idea, and has proved to be a requirement for many information processing applications. The adoption of SGML has proved surprisingly difficult, expensive and slow, given that the underlying ideas are simple and self-evidently good. The reasons most often given are the complexity of the standard and of the family of languages which it specifies. To increase the acceptance of SGML, and widen the application domain in which it is cost-effective, Minimal Generalized Markup Language (MGML) is proposed. MGML, and its formal specification, will be presented in detail along with working software and MGML versions of some popular industry DTDs.
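One way to picture a mix of the two paradigms is the Python sketch below. It is not the authors' system: it uses `iterparse` from Python's standard library, with XML standing in for SGML, and the sample document and the function name `summarize` are assumptions. The parser is consumed as a stream of events, yet each completed `entry` is handled as a small tree.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = (
    "<report>"
    "<entry><name>alpha</name><value>3</value></entry>"
    "<entry><name>beta</name><value>4</value></entry>"
    "</report>"
)


def summarize(text):
    """Stream through the document, treating each <entry> as a subtree."""
    results = []
    source = io.BytesIO(text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            # Tree-style access inside the completed subtree.
            name = elem.findtext("name")
            value = int(elem.findtext("value"))
            results.append((name, value))
            # Drop the subtree's children so memory use stays bounded.
            elem.clear()
    return results
```

The event loop decides when a piece of the document is complete; the tree operations then handle that piece without the bookkeeping a pure event handler would need.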
Accessibility and reusability of information are important, and so are the relevance and applicability of information in a particular problem-solving context. Redundant knowledge creation and information rework waste organizational effort and dollars and have a profoundly negative effect on programs, processes, and systems. To combat redundancy and rework, organizations are seeking solutions in standard tools and standard data representations. This presentation will focus on the merits and problems with each approach and on the realities of corporate information processing, and how the argument about standard tools versus data can be solved to the best advantage of both groups.
Exploit your investment in SGML by using it as the source for your intranet delivery needs. There's a big payoff in combining HTML, SGML, document component management and internet technologies to achieve a diversity of document products, increase quality of customer service, and ensure accuracy and timeliness. It is possible to automatically assemble pieces of information which exactly match a customer's need, and deliver the most up-to-date information in the form and format requested.
Introductions to SGML must be tailored to the times and the audience. Successful approaches include: demonstrating real end-to-end applications with print and electronic style sheet creation; demonstrating the advantage of electronic books based on SGML over electronic books based on unstructured markup; reassuring writers that they will still write and designers that they will still design even when working with SGML-coded information; and telling the audience where they will save money (and where they won't).
When making a business case for SGML, one of the key arguments is justifying the cost for the transition to SGML. This presentation is designed to help you justify the cost of implementing SGML, whether your objective is to support multiple outputs or to re-engineer your information production processes. During this presentation we cover the measurable benefits in detail, discuss the unmeasurable benefits of SGML, and provide suggestions for preparing your argument.
Organizational decision-making patterns determine SGML investment strategies and potential benefits. A framework for identifying the primary policy objectives that can influence the selection of SGML (inherent policy effects) and application design (user-defined policy goals) will be presented. Competing and often contradictory goals and perceptions of value often make the development of a business case for SGML very difficult. Methods for integrating stakeholder values, interests, and expectations in the early stages of application conceptualization and design will increase real and perceived benefits and defuse potential political problems before they develop.
The existing standard stylesheet languages that could serve as the single language for Web SGML, Cascading Style Sheets (CSS) and Document Style Semantics and Specification Language (DSSSL), are described. CSS is a nice fit for HTML but is inadequate to the needs of commercial content providers using generic SGML. The languages of DSSSL (the transformation language and the style language) support everything that CSS supports, and much more. DSSSL provides far more capabilities than are necessary to support initial Web SGML implementations. Something simpler is needed. DSSSL-O (formerly DSSSL-Lite) is a subset of DSSSL for SGML editors and World Wide Web browsers.
Further information on DSSSL Online may be found: (1) in the DSSSL entry of the SGML Web Page, or (2) on the SGML Open Web site ("The Case for DSSSL Online," by Jon Bosak).
HelpWise, an SGML-based Help authoring system developed and deployed within Novell, uses a combination of commercial and custom software tools to provide users with a structured, focused editing and information management environment for creation of Help screens within Novell's GroupWare division. The functionality of the system from a user perspective will be described as well as relevant technical details. A demonstration of the system will cover project and topic management, features of the tool, and automatic output generation. Finally, lessons learned and reactions to the tool will be covered from both user and technical perspectives.
The Electronic Publishing Department at Northern Telecom, Inc. (Nortel) has transformed product and price publications from paper to electronic media, radically improving Nortel's ability to control document quality and reduce information time-to-market. Significant production changes include: use of SGML, sourcing information directly from legacy product and price databases, serving multiple systems and platforms over an Internet, and CD-ROM document distribution. The Electronic Publishing Department has also developed a workflow application that automates on-line product and price submittals, review, and approval.
In addition to polling conference attendees over the last year, GCA has done a mail survey of our extensive database of people interested in SGML. Results will be discussed by conference in order to give regional perspective as well as for the information collection as a whole. Topics included: document formats, user skill levels, current uses of SGML, and investment in SGML technologies. The markets SGML has reached and what attracted these users will be discussed.
The US National Security Agency, the Central Intelligence Agency, the Defense Intelligence Agency, the National Reconnaissance Office and other agencies of the United States Intelligence Community are improving intelligence gathering and reporting through development and implementation of technology including SGML. INTELINK, the classified world-wide Intranet, addresses one of the world's largest data management problems. Issues addressed will include networking; architecture and standards; analyst collaboration issues; and finally encryption and other security considerations. Examples of INTELINK SGML applications provide insights into issues, problems, and solutions for organizations using emerging technology to realize cost savings and improve capabilities.
Developing an SGML application involves making choices driven by end user requirements and by the availability and functionality of third party SGML parsers, authoring tools, search engines, browsers, and data converters as well as HTML and the World Wide Web if broad access is desired. SGML application developers typically build some or all of the following components: A DTD; legacy data conversion tools; a DTD-tailored authoring environment; a document repository; browsing and searching interfaces, and tools for producing formatted output. For each component we discuss design and implementation alternatives, the approach we decided to use, and our rationale for choosing that approach.
Thompson Legal Publishing has re-engineered aging SGML-based systems to meet current needs. Tools were chosen from solid companies that did not expose the SGML to users, did not restrict the use of SGML in any way, that have the capacity to emulate structure and that have APIs. Users now work in an environment that does not force them to place thirty elements/attributes in the data to enter one judicial case citation. Instead, a couple of clicks of the mouse, and in goes the case cite. Our savings in output processing have been enormous; a process that used to cost $18.00 per page now costs $0.95 per page. The system's simplicity from the user's point of view will be demonstrated, and the complexity of the data created and the resulting flexible output will be shown.
SGML-based tools available today were used to produce a complete Supplemental New Drug Submission for the Health Protection Branch, Health Canada. The submission was SGML browser-based, running on a Windows 3.1 PC. It allowed the reviewer to navigate and comment electronically on all the textual documentation, clinical data, and Case Record Form images required for the submission, and compiled all comments and relevant information collected during the review process for use in the reviewer's report. Summary tables were linked to the underlying clinical data from the browser so that tables could be verified, the underlying database queries modified and analyses redone as the document was reviewed.
Technical and Management Services Corporation (TAMSCO), along with the Warner Robins/LU Chief Equipment Specialist, is currently reengineering the process for creating, distributing, accessing, and reusing technical information. Currently, all of the technical procedures for three types of C-130s are located in one technical flight manual. The Chief Equipment Specialist believes this manual would be more beneficial to the flight crew if the data was in three separate manuals according to type of aircraft. This data will be loaded and authored, using SGML tools, directly into a database. From the database, technical procedures will be used to reconstruct the three separate manuals. Large portions of the information are used identically by all three manuals; however, each manual will have unique procedures.
The University of Oslo has used an SGML system to produce eight student handbooks. The 80 authors included university advisors and secretaries; some had SGML experience, some did not, and some had no knowledge of computers. SGML was used as an infrastructure, applied to a wide range of documents with a given set of DTDs, and used for printing, searching, WWW publishing, and print on demand. This presentation will be a summary of the project report, including traps and successes, the users' view of the project, and demands of the project team on management.
In a general DTD, element types that focus on common structural features of documents, but not on the semantic features that differ from one document class to another, are defined. If a DTD designer chooses to describe not only structural but also semantic features of documents, a different DTD is needed for each document class. To eliminate the problems in those two approaches we developed a system which, from one base general DTD, derives a different DTD for each document class. The base general DTD defines the structural roles of element types, and the system accepts any general DTD such as ISO's or OSF's. Derived DTDs reflect the semantic roles of element types as well, thereby enabling an SGML editor to show its users what element types to use and what contents to put in those element types. Our system also translates SGML instances using derived DTDs to instances using a base general DTD.
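The last step, translating an instance that uses a derived DTD back to the base general DTD, can be pictured as a renaming pass. The Python sketch below is only a guess at the general shape, not the authors' system: the mapping table, the element names, and the function name `to_base` are hypothetical, and XML stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from derived (semantic) element types to the
# base (structural) element types they were derived from.
DERIVED_TO_BASE = {"recipe": "section", "ingredient": "item", "step": "para"}


def to_base(text, mapping=DERIVED_TO_BASE):
    """Rename derived element types to their base structural types."""
    root = ET.fromstring(text)
    for el in root.iter():
        # Element types with no entry in the mapping are left as they are.
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")
```

Because content and nesting are untouched, the translated instance keeps the structure the base DTD expects while dropping the class-specific names.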
Maintaining large amounts of SGML data in separate files on a file system has always been a difficult proposition. Trying to coordinate a distributed workgroup environment is even more difficult. Simple mechanisms such as ID and IDREF can become a nightmare on even small projects. A database environment offers many exciting possibilities for features such as version control, sharing, validation, and distribution. The challenge is to develop a system that is capable of accepting any SGML document and flexible enough to support many different SGML database applications.
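The ID and IDREF problem is easy to reproduce once a document is split across files. The Python sketch below reports references that point at no known identifier. It is an illustration only: the attribute names `id` and `ref`, the file names, and the function name `dangling_refs` are assumptions (a real system would take the ID and IDREF attributes from the DTD), and XML stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical workgroup files, keyed by file name.
FILES = {
    "ch1.xml": '<chapter id="ch1"><xref ref="fig2"/></chapter>',
    "ch2.xml": '<chapter id="ch2"><figure id="fig2"/><xref ref="fig9"/></chapter>',
}


def dangling_refs(files):
    """Return (file name, reference) pairs whose target ID is not defined."""
    ids, refs = set(), []
    for name, text in files.items():
        for el in ET.fromstring(text).iter():
            if "id" in el.attrib:
                ids.add(el.attrib["id"])
            if "ref" in el.attrib:
                refs.append((name, el.attrib["ref"]))
    # Check only after every file is read, so forward references resolve.
    return [(name, ref) for name, ref in refs if ref not in ids]
```

With files on a file system this check has to be rerun over everything whenever anyone edits; a database can maintain the same index incrementally, which is part of the appeal described above.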
The joint Air Transport Assn/Aerospace Industries Assn (ATA/AIA) Graphics Working Group has developed a specification for Intelligent Graphics (IGEXCHANGE) to support the interchange of graphical application structures containing non-graphical information. Development of industry requirements for intelligent graphics, Amendment 2 to the Computer Graphics Metafile (CGM) Standard developed to support application structuring of graphics, and the ATA industry profile of that standard will be described as will use of SGML syntax to describe attributes associated with application structures.
Standard Page Description Language (SPDL) was published in 1995 as a language to describe the final form of a document. The document processing model of ISO/IEC JTC1/SC18 WG8 described SPDL as the final stage of three steps: creation/edit (SGML), format (DSSSL), and presentation (SPDL). SPDL's architecture was influenced by Xerox Interpress and Adobe PostScript. Unlike PostScript, SPDL has a document structure using elements such as Picture and Pageset. SGML is used for clear text encoding of the document structure, ASN.1 for binary encoding. Except for the positions of comments in the document structure, SPDL clear text encoding and binary encoding can map to each other easily.
A composition system accepts documents coded according to multiple authoring DTDs (of many versions) and provides a maintainable method for updating the system to keep pace with DTD changes. The key is that lower level elements, such as paragraphs and phrases, are identical across DTDs while Division (section) level elements differ. The solution automatically creates a document-type-specific transformation program and creates a generic SGML file from one of multiple authoring document type SGML instances. The generic SGML file can then be input to a more structure-based composition converter to create the final composed (targeted) output.
Authors of DTDs can learn from some features of the HTML DTD. Authors should consider the following questions: Do people really care about structure or do they tend to equate structure with formatting? Will they make the most of a complicated DTD, or only use some elements? How important is the authoring tool and the rest of the application to encouraging authors to make the most of a DTD? Do the answers to these questions change as users gain more experience?
The SASOUT table model was developed to support the tabular documentation needs of the Publications Division of SAS Institute. SASOUT instances contain enough meta information to allow them to be presented in both hard and soft copy. The meta data also permits non-traditional and interactive online presentations of the tabular data. The markup supports row headers and cell relationships in addition to all of the CALS features such as column heads, spanning rows and columns, and alignment of data. The model also supports behavior characteristics that allow the specification of online presentation methods. The SASOUT DTD is freely available.
After deciding to implement an SGML solution using Adept, the Institute decided to apply good software engineering and programming principles to the effort and develop a modular, maintainable store of declarative SGML structures and custom executables. The development implementation, philosophy and design behind the content-based tagging system, mechanics of the modularity at the component level, process used for DTD maintenance, and testing and development tracks will be described.
This presentation gives an overview of the information and steps necessary to convert and load DTDs that have been manually authored into a Near&Far Library. Because NFL is designed to protect the integrity of the DTD database, the definitions and descriptions of the elements, attributes and entities had to be consistent. Because RIAG had 43 DTDs we automated this conversion. Problems found during this process and RIAG's solutions will be presented, along with a summary of the costs and benefits and the organizational impact of having a central repository for DTDs.
The SGML literature divides DTDs into two types: those that describe existing information structures and those that prescribe a fixed set of structures. A purely prescriptive approach has been in vogue for several years; however, the descriptive approach has much to offer. It is suggested that many DTDs should in fact fall somewhere between the two extremes, and could be termed suggestive. In a suggestive DTD, certain structures are fixed, others are flexible, and still others are configured through the simple use of attributes to permit previously unexpected values. Relationships are explicitly marked where they cannot be derived.
SGML is the logical choice for encoding electronic documents, and Virginia Tech encourages (and will later require) students to submit Electronic Theses and Dissertations (ETDs) in SGML. Our DTD must work with translators and be usable for students preparing SGML directly. A usability test for tagging ETDs according to our DTD involves teaching SGML-novice college students to code using our DTD, observing them tagging their own documents, and interviewing them during the process. Preliminary results show that authors prefer a well-documented and indexed DTD user manual (replete with examples of correct usage), choosing a start tag instead of an attribute value, and few or no uses of parameter entities.
Information on the Electronic Theses and Dissertations project may be found at: http://etd.vt.edu/etd/.
Issues of SGML re-use can only be solved generally through the use of subdocuments. The SGML standard only defines two object types that can have independent existence: documents and subdocuments. Thus it is clear that only documents and subdocuments can be reliably re-used. In particular, external general text entities are not useful candidates for general re-use. My plea is for tools to add the functions necessary to support the use of subdocuments for the re-use of semantic fragments. The necessary functionality is described for browsers, parsers, and editors. The full promise of SGML cannot be realized until the problem of fragment re-use is solved and subdocuments are the key to that solution.
What do you do when you use a character set other than SGML's default? How do you make your documents parse? How do you use your character set inside tags? When you feed it into the parser, what comes out? Using a simple example of an exotic character set with only six characters, this presentation shows how to use BASESET and CHARSET, and what that means to the parser and to the SGML processing system. The presentation concludes with examples of how the techniques used in the initial example are used in real life.
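As a rough illustration of what a character set description tells the parser, the sketch below models the described-set portion of a CHARSET declaration as a list of ranges and maps document character numbers onto base-set numbers. The six-character toy set and all of its numbers are invented for this example and are not taken from the presentation.

```python
# Hypothetical character description in the style of an SGML declaration:
# (first described number, count, first base-set number or None for UNUSED).
DESCSET = [
    (0, 65, None),  # numbers 0-64 are unused in this toy set
    (65, 6, 97),    # six characters, 65-70, described by base numbers 97-102
]

def base_number(code):
    """Map a document character number to its base-set number,
    or None when the number is unused or not described at all."""
    for start, count, base in DESCSET:
        if start <= code < start + count:
            return None if base is None else base + (code - start)
    return None

mapped = [base_number(c) for c in (64, 65, 70, 71)]
```

A real declaration must also describe the characters needed for markup, which this toy set leaves out.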
SGML is generally described with a number of character-set assumptions that are now considered rather European-Language-Centric. There are differences of opinion as to how the current SGML standard (ISO 8879 as amended in 1988) should be interpreted with respect to handling of the characters that make up the SGML documents it describes. It is not clear that these differences of opinion will be reconciled with respect to the current standard, but a consensus has been achieved as to how the revision now being worked on will treat characters and character strings, and how the character sets described in an SGML declaration will be interpreted and used. This talk will present the character model adopted by the group working on the revision of ISO 8879 (the Rapporteur Group of ISO/IEC JTC1/SC18 WG8).
ETMs vary from simple raster page turners to complete IETMs. For each type, an overview of major aspects will be presented. SGML-based ETMs and SGML-based IETMs will be compared and contrasted, highlighting fundamental differences in function, architecture, and applicability. An ETM display engine and sample ETM document will be used with an IETM display engine for the demonstration. As part of the presentation the information structuring capabilities of the MIL-PRF-87269 IETM DTD will be covered.
Boeing's First Generation Publishing Systems are paper-oriented and mainframe-based. Boeing engineers have authored manuals for many years on systems that do no structure or content enforcement. Boeing's Second Generation Publishing Systems are digital data-oriented and Unix-based. This system, now in its ninth year of development, is extremely successful for producing both paper and digital 777 and 737-X manuals, but performance and storage restrictions keep it from dominating our Publishing Environments. Recently, Boeing has focused on centralizing applications and returning to the single source of data concept, thus adding to the need for a Third Generation Publishing System. This system, Gemstone, is now in design.
Ericsson Radio Systems AB is delivering a Mobile Telephone System, the CMS 30 System, to Japan. This presentation describes how the 40 volumes of Operation and Maintenance documentation were created using SGML. The Operation and Maintenance Manual is produced in English and then translated to Japanese. SGML is used today for the English Manual and Ericsson is developing the capability of creating the Japanese Manual in SGML.
Wartsila Diesel Power Plant provides an editorial system for their subcontractors, so that they can easily produce content-oriented information modules, based on the physical equipment breakdown structure (EBS) according to the WD Base-DTD. The production system used in Wartsila maintains and produces presentation-oriented technical manuals from the content-oriented information modules delivered by the subcontractors. Background and problems of handling information coming from several sources in different formats, why WD decided to implement a CALS/SGML information environment, and what they have achieved so far will be described.
SGML is used for describing highly structured engineering data based on a method, called macro modeling, which has been developed for modeling complex systems in technical domains. The structuring principles given by the method ensure that these models are without redundancy, highly modular and can be refined and assembled without overlap. Semantic, content-oriented DTDs are used to map models to document structure. DTDs for the most important domain models have been implemented, as has a prototype tool for combining different model-based DTDs into a domain- and application-specific content-oriented DTD.
RAFHS produce the Aircrew Manuals and Flight Reference Cards used by all three United Kingdom services: Army, Navy and Air Force. RAFHS team members are aircraft specialists, not computer professionals; therefore the system acquired had to be intuitive, modern, and have an excellent user interface. The RAFHS system is an integrated solution providing SGML author/editing, document management, revision tracking to provide future-proofed data, an airworthiness audit trail, and output formatting and pagination.
Use of SGML Marked Sections for effectivity markings provides a way to show all effectivities to a user simultaneously; permits all effectivities to be in the same information fragment; and leaves the task of resolving which effectivities to use to the SGML parser. However, Marked Sections can be confusing and do not permit nesting of effectivities or selection of data based upon multiple criteria. The element/attribute approach has proven to be more robust because it supports more permutations of possible differences. The down sides are that the representations force more lenient content models and that data transformation utilities are needed to resolve the effectivity differences for publishing.
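The element/attribute approach can be sketched as follows, showing why it supports both nesting and selection on multiple criteria. The `effect` wrapper element and the `model` and `operator` attributes are hypothetical names chosen for this example, and well-formed XML stands in for SGML; this is not the presenters' DTD or tooling.

```python
import copy
import xml.etree.ElementTree as ET

def _matches(node, criteria):
    # An attribute lists its allowed values separated by spaces;
    # an attribute that is absent places no restriction.
    return all(
        key not in node.attrib or value in node.attrib[key].split()
        for key, value in criteria.items()
    )

def _prune(parent, criteria):
    for child in list(parent):
        if child.tag == "effect" and not _matches(child, criteria):
            parent.remove(child)  # note: any tail text of the child is dropped too
        else:
            _prune(child, criteria)

def resolve(elem, **criteria):
    """Return a copy of elem without the <effect> wrappers that fail any criterion."""
    out = copy.deepcopy(elem)
    _prune(out, criteria)
    return out

doc = ET.fromstring(
    '<task>'
    '<effect model="777"><step>Open panel A</step>'
    '<effect operator="XYZ"><step>Log entry</step></effect></effect>'
    '<effect model="737"><step>Open panel B</step></effect>'
    '</task>'
)
steps_abc = [s.text for s in resolve(doc, model="777", operator="ABC").iter("step")]
steps_xyz = [s.text for s in resolve(doc, model="777", operator="XYZ").iter("step")]
```

This filtering pass is the kind of data transformation utility the abstract says is needed before publishing.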
An act describes regulations, decisions and directives about human behavior in terms of articles. The European Community has produced acts for over 40 years, many of which are modified multiple times in other acts. Consolidation of the law is the process of obtaining the correct state of an act at a given time. Consleg (an abbreviation for "Consolidation of Legislation") is an SGML application that helps operators consolidate acts. A revision mechanism based on attributes is used to embed the modifications formulated in the modifying acts into the act to be consolidated. An SGML processing engine generates consolidated acts.
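A revision mechanism based on attributes might look roughly like the sketch below. The `from` and `until` attributes, the article numbering, and the sample text are assumptions made for illustration only and do not describe the actual Consleg DTD; well-formed XML stands in for SGML.

```python
import xml.etree.ElementTree as ET

def consolidate(act, date):
    """Return (number, text) for each article version in force on `date`.
    Dates are ISO 'YYYY-MM-DD' strings, which compare correctly as text."""
    in_force = []
    for art in act.iter("article"):
        start = art.get("from", "0000-00-00")
        end = art.get("until", "9999-99-99")
        if start <= date < end:
            in_force.append((art.get("n"), art.text))
    return in_force

act = ET.fromstring(
    '<act>'
    '<article n="1" from="1970-01-01" until="1985-06-01">Quota is 10 tonnes.</article>'
    '<article n="1" from="1985-06-01">Quota is 12 tonnes.</article>'
    '<article n="2" from="1970-01-01">Reports are annual.</article>'
    '</act>'
)
state_1980 = consolidate(act, "1980-01-01")
state_1990 = consolidate(act, "1990-01-01")
```

Because every version stays embedded in the one instance, the state of the act at any date can be regenerated without editing the source.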
SGML alone has not scaled perfectly in large-scale environments. This is due to typesetting legacies, and the absence of MIS methodologies in SGML application design. Microdocument architectures extend relational database methodologies into document systems without subverting SGML's strengths. Instead of unwieldy "bookish" DTDs, this talk introduces the concept of "microdocuments". Small, independent DTDs are used to describe narrative-based information units; these microdocuments are held together in a relational or object-relational framework.
The steps involved in developing a microdocument architecture for a complex data set, and the design of output products will be described. Issues covered include: changes to traditional SGML analysis approaches; identifying underlying information units, avoiding the "book" structure; defining the boundary between meta-data (to be captured in relational fields) and narrative (to be described with micro-documents); removing redundancies using database methodologies; establishing connections between micro-documents and relational information; inferring navigation pathways, links, and delivery packages for output products; and indexing considerations for rapid retrieval.
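The split between relational meta-data and narrative microdocuments can be pictured with a small sketch. The table layout, the part numbers, and the element names are hypothetical; SQLite and well-formed XML are stand-ins chosen for this example, not the architecture presented in the talk.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Meta-data lives in relational fields; the narrative is a small marked-up unit.
conn.execute(
    "CREATE TABLE unit (id INTEGER PRIMARY KEY, part_no TEXT, kind TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO unit (part_no, kind, body) VALUES (?, ?, ?)",
    [
        ("P-100", "description", "<description><para>Fuel pump.</para></description>"),
        ("P-100", "procedure",
         "<procedure><step>Drain tank.</step><step>Remove pump.</step></procedure>"),
        ("P-200", "description", "<description><para>Filter.</para></description>"),
    ],
)

def steps_for(part_no):
    """Select microdocuments relationally, then read their internal structure."""
    rows = conn.execute(
        "SELECT body FROM unit WHERE part_no = ? AND kind = 'procedure' ORDER BY id",
        (part_no,),
    )
    return [s.text for (body,) in rows for s in ET.fromstring(body).iter("step")]

pump_steps = steps_for("P-100")
```

The query replaces navigation through a single book-shaped document: the relational fields find the unit, and the markup describes only what is inside it.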
Many SGML users have documents which need to be shared at different levels of granularity, created or distributed on the fly from smaller pieces, versioned, found via queries, or managed in parts. This suggests that documents be composed of smaller units, components, which can be exchanged among users, combined to form documents, versioned, returned as the result of queries, and validated. Basic properties and uses of components are defined; design choices for implementing components in an SGML repository are discussed; and key design choices made in an SGML repository system jointly designed by F.A. Davis, a medical textbook and multi-media publisher, and POET Software, an object database company, are described.
The position that the current DSSSL syntax will inevitably lead to an overly complex and difficult programming style is presented and an alternative syntax as a tree transformation language is proposed. The STTP component of DSSSL is proposed as a tree transformation language, but DSSSL's syntax is flat, rather than tree-shaped. An alternative syntax would make it possible to nest transformation expressions in accordance with the content models in the DTD. A number of tree transformers, mostly for SGML, have been built using this paradigm. Experience shows that most query and priority expressions are unnecessary with this approach, and transformation expressions are simplified because of the possibility of sharing private information among expressions transforming a shared subtree.
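The idea of nesting transformation expressions to follow the content model can be sketched as below. This is not DSSSL syntax and not the presenter's proposal; the `RULES` structure, the element names, and the output format are invented for illustration, with well-formed XML standing in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested rule tree: each rule is (output template, rules for children).
# A title inside a chapter gets its own rule simply by where it is nested,
# so no query expression is needed to tell the two kinds of title apart.
RULES = {
    "book": ("{}", {
        "title": ("# {}\n", {}),
        "chapter": ("{}", {
            "title": ("## {}\n", {}),
            "para": ("{}\n", {}),
        }),
    }),
}

def transform(elem, rules):
    """Apply the rule for elem, then recurse using only that rule's child rules.
    Text following a child element (tail text) is ignored in this sketch."""
    template, child_rules = rules[elem.tag]
    inner = (elem.text or "") + "".join(transform(c, child_rules) for c in elem)
    return template.format(inner)

doc = ET.fromstring(
    "<book><title>SGML</title>"
    "<chapter><title>Intro</title><para>Hello.</para></chapter></book>"
)
output = transform(doc, RULES)
```

Because each rule only ever sees the rules nested beneath it, an element that the structure does not allow at that point raises a `KeyError` rather than being silently matched by a flat, global rule.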
An online version of the presentation is available: "Why Isn't DSSSL a Tree?", in SGML format; [mirror copy]
Transformations allow developers and users to think of documents as active parts of a system. Documentation systems and other document-related systems can be re-oriented to use transformations as the means by which documents are processed or produced. Using the new DSSSL standard, systems that read both standard documents and standard transformations can be created. Simple tasks like editing can be re-oriented as a transformation process. Thus, transformation takes "center stage" as the "conductor" of the processes necessary to produce your documents. The infrastructure necessary to produce such systems uses SGML, HyTime, and DSSSL.
JSP (Jackson Structured Programming) is a software development methodology from the early Eighties. This talk explores the relationship between JSP and SGML and considers whether SGML's superiority to JSP as a data modeling language might make SGML useful as a general purpose Software Development tool. It also examines how some of the ideas of JSP can be usefully applied to more traditional SGML processing applications. Comparing SGML with JSP, a philosophically similar, general purpose software engineering methodology, may give us some clues as to where SGML is headed. It may also point to the sort of SGML CASE tools we are likely to see in the future.
SGML uses DTDs to formally describe document syntax and structure in a way that is purely declarative and independent of the document's future processing. Sooner or later a document has to be processed, which means we need to associate semantics with the document's structure. In a compiler context, semantics are separated into static and dynamic. In a document processing parallel, one can think of a document's decorated tree (as recognized by an SGML analyzer) as the static semantics and the document's tree transformation as dynamic semantics. Based on this idea, the relationship between SGML, DAST (Decorated Abstract Syntax Tree), and Algebraic Specification tools is related to processing documents in general and to generic document processing tools.
Some application architectures coupled with an SGML parser offer an object mechanism with embedded SGML. The relation between the parsed tokens and the application methods shows that application objects are connected to parsing objects in a simple and efficient paradigm which is fully conformant to the LINK feature of the SGML language. This paradigm is applicable to HyTime applications and the standard applicative features contained in HyTime. Adopting this view of an SGML application suddenly makes all the facilities offered by the LINK features self-evident and useful.