Multimedia/Hypermedia Activity at SGML Europe '95

SGML Europe '95 was held in Gmunden, Austria, from May 16th to May 19th 1995. SGML Europe is an annual conference on the use of ISO's Standard Generalized Markup Language (SGML) within Europe. It is run by the Graphic Communication Association of America as part of a set of related conferences held in Europe, Asia and the US. The conference is also the venue for the AGM of the 500-strong SGML Users' Group and for meetings of the SGML Open vendors consortium.

The multi-threaded conference covered many aspects of SGML through a combination of presentation sessions and workshops. This report summarises the presentations and workshops related to multimedia, hypermedia, databases and open document interchange.

The opening session saw four "keynote" speeches. Yuri Rubinsky of SoftQuad in Canada started with his now-traditional overview of the areas where SGML has had the most impact over the last year. He was able to announce that SoftQuad was releasing a general purpose Internet SGML document browser that day in conjunction with NCSA, the developers of Mosaic. (A copy can be obtained from http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/WebSGML.sgml or http://www.oclc.org:5046/oclc/research/panorama/panorama.html.) This product provides the first publicly available method of browsing any document encoded in SGML that is transmitted as part of a MIME message over the World Wide Web (WWW).

Dr. Charles Goldfarb gave the technical keynote, explaining how far the ten-year review of the standard had progressed. He also explained that it had proved necessary to prepare a technical corrigendum to HyTime (ISO 10744) to allow the forthcoming Document Style Semantics and Specification Language (DSSSL) standard to reference features that are to be shared by the two standards.

Robert Cailliau from the World Wide Web Support group at CERN then reported on activity in the WWW arena over the last year. He was able to announce the formation on May 15th of the Web Society, a user group for WWW users. The first server for this group is currently being set up in Graz, Austria (http://www.websoc.at). Once this is set up, national nodes of the society will be formed with the same address, except that the country code will be that applicable to the user joining the group. The extensive set of conferences being organized by the International WWW Conference Consortium (IW3C2) will include a regional conference to be held in Portugal in July, with the next worldwide meeting being held at MIT in Boston from 14th to 17th December.

Mr. Cailliau reported on progress at the 4th annual WWW conference held in Darmstadt in April. This 5-day meeting was attended by some 300 developers of WWW servers. The main focus was on planned improvements to the HTTP and HTML specifications, and on how the network could be made more secure. Unfortunately it has still not been possible to complete the specification of Version 3 of the HyperText Markup Language (HTML3), as work still needs to be done on the handling of mathematics within WWW documents.

Mr. Cailliau pointed out that the majority of WWW users saw it as a form of entertainment rather than a method of information gathering. He noted that the book metaphor is not the best one for entertainment, especially when the data being transmitted mainly consists of pictures and sound. The WWW can be characterized as a computer-aided reading tool that allows related pieces of information to be accessed in many different ways. Information for use over the Internet needs to be presented in much smaller units than are normally used when interchanging paper documents. The generation of transmittable units needs to be kept as simple as possible so that it can be done by casual users.

The final keynote speaker was Koen Mulder from Gouda Quint in The Netherlands, who was one of the keynote speakers when SGML Europe was last held in Gmunden in 1989. Mr. Mulder compared his expectations of SGML then with the reality of SGML today.

After the opening plenary the conference split into the first of its parallel presentation sessions. One of these sessions covered current applications of HyTime. This session consisted of four presentations:

Eliot Kimber of Passage Systems Inc explained how HyTime links could be processed using existing SGML tools.
Albert Bruffaerts of Sema Group (Belgium) explained how HyTime had been used as the encoding mechanism for the Multimedia Information Presentation System (MIPS) developed as part of ESPRIT project 6542.
Hasse Haitto of Synex Information AB, Sweden, explained how HyTime was used within the new SoftQuad Panorama WWW SGML-document browser developed by Synex.
Catherine Hamon of High Text, France, explained why HyTime is useful for the management, interchange or publication of data stored in object-oriented knowledge bases and databases.

Eliot Kimber started by explaining the advantages that can be gained by storing information about the addressing and linking of data separately from the referenced text. He then showed how HyTime's mechanism for describing the relationships between pieces of data could be adopted today without having to use specialist software. He pointed to the Panorama browser as an example of a tool that could generate and utilize such links.
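The idea of keeping link information apart from the referenced text can be illustrated with a minimal Python sketch. It is not HyTime syntax and not the method presented at the conference: the element IDs, the link roles and the helper names (`Link`, `links_from`, `resolve`) are all hypothetical, chosen only to show that the text carries no link markup while a separate link set describes the relationships.

```python
from dataclasses import dataclass

# Hypothetical document content, keyed by element ID.
# The text itself contains no link markup at all.
DOCUMENT = {
    "intro": "SGML separates structure from presentation.",
    "hytime": "HyTime adds addressing and linking to SGML.",
    "dsssl": "DSSSL describes formatting and transformation.",
}


@dataclass(frozen=True)
class Link:
    """One relationship between two anchors, stored outside the text."""
    role: str    # illustrative label for the kind of relationship
    source: str  # ID of the element the link starts from
    target: str  # ID of the element the link points to


# The link set lives apart from DOCUMENT, so it can be edited or
# replaced without touching the referenced text.
LINKS = [
    Link("see-also", "intro", "hytime"),
    Link("see-also", "hytime", "dsssl"),
    Link("background", "dsssl", "intro"),
]


def links_from(element_id, links=LINKS):
    """Return every link whose source anchor is the given element."""
    return [link for link in links if link.source == element_id]


def resolve(link, document=DOCUMENT):
    """Look up the text a link points to (KeyError if it is missing)."""
    return document[link.target]


if __name__ == "__main__":
    for link in links_from("hytime"):
        print(link.role, "->", resolve(link))
```

Because the links are ordinary data, a different link set could be applied to the same text for a different audience, which is one of the advantages of separation described above.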

Albert Bruffaerts explained that there were two main aspects that are needed for the integration of multimedia and still images with text and database information: navigation structure and presentation structure. HyTime's hyperlinking facilities allow information about the relationships between information nodes to be interchanged between systems. HyTime also allows information about the presentation process (window models and scripts) to be described using SGML. The MIPS project shows how HyTime information sets can be generated dynamically in response to user input. User input is combined with information on the current processing state and information built into the program's knowledge base to produce a template that describes the relevant presentation. MIPS was characterised as an automatic report generator for multimedia databases.

Hasse Haitto pointed out that, while the concepts used to describe hyperlinks have been known for some time, HyTime is the first attempt to provide a standardized way of describing them. The concept of SGML architectural forms introduced in the HyTime standard allows processes (methods) to be assigned to information nodes (e.g. SGML elements and entities). HyTime's concept of location ladders allows complex addressing techniques to be described as an ordered set of queries and links. Mr. Haitto also pointed out that pointers based on character position need to be dynamically maintained during editing, but are fixed during data delivery/browsing.
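The notion of an address built from an ordered set of steps can be sketched in Python as follows. This is only a loose illustration of the ladder idea, not an implementation of HyTime location addressing: the `Node` class, the three step builders (`by_id`, `child_at`, `char_range`) and the `climb` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A very small stand-in for an SGML element."""
    name: str
    id: str = ""
    text: str = ""
    children: list = field(default_factory=list)


def by_id(wanted):
    """Step that locates elements anywhere in the tree by their ID."""
    def step(root, current):
        found, stack = [], [root]
        while stack:
            node = stack.pop()
            if node.id == wanted:
                found.append(node)
            stack.extend(reversed(node.children))
        return found
    return step


def child_at(index):
    """Step that narrows each current node to its child at a position."""
    def step(root, current):
        return [n.children[index] for n in current if index < len(n.children)]
    return step


def char_range(start, end):
    """Step that narrows each current node to a span of its characters."""
    def step(root, current):
        return [n.text[start:end] for n in current]
    return step


def climb(root, rungs):
    """Apply each rung in order; every rung refines the previous result."""
    current = [root]
    for rung in rungs:
        current = rung(root, current)
    return current


ROOT = Node("doc", children=[
    Node("chapter", id="c1", children=[
        Node("para", text="First paragraph."),
        Node("para", text="HyTime location ladders chain addresses."),
    ]),
])

if __name__ == "__main__":
    print(climb(ROOT, [by_id("c1"), child_at(1), char_range(0, 6)]))
```

The final rung addresses data by character position, so it would need to be recomputed whenever the paragraph text is edited, which is the maintenance issue noted above.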

Catherine Hamon started by explaining the similarities between the SGML data model and that developed by the Object Database Management Group for describing object-oriented data sets. As well as data objects, object-oriented databases store information on node set inter-relationships, traversal and views. HyTime allows both the data and the relationships between data nodes to be recorded in an interchangeable format. If a common set of relationships needs to be described, topic maps of the type being developed by the Committee for the Application of HyTime (CApH) can be used to define relationships and identify where these relationships are being used. By also identifying the relationships between topics it is possible to build complex webs of relationships.

The HyTime session was closed by its chairman Charles Goldfarb explaining how HyTime is currently being extended so that ISO's Document Style Semantics and Specification Language (DSSSL) can utilise many of the features of HyTime to identify information nodes using its SGML Document Query Language (SDQL).

The session exploring the relative merits of using object-oriented and relational databases started with a presentation by Dr. Alan Brown of XSoft Inc. The parallels between the design of a documentation suite and the design of the product or system being documented were highlighted. The techniques for object-oriented system analysis are equally applicable for document analysis. If the design work is done in parallel it is possible to reduce the time between system completion and documentation completion, i.e. the delay needed to get the product ready for marketing.

Most object-oriented design systems store their data in object-oriented databases as this type of database is specifically designed to manage ordered hierarchies of the type that are not easy to map in relational databases. SGML documents also contain ordered hierarchies: they too can easily be mapped to the data referencing and storage techniques used in object-oriented databases.

John Chelsom of CSW Ltd followed up on Dr. Brown's talk by outlining the arguments made by the supporters of both the object-oriented and relational storage camps. He pointed out that object-oriented databases are now approaching the degree of sophistication of relational databases in terms of data access and system security. Compared with relational databases at a similar stage in their development cycle, they are markedly further advanced. For example, the Object Database Management Group (ODMG) published a standard object model, with bindings for both C++ and Smalltalk, in 1993, and all members of this group are pledged to support this standard by the middle of 1995. In parallel with this has been the development of a standard Object Querying Language (OQL), which now provides a standardized methodology for querying between client terminals and object-oriented database servers.

Christophe Espert of High Text and Philippe Futtersack of Electricité de France (EDF) then outlined why an object-oriented database was chosen for the EDF electronic library. The EDF R&D team uses an object-oriented design methodology (OOA) for its projects, and much of its documentation is based on data stored in its object-oriented database. Rather than develop an object-oriented report generator, the team simply adopted the existing SGML standard as the simplest way to produce reports. To ensure that objects could be reused in different documents without being stored again, EDF teamed up with High Text to develop a HyTime-based mechanism for formalising the links between documentation objects.

Ted Carroll of Information Dimensions Inc. discussed the problems of storing SGML-encoded structured documents in a relational database. He pointed out that it is not possible to work in the standard "Third Normal Form" required by standard relational databases when you are storing documents with repeatable elements. He also pointed out that you cannot use standard SQL queries, or Object SQL (OSQL), to efficiently retrieve such recursively ordered information. For this reason Information Dimensions' enhanced database has a special SGML server which provides an API for describing SGML structure to the database, and a special query language for finding data in structured documents.
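The retrieval problem can be illustrated with a minimal sketch using Python's standard `sqlite3` module. It assumes one possible relational layout (a single table in which each element row records its parent and its position among its siblings); the table name, column names and sample rows are hypothetical, and the code does not represent the Information Dimensions product. Rebuilding the ordered document requires a fresh pair of queries for every element, which is the kind of inefficiency described above.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding one small document tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE element ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES element(id),"
        " position INTEGER NOT NULL,"  # order among siblings
        " name TEXT NOT NULL,"
        " content TEXT)"
    )
    rows = [
        (1, None, 0, "doc", None),
        (2, 1, 0, "title", "Report"),
        (3, 1, 1, "section", None),
        (4, 3, 0, "para", "First paragraph."),
        (5, 3, 1, "para", "Second paragraph."),
        (6, 1, 2, "section", None),
        (7, 6, 0, "para", "Third paragraph."),
    ]
    conn.executemany("INSERT INTO element VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def serialize(conn, element_id, counter):
    """Rebuild the markup for one element, counting the queries issued."""
    counter[0] += 1  # one query to read the element itself
    name, content = conn.execute(
        "SELECT name, content FROM element WHERE id = ?", (element_id,)
    ).fetchone()
    counter[0] += 1  # one query to list its children in document order
    children = conn.execute(
        "SELECT id FROM element WHERE parent_id = ? ORDER BY position",
        (element_id,),
    ).fetchall()
    inner = content or ""
    for (child_id,) in children:
        inner += serialize(conn, child_id, counter)
    return f"<{name}>{inner}</{name}>"


if __name__ == "__main__":
    queries = [0]
    print(serialize(build_db(), 1, queries))
    print("queries issued:", queries[0])
```

In this layout the number of queries grows with the number of elements rather than with the number of documents retrieved, which suggests why a structure-aware server and query language were considered necessary.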

The workshop on the relationship between SGML, OpenDoc and OLE was started by Dr. Charles Goldfarb explaining how the formal system identifier option to be introduced in the extensions currently being made to the HyTime standard would make it possible to record the way in which the data had been encapsulated for storage in the originating system. He then showed the differences between the ways in which OLE and OpenDoc encapsulate information objects, and showed how both techniques could be formally described and encapsulated using the HyTime extensions in conjunction with the SGML Data Interchange Format (SDIF).

Bernhard Weichel from Robert Bosch GmbH then highlighted the problems that can occur in developing a system without the benefit of a standardized method for integrating the components of a compound document, and explained where adopting OLE techniques could simplify the development process.

Dallas Powell of Novell outlined the goals of OpenDoc. OpenDoc aims to provide interoperability protocols, part-dependent scripts, enhanced collaboration through the adoption of draft storage mechanisms, and a "parsimony" of drawing, co-ordinate and windowing subsystems. OpenDoc was characterised as an Application Architecture for building specifications characterised by a particular program structure and user interface. SGML provides an overlying Document Architecture that identifies the type of processing to be applied to the different parts of a compound document set.

Michael Miller of Incontext Systems explained how OLE made it possible to quickly develop their HTML web browser subset of their SGML editing software to provide a low-cost tool for the display and navigation of linked and/or structured data.

In the final discussion session the need to resolve the ways in which objects could be packed for interchange in a format that is not dependent on the architecture of the originating system was highlighted. HyTime's Sbento concept was postulated as a mechanism that would support OLE's option to interchange multiple representations of the same data (e.g. SGML structured text, flat file ASCII and PostScript printer files). A mechanism for identifying which subtrees of an SGML-encoded form identify a specific OpenDoc compound document "part" needs to be developed, ideally using a standardized methodology rather than a non-transportable, system-dependent one. Standardized names for processes, of the type being developed by ISO/IEC JTC1/SC18 WG8 for international standards, will be needed if process portability is to be fully supported.

The session on the relationship between SGML and the World Wide Web's HyperText Markup Language (HTML) started with a lively presentation by Eric Severson of Interleaf/Avalanche on the case for an HTML-centric web. After pointing out that HTML is a presentation format with a built-in search mechanism, Mr. Severson contrasted the ability of SGML to describe data independently of presentation semantics with HTML's ability to identify elements that share the same presentation characteristics. An analogy was made with the continuum that was developed to allow black-and-white television receivers to be used in parallel with colour television, then with teletext and stereo sound and finally with surround sound. A similar continuum needs to be developed to allow users to start with simple HTML documents, then move on to more complex structures using SGML and finally onto complex managed information webs using HyTime.

Eric van Herwijnen of Nice Technologies pointed out that the fact that HTML can be considered as an SGML application is a complete coincidence. When the first version of HTML was developed it made no reference to SGML. The SGML connection only became apparent when an SGML DTD was developed to allow the syntax of HTML documents to be checked. (At present a large proportion of web documents do not conform to the rules specified in the HTML standard, mainly because they have not been automatically checked before being loaded onto a server.) Once HTML files have been put into a valid SGML form it becomes possible to build queryable databases of HTML files. It is not a good idea to convert SGML files into HTML before editing is completed. However, there are aspects of HTML (for example, the forms used to provide data for the Common Gateway Interface, CGI) that could usefully be incorporated into DTDs that will be used to capture/generate information to be placed on WWW servers.

Steve Pepper of Falch Infortek, Norway, and Hasse Haitto of Synex Information AB, Sweden, showed why SGML is indispensable for the interchange of structured information sets. HTML is ideal for home pages and for the simple information types currently found on the web. The minute you need to support more complex information structures or types, you need to provide application-dependent interconnections of a type that is impossible to handle using the existing HTML markup. Two approaches to extending your information set are possible:

to allow SGML documents to be transmitted as part of an Internet datastream that can be interpreted by a special purpose SGML document browser (such as the SoftQuad Panorama browser developed by Synex)
to transform SGML documents into smaller, HTML-encoded, "chunks" that can be distributed via WWW servers.

The latter approach requires careful thought about how the structure inherently provided in an SGML document can be represented in a flat HTML information set. In most cases it will be necessary to build new forms of cross-reference tables to describe these structures in an HTML form.
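The chunking approach and its cross-reference table can be sketched in Python as follows. This is an illustrative sketch only, not the procedure presented by the speakers: the section data, the `{ref:ID}` placeholder notation for cross-references, the chunk file names and the function names are all assumptions made for the example. Each section becomes one HTML file, and a table mapping section IDs to files is used to turn internal cross-references into hypertext anchors.

```python
import re
from html import escape

# Hypothetical, already-parsed sections of a structured source document.
# "{ref:ID}" is an invented placeholder for a cross-reference element.
SECTIONS = [
    {"id": "intro", "title": "Introduction",
     "body": "See {ref:links} for details."},
    {"id": "links", "title": "Links",
     "body": "Background is in {ref:intro}."},
]

REF = re.compile(r"\{ref:([A-Za-z0-9_-]+)\}")


def build_xref_table(sections):
    """Map each section ID to the chunk file that will hold it."""
    return {
        section["id"]: {"file": f"chunk{n}.html", "title": section["title"]}
        for n, section in enumerate(sections, start=1)
    }


def render_chunk(section, xref):
    """Render one section as a standalone HTML page."""
    def to_anchor(match):
        target = xref[match.group(1)]  # KeyError flags a dangling reference
        return f'<A HREF="{target["file"]}">{escape(target["title"])}</A>'

    title = escape(section["title"])
    body = REF.sub(to_anchor, escape(section["body"]))
    return (
        f"<HTML><HEAD><TITLE>{title}</TITLE></HEAD>"
        f"<BODY><H1>{title}</H1><P>{body}</P></BODY></HTML>"
    )


def chunk_document(sections):
    """Return a mapping of chunk file name to HTML text."""
    xref = build_xref_table(sections)
    return {xref[s["id"]]["file"]: render_chunk(s, xref) for s in sections}


if __name__ == "__main__":
    for name, page in chunk_document(SECTIONS).items():
        print(name, page)
```

The cross-reference table is the only place where the original document structure survives, so it would have to be rebuilt whenever sections are added, removed or reordered.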

During the workshop on the relationship between SGML and HTML that followed the presentations it was concluded that:

HTML should be kept as simple as possible
HTML must add support for tables, and ideally for a limited set of presentation-based mathematics markup tags
HTML may need to add features that can be used by blind people, etc
HTML is not sufficient to provide the basis of a long-term data repository.

Jon Bosak of Novell promised to report the conclusions of the workshop to the IETF team developing Version 3 of the HTML DTD. He reported that it had already been decided that the CALS table DTD, which is widely used within the SGML community, would be adopted as the basis for handling tables within HTML.

Martin Bryan
The SGML Centre
29 Oldbury Orchard
Churchdown
Glos. GL3 2PU
U.K.

Phone/Fax: +44 1452 714029
E-mail: [email protected]

File last updated: May 1995

© ECSC-EC-EAEC, Brussels-Luxembourg, 1995
Reproduction is authorized, except for commercial purposes, provided the source is acknowledged.
The EC cannot be held responsible for any subsequent use or misuse of the information contained in this List.
