SGML: JEDI Project, Style Sheet analysis report
UKERNA
JEDI Project
Deliverable 3
Style Sheet analysis report in the context of the
DMU-JEDI project
Version: 1.0
Date: March 1996
Authors: A Gartland, D Houghton
UKERNA
Atlas Centre
Chilton, Didcot, Oxfordshire OX11 0QS
This document, or parts of it as appropriate, may be freely copied and
incorporated unaltered into another document subject to the source being
appropriately acknowledged and the copyright preserved.
Trademarks:
'JANET', 'SuperJANET' and 'UKERNA' are registered trademarks of the Higher
Education Funding Councils for England, Scotland and Wales.
Disclaimer:
The JNT Association cannot accept any liability for loss or damage resulting
from the use of the material contained herein. The information is believed to
be correct but no liability can be accepted for any inaccuracies.
Availability:
Further copies of this document may be obtained from the JANET Liaison Desk,
UKERNA, Atlas Centre, Chilton, Didcot, Oxfordshire, OX11 0QS
Copyright (c) The JNT Association, 1995
Contents
* The Joint Electronic Document Interchange Project (JEDI)
o Introduction
o Background
o The project and its aims
o Final Results
* JEDI Project Status
o Work to Date
o Future Strategy
+ DMU Activities
+ UCL Activities
* Introduction
* The Document Type Definitions
o General specification
o The TEI DTD
o The TEI Lite DTD
o The Rainbow DTD
o HTML 3.0
o SEMA Group's WRITE-IT
* Style Sheets
o Panorama
+ Attaching Style Sheets
o Cascading Style Sheets
+ Why style sheets for HTML?
+ Methodology
+ CSS support
o DSSSL Lite
o Others
* Panorama the product
o Panorama PRO
o The Free Viewer
* Arena the product
* Conclusions
* About this document ...
Executive Summary
This report provides an introduction to the use of style sheets with SGML
documents. The report concentrates on the practical application of style
sheets that are in common use and studies in detail the style sheet
mechanisms that are used by Panorama and by Arena (HTML 3.0).
The report defines the SGML DTDs to be used in the study. These are :-
* The Text Encoding Initiative
* Electronic Book Technologies (Rainbow)
* Internet Engineering Task Force (HTML 3)
The products to be employed to provide SGML viewing have been identified as
:-
* Panorama from SoftQuad
* Arena from WWW Organisation
The report introduces these products and highlights the key features of
style sheets that are used with these packages.
The report also mentions the Document Style Semantics and Specification
Language (DSSSL) standard as being the desired target for style sheet
standards.
The Joint Electronic Document Interchange Project (JEDI)
Introduction
There is increasing concern in the research community about the need to
standardise on formats for electronic document interchange. There are a
growing number of word processing formats that complicate an already complex
array of proprietary and de facto multi-media standards. The JEDI project
proposes to identify and investigate the leading word processing formats and
conversion tools with an aim to providing a solution to multi-format and
multi-platform document interchangeability. The aim is not only to study
possible packages, but also to demonstrate such interchangeability both in
electronic mail and database access environments.
We are providing a common introductory section to each deliverable. The
introductory section will summarise the background and aims of the project,
its progress to date, and plans for future work. The last two will be
updated with each deliverable. The bibliography and glossary will be
cumulative; hence they may refer to references not in that deliverable and
will be out of order.
Background
As modern publishing, editing and authoring have become influenced by
advances in information technology, there is a growing realisation that
electronic document interchange needs standardisation. Word processing tools
have become many and varied. The manufacturers of such products apply
whatever principles they see fit in order to make their product the market
leader and hence the dominant standard. As a result, it is very difficult to
get real industry impetus behind common standards. There have been many
attempts to introduce such standards in electronic text processing and
information interchange; three examples are the Office Document Architecture
(ODA; ISO 8613) activities [1], those towards a Standard Generalised Mark-up
Language (SGML) [2], and the HyperText Mark-up Language (HTML) [3], used in
the World Wide Web (WWW) [3]. The dominant providers of word processors (e.g.
MicroSoft WORD [4] and Novell WordPerfect [5]) have often provided tools to
allow the import of others' standards, but have done little to promote easy
interoperability of their documents with those of others. Other formats in
common use are LaTeX [6] and Rich Text Format (RTF) [7]. There are yet
others, but they are not being considered in this project.
One aspect of compound document interchange is the format of the document;
another is the mechanism for interchange and its relationship to the format.
There are two particularly popular methods of document interchange:
electronic mail and database access. For the first, we will consider the
problems raised by integrating the formats above with the Multipurpose
Internet Mail Extensions (MIME) [8].
Electronic document delivery from a database involves an author generating
documents in a proprietary format and providing the electronic form to
organisations running a database service. These organisations then provide
the documents to users by storing them in a database, either in the original
format or in a format popularly used by the community. The customers for
these documents will require them in one of two ways. Either they will
request them in their original format, in which case they will need the
originator's tools for reading and browsing; or they will wish to search a
collection of documents, using some kind of query mechanism, and then browse
or read the articles that were found. It should also be possible for a reader
to browse the document in his favourite format using the tools available
in-house. An on-line database of electronic documents offers many advantages
over conventional paper-based documents; many of these advantages fall into
the areas of search and access. Searching texts electronically for
information is much easier than searching them manually.
The World Wide Web (WWW) [3] is the largest information service on the
Internet. WWW is a client-server system with both clients and servers
throughout the world. A WWW server is a program running on a computer that
listens on a TCP port for incoming connections from WWW clients. It expects
connecting clients to speak a protocol called Hypertext Transfer Protocol
(HTTP) [3]. The documents in the WWW are written in HTML (Hypertext Mark-up
Language) format, and delivered from WWW servers to the clients in this form.
The technology in the WWW is evolving incredibly rapidly; many converters
from proprietary formats to HTML are available already, and many more are
coming. Hence the importance of WWW access to documents. It is also essential
to be able to search a database and then retrieve the documents found. Here
there is an international search and retrieval standard (Z39.50) [9], and a
particular widely used implementation, the Wide Area Information Servers
(WAIS) [10]. These will be used in the project.
The project and its aims
The Joint Electronic Document Interchange (JEDI) project emanates from the
call for proposals by UKERNA in September 1994 for Electronic Document
Interchange projects. JEDI is studying the popular formats for word
processing that exist in both academic and commercial environments. The
project aims to identify format conversion methods for popular de facto
standards and their relationship with internationally recognised standards
such as SGML and ODA. The work on SGML converters is being performed at De
Montfort University (DMU), while the work on ODA, WWW, electronic mail, and
database access is being performed at University College London (UCL). This
explains the Joint part in the project title's acronym. The general aims of
the project are as follows :-
* Analyse the current international and industry standards used for
electronic document creation;
* Analyse the current tools and methods available for text processing and
electronic document interchange;
* Study the use of SGML, ODA and HTML as means for Electronic Document
interchange
* Evaluate the interoperability of some of the implementations available
of such converters - including HTML converters from SGML, ODA and RTF,
and the LaTeX -> SGML converters;
* Design and implement a multi-mode converter, which will convert
documents in recognised formats to SGML
* Evaluate the UCL SGML-ODA converters
* Investigate the transfer of such documents by the MIME e-mail
technique.
* Investigate the storage and retrieval of compound documents in a
searchable form by the use of WAIS.
* Set up a user-friendly interface for editing documents from any
format into HTML form and storing them on a database in that format.
The DMU work consists of identifying known format conversion techniques and
establishing methods by which popular word processing formats can be
converted to SGML and related style sheet formats for presentation. It was
hoped to utilise the Document Style Semantics Specification Language (DSSSL)
Lite standard [11], but delays in its release and its limited support by
computer software manufacturers have prevented this. In order to demonstrate
the concepts of word-processing conversion to SGML the project has chosen
the style sheet language of Panorama from SoftQuad [12]. The UCL work
consists of three basic activities:
* Investigating the current status and suitability of ODA, HTML, and WWW
implementations - and converters between them;
* Investigating the suitability of MIME for interchange, and WAIS for
storage and retrieval, of such multi-mode documents;
* Developing a testbed system which will allow documents prepared in one
format to be transferred, stored and retrieved in another.
This system uses the mail system MH [13] with MIME support. UCL will
implement a system which acts as an automated testing and storage tool for
documents in different formats. The system automatically processes SMTP and
MIME messages with specific headers, and acts on information in the header.
It can STORE, RETRIEVE and RETURN original or converted documents. Any
document which is added to the Automatic Mail System (AMS) database will
also be indexed and stored in the WWW for further search and retrieval by WWW
Clients using tools from the WAIS software. For the automatic conversion, UCL
will modify the UCL SGML->ODA converters to work with the DTD produced by
DMU. UCL will install different converters at the users' request.
Final Results
By the end of the project (Mid 1996), the project partners are expected to
demonstrate the following:
* Tools for creating SGML, ODA and HTML documents;
* Interoperability of documents prepared with either SGML or ODA
tools and re-processed by the other;
* Conversion of documents in standard formats to HTML ones;
* Provision of a complete package for WWW services for a typical
conference proceedings;
* Allowance of search of several databases by a search engine with
automated document retrieval;
* Demonstration of the tools for creating and converting ODA, HTML and
SGML documents.
JEDI Project Status
Work to Date
There have been five Deliverables of the project up to now. In [14] DMU
considered the numerous document converters that are available within the
public domain. The study concentrated on the conversion of the following
popular formats :- LaTeX, RTF - Rich Text Format, SGML/HTML, MicroSoft Word,
Novell WordPerfect, PDF - Portable Document Format [15], ODA - Open Document
Architecture. In [16] there is a brief description of the packages available
along with information on obtaining and setting up each package. The report
also describes a set of important parsers, document viewers and related
tools for document delivery, as well as the WWW document servers. In its
first Deliverable [17], UCL reported on the interoperability of two ODA
converters and discussed the advantages and drawbacks of these converters.
The report also investigated the problems in converting from the SGML to the
ODA format. It also looked at different Public Domain versions of converters
which are available for converting SGML, ODA, and RTF to HTML. Finally, the
report discussed the commercial package of WAIS and WAISgate for creating a
searchable document database which can be retrieved using WWW clients. In
its second Deliverable [18], UCL evaluated the interoperability of RTF and
ODA. There exists an ODA tool for viewing ODA documents on the WWW [3].
Storing documents and retrieving the search result would be an advantage to
the project. UCL will test the public and commercial versions of such tools.
UCL have studied both storage and retrieval systems and several WWW gateways
for retrieving documents from the database; one of these, WAISGATE [10], has
been set up. For document transfer, UCL have set up a mail system with MIME
body parts of different formats, viz. SGML, ODA, HTML, RTF and LaTeX.
Document retrieval via mail messages is also useful; in [18], we also
specified an Automatic Mail System for the originator of the document to
store the proprietary format in the database or to request a conversion
procedure.
DMU-JEDI D2 [27] looked at document format analysis for the popular word
processing packages and identified the SGML DTDs that are to be used in
further project work.
Future Strategy
All parties realise that the formats, tools and developments in EDI are
changing rapidly due to the technologies used within the WWW, the Internet
and commerce. The activities in the project are concentrating on tools
usable in these environments. Any reference to the past work is quickly
overtaken by events. For this reason, the most recent information on the
project is kept in the WWW itself in [19].
DMU Activities
SGML [2] is a general mark-up language; its layout semantics are defined in
SGML Document Type Definitions (DTDs), and its presentation properties in
Style Sheets. With these general ideas in mind, DMU will continue the
investigation into existing document formats in order to try and provide a
method for generating SGML DTDs for a set of common document types. The
final aim is to have a set of DTDs and Style Sheets for all occasions. The
project will identify methods for users to reach this goal. To this aim, the
work done by groups such as the Text Encoding Initiative and Electronic Book
Technologies [20] will be of immense value. It is also encouraging to note
that Novell WordPerfect have recently brought out WordPerfect 6.1, SGML
Edition. First indications are that this product is very good. MicroSoft
have not yet brought out an SGML version for Word although one has been
promised. Such converters do exist from other suppliers, but they are
normally imperfectly integrated into the editors. DMU will concentrate on
studying how SGML documents can be generated easily from within popular text
processing environments. The project will study the following DTDs :- HTML
3.0 [21], Rainbow 2.5 [22], TEI Lite [23]. Using these DTDs, suitable style
sheets will be developed for use with SoftQuad's Panorama PRO [12]. A style
sheet converter will also be developed to allow migration to the DSSSL
standard [11]. DMU will investigate the suitability of the above DTDs for
electronic document interchange.
UCL Activities
UCL will install and test this ODA viewer tool. UCL will also evaluate the
interoperability of other converters available in the market. As a result
of the DMU investigation on creating SGML documents, UCL will modify the UCL
SGML-ODA converters to work with different SGML DTDs, especially with
documents generated with the DMU DTD. UCL will finalise the work by
implementing an Automatic Mail System (AMS) which acts as an automated
testing and storage tool for documents in different formats. The system
automatically processes SMTP and MIME messages with specific headers, and
acts on information in the header. It can STORE, RETRIEVE and RETURN original
or converted documents. Any documents which are added to the AMS database
will also be indexed and stored in the WWW server for further search and
retrieval by WWW Clients. Many converters will be used in the AMS system,
e.g. UCL modified SGML->ODA, RTF->HTML, LaTeX->HTML, and others as they
become available. UCL will work on an extended AMS specification to have a
complete automated system for storage, conversion, retrieval and indexed
databases.
In [15] DMU identified which document formats are being used by the
information providers and which document tools are currently available for
conversion from one format to another, e.g. SGML->ODA, RTF->HTML,
LaTeX->HTML, RTF->LaTeX and Text->HTML. Documents which are stored in the
database will be automatically converted into HTML and stored in the WWW
server for retrieval by WWW Clients. Using the Automatic Mail System,
documents attached to the originator's request will be added to the index
system to create a database.
Introduction
This document discusses the use of style sheet languages with reference to
the JEDI project. The style sheet system to be used within the scope of this
project will provide the mechanism by which documents written using Standard
Generalised Markup Language, SGML (ISO 8879) [2] may be presented. The JEDI
project initially intended to utilise the ISO 10179 standard Document Style
Semantics and Specification Language (DSSSL) [11] but support for this
standard has been slow to materialise. Instead, the project will concentrate
on the Panorama style sheet language which is a popular, widely available
alternative.
The Document Type Definitions
General specification
The DTDs to be used in this study were discussed in DMU JEDI Deliverable D2
[27] and included :-
* The Text Encoding Initiative
* Electronic Book Technologies (Rainbow)
* Internet Engineering Task Force (HTML 3)
* SEMA Group's WRITE-IT
The TEI DTD
The DTD from the Text Encoding Initiative is available from
http://info.ox.ac.uk:80/~archive/teij31/index.htm#toc
The TEI DTD is a comprehensive guideline for document markup that is
sponsored and funded by numerous bodies that include the Directorate General
XIII of the Commission of the European Communities. Further information may
be obtained from
http://www.uic.edu/orgs/tei/#Description
The TEI Lite DTD
Another DTD from the Text Encoding Initiative is available from
http://info.ox.ac.uk:80/~archive/teilite
The TEI Lite DTD is an abbreviated version of the TEI DTD. Further
information may be obtained from
http://info.ox.ac.uk:80/~archive/teilite/teiu5.html#ID1
The Rainbow DTD
The Rainbow DTD has been written by Electronic Book Technologies and is
available from :-
ftp://ftp.ebt.com
The Rainbow DTD is an attempt to solve the problem of conversion from
popular word processing package formats, such as RTF, to SGML. EBT provide a
software package known as Rainbow Maker that is designed to perform this
conversion and is available from the above address.
HTML 3.0
The HTML 3.0 DTD may be obtained from
http://www.w3.org/hypertext/WWW/MarkUp/html3/CoverPage.html
The HTML 3.0 DTD is an attempt by the Internet Engineering Task Force to
provide a standard HTML for use on the World Wide Web.
SEMA Group's WRITE-IT
The WRITE-IT DTD was used in DMU JEDI Deliverable D2 to produce a simple
example of how to mark up a letter. It is available from numerous sites. The
European mirror site is
ftp://rs104.hrz.th-darmstadt.de/pub/text/sgml/DTDwrite-it.dtds.tar.Z
Style Sheets
The concept of style sheets is a simple one that is often misunderstood by
users. This misconception is due to the term "Style Sheet" being used by
different manufacturers to mean different things.
The JEDI project uses the term Style Sheet to mean that defined in (ISO
10179.2) DSSSL, i.e. a set of attributes associated with a DTD that defines
presentation markup for elements and entities of that DTD. That is to say,
potentially, for each element and entity in the DTD there is a set of
information that defines such aspects as font, colour, emphasis, size,
position etc. The presentation markup is clearly related to the ability of
the viewer/browser to display the requested "style". If the user requests a
font family that the viewer does not support, the request cannot be honoured.
However,
all is not lost. The style sheet information associated with the DTD is
independent of the browser and so the information is retained as the
publisher and author would wish.
In this section we shall discuss style sheet languages and methods and
provide examples of their use with particular emphasis on usefulness for
Electronic Document Interchange.
Panorama
Panorama style sheets provide publishers with control over the presentation
of SGML documents. Style sheets provide control over display attributes such
as font, size, weight, colour, indents, spacing and automatic numbering.
Text can be made larger so that several users can read it from the same
terminal, and colour can be added to make important text stand out. With
style sheets, publishers can :-
* Add colour to text and graphics to highlight important information.
* Adjust font type according to the font availability on the client
machine.
* Change margins and adjust the text size to suit the needs of different
users.
* Add more white space to documents to increase readability and improve
the overall look of the document.
Attaching Style Sheets
The mechanism used to attach style sheets to a DTD involves several steps.
Firstly, a style sheet file is created using any editor. The file extension
is SSH and it must contain information relating to the required DTD. For
example :-
<!DOCTYPE STYLESHEET
PUBLIC "-//Synex Information AB//DTD Stylesheet Explorer//EN">
<STYLESHEET DTD="MY-OWN.DTD" NAME="Fulltext">
The reference to MY-OWN.DTD is the important part: it identifies the DTD to
which the style sheet applies.
Secondly, the file must contain information about those DTD Elements that
are to be 'presented'. For example
<STYLE TAG="TITLE">
<JUSTIFY V=CENTER>
<FONT-SIZE V=14>
<FONT-WEIGHT V=BOLD>
<FONT-COLOR V=Navy>
<BREAK-BEFORE>
<BREAK-AFTER>
<Z-RULER V=21>
</STYLE>
The tag in this example is TITLE and it will be centre justified with a font
size of 14pt that will be bold and Navy blue in colour. The break
information means that a new line is created before and after the text and
the Z-Ruler indicates that a horizontal line will follow the text.
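As a rough illustration of how regular these STYLE elements are, the following
Python sketch builds the TITLE example above from a table of properties. It
mirrors only the constructs shown in this report; the function name
style_element and the use of None for properties without a value are
assumptions made for the sketch and are not part of Panorama.

```python
def style_element(tag, properties):
    """Render one STYLE element in the form used in the example above.

    `properties` maps a property name to its value, or to None for
    properties that take no value (such as BREAK-BEFORE).
    """
    lines = ['<STYLE TAG="%s">' % tag]
    for name, value in properties.items():
        if value is None:
            lines.append("<%s>" % name)            # e.g. <BREAK-BEFORE>
        else:
            lines.append("<%s V=%s>" % (name, value))  # e.g. <FONT-SIZE V=14>
    lines.append("</STYLE>")
    return "\n".join(lines)


# Hypothetical helper input: the same properties as the TITLE example.
title_style = style_element("TITLE", {
    "JUSTIFY": "CENTER",
    "FONT-SIZE": 14,
    "FONT-WEIGHT": "BOLD",
    "FONT-COLOR": "Navy",
    "BREAK-BEFORE": None,
    "BREAK-AFTER": None,
    "Z-RULER": 21,
})
print(title_style)
```

The output could then be saved, after the DOCTYPE and STYLESHEET lines shown
earlier, in a file with the SSH extension.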
The next step in the set up is to place the style sheet file where Panorama
can find it. The style sheet and DTD may reside locally or on a remote
server. If they reside locally, the DTD must be placed in the CATALOG
directory and the style sheet in the ENTITYRC.
If they reside remotely on the server, the style sheet and DTD must be in the
directory with the data.
For a more detailed example of Panorama style sheets see Appendix C.
Cascading Style Sheets
Cascading Style Sheet (CSS) terminology is based around that of the desk top
publishing industry. Work on producing Cascading Style Sheets drafts is
currently being carried out by Håkon W Lie and Bert Bos of the W3
consortium. Cascading style sheets are the favoured style sheet coding
mechanism to be used with future WWW browsers.
Style sheets could be included in HTML documents in the following ways.
<HTML>
<HEAD>
<TITLE>title</TITLE>
<LINK REL=STYLESHEET TYPE="text/css" HREF="http://style.com/cool">
<STYLE TYPE="text/css">
@import "http://style.com/basic"
H1 { color: red }
</STYLE>
</HEAD>
<BODY>
<H1>Headline is red</H1>
<P STYLE="color: blue">While the paragraph is blue.
</BODY>
</HTML>
* The <LINK> tag in the document's `head' could be used to link the HTML
page to an external style sheet: <LINK REL=STYLESHEET TYPE="text/css"
HREF="http://style.com/cool">
* The <STYLE> tag inside the HTML document's `head' could store style
sheet information in the document itself.
* An imported style sheet, using the CSS @import notation, could be
included between <STYLE> tags.
* A STYLE attribute could be placed on each of the tags themselves:
<P STYLE="color: blue">
Under normal circumstances, only one of the above methods of including style
sheets in an HTML document would be used, with the exception of
<STYLE>, which can be used to modify the appearance of a
style introduced by another style sheet via a different method.
Why style sheets for HTML?
Netscapism
HTML is a content oriented markup language, i.e. the author writes the
document using structural tags such as paragraphs, headings and so forth,
and it is up to the WWW browser to render the document appropriately. The
document should be rendered in a presentable manner whatever browser it is
displayed upon.
There are two browsers, namely Netscape Communications' Netscape and
Microsoft's Internet Explorer, that we and a growing number of members of the
Internet community believe to be abusing the spirit of HTML. Netscape supports
HTML2 and a limited subset of the proposed HTML3 (those parts that Netscape
believes will make it into the HTML3 standard) but has also included
'extensions' to HTML2 and HTML3. Many of these extensions are based around
presentation. For example, the ``BGCOLOR'' attribute of the
<BODY> tag changes the background colour of the surface that
the text is rendered on (the paper colour), and the <BLINK> tag
makes text flash. In essence, Netscape is removing the ``browser
independence'' of HTML, making Netscape the only browser that will
``correctly'' view most HTML documents.
There are a growing number of pages on the web that have the text ``These
pages are Netscape Enhanced and are best viewed with Netscape...'' followed by
a button for the user to press to download Netscape. It seems that Netscape
and other browser implementors are prepared to add new tags whenever they
feel it is necessary to further enhance the appearance of their pages.
There is a term that is being widely used to describe the above:
``Netscapism''.
Solution
Through the use of style sheets, the author is capable of changing the
appearance of the document while still allowing the content of the document
to be structured. The style sheet is separate from the body of text.
The use of style sheets for presentation instead of adding new HTML tags is
much cleaner and more flexible. They can influence many things such as
* Text-flow.
* Font size.
* Font weight.
* Image positioning.
* Context sensitivity for changes in presentation.
* The ability for both author and reader to change the presentation.
* Support of braille.
Methodology
Basic Syntax
A style for a particular tag is set by including the tag name (without the
<>'s) followed by a pair of opening and closing braces, {}'s.
Between the braces goes the style information in ``style:value'' pairs.
If a user wishes to change the colour of the text of all level 1 headers to
red, the style could be defined as follows :-
H1 { color: red }
Grouping
There may be times when many similar tags all need to have their style
changed to the same value. To eliminate the need to retype the same style
information, the tags can be grouped together.
Grouping is performed by separating the list of tags you wish to style
with commas.
To specify that all headings, from level 1 to 6, have their text in red :-
H1, H2, H3, H4, H5, H6 { color: red }
Inheritance
One of the key features of cascading style sheets is inheritance.
If an HTML tag doesn't have a particular piece of style information assigned
to it, it will inherit the style of its parent.
For example, suppose we have assigned the font size of the
<BODY> tag to be 12pt and within our HTML document we have not
specified any style information for the paragraph tag <P>. The
paragraph will also have a font size of 12pt because <BODY> is
its parent.
Defining style sheets that incorporate inheritance saves a considerable
amount of keying by the person responsible for the document's presentation.
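The grouping and inheritance rules described above can be sketched in a few
lines of Python. This is only a minimal illustration under stated
assumptions, not how Arena or any other browser is implemented: the function
names parse_rules and resolve are invented for the sketch, the parser handles
only simple rules of the form shown in this report, and every property is
treated as inherited, which is a simplification.

```python
def parse_rules(css_text):
    """Parse rules such as 'H1, H2 { color: red }' into a dict that maps
    each tag name to the properties set directly on that tag."""
    styles = {}
    for chunk in css_text.split("}"):
        if "{" not in chunk:
            continue
        selectors, body = chunk.split("{", 1)
        declarations = {}
        for pair in body.split(";"):
            if ":" in pair:
                name, value = pair.split(":", 1)
                declarations[name.strip()] = value.strip()
        # Grouping: every tag in the comma-separated list gets the same style.
        for tag in selectors.split(","):
            styles.setdefault(tag.strip().upper(), {}).update(declarations)
    return styles


def resolve(prop, path, styles):
    """Look up `prop` for the last tag in `path` (outermost tag first),
    falling back to each parent in turn when the tag itself sets nothing."""
    for tag in reversed(path):
        if prop in styles.get(tag, {}):
            return styles[tag][prop]
    return None


styles = parse_rules("""
BODY { font-size: 12pt }
H1, H2, H3 { color: red }
""")
print(resolve("font-size", ["BODY", "P"], styles))  # P sets nothing, so BODY is used
print(resolve("color", ["BODY", "H2"], styles))     # set by the grouped rule
```

Here the paragraph has no style of its own, so the lookup walks up to
<BODY>, which is the behaviour described in the 12pt example.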
Context sensitivity
There may be times when the style sheet designer wishes to have a
particular style for a section of a document only if it is a child of
another tag. An application of this could be in lists. Suppose we have a
nested list (a list within a list), and suppose we wish to have
the second level of the list in a smaller font than the first.
* level1.0
* level1.1
o level2.0
o level2.1
* level1.2
The style information that would produce the kind of style change shown
above would be coded
UL UL LI { font-size: 6pt }
where LI may be defined as having the font size set to ``12pt''.
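One way to picture how such a context sensitive rule is applied is the
following Python sketch. It is an illustration only: the names matches and
font_size are invented here, an element is described simply by the list of
tags leading to it, and letting the rule listed last win stands in for
whatever precedence the CSS draft actually defines.

```python
def matches(selector, path):
    """Return True if a contextual selector such as 'UL UL LI' matches an
    element whose ancestry is `path`, outermost tag first.

    The last name must be the element itself; the earlier names must
    appear, in order, among its ancestors.
    """
    names = selector.upper().split()
    if not path or names[-1] != path[-1]:
        return False
    ancestors = iter(path[:-1])
    # Each remaining name must be found, in order, among the ancestors.
    return all(name in ancestors for name in names[:-1])


def font_size(path):
    """Pick the font size for an element; a matching rule listed later
    overrides one listed earlier (an assumption of this sketch)."""
    rules = [("LI", "12pt"), ("UL UL LI", "6pt")]
    size = None
    for selector, value in rules:
        if matches(selector, path):
            size = value
    return size


print(font_size(["BODY", "UL", "LI"]))              # first-level item
print(font_size(["BODY", "UL", "LI", "UL", "LI"]))  # nested item
```

The first-level item matches only the plain LI rule, while the nested item
also has two UL ancestors and so picks up the smaller size.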
It is beyond the scope of this document to go into detail about CSS. Further
information can be found by reading through the CSS draft, available at the
time of writing at
http://www.w3.org/pub/WWW/TR/WD-css1.html
CSS support
Support for CSS has been implemented in the following browsers:
* Arena
Experimental support for CSS has been built into Arena as of version
0.96. Arena is a test bed browser designed to reflect the current
status of the HTML3 draft. As such, its creators do not intend to make
it a complete full-featured browser.
* Tamaya
Tamaya is a WYSIWYG HTML editor/browser. It supports the whole of HTML2
and the most innovative features of the proposed HTML3 standard. Full
implementation of HTML is in progress. More information can be obtained
from http://www-bi.imag.fr/OPERA/Tamaya.en.html
* W3 mode for Emacs
A w3 mode has been developed for Emacs that has preliminary support for
CSS. It can be downloaded from
ftp://ftp.cs.indiana.edu/pub/elisp/w3/w3.tar.Z
DSSSL Lite
DSSSL (Document Style Semantics and Specification Language) is an
International Standard, ISO/IEC 10179:1995, for specifying document
transformation and formatting in a platform- and vendor-neutral manner.
DSSSL can be used with any document format for which a property set can be
defined according to the Property Set Definition Requirements of ISO/IEC
10744. In particular, it can be used to specify the presentation of
documents marked up according to ISO 8879:1986, Standard Generalized Markup
Language (SGML).
DSSSL consists of two main components: a transformation language and a style
language. The transformation language is used to specify structural
transformations on SGML source files. For example, a telephone directory
structured as a series of entries ordered by last name could, by applying a
transformation spec, be rendered as a series of entries sorted by first name
instead. The transformation language can also be used to specify the merging
of two or more documents, among other operations. While the transformation
language is a powerful tool for gaining the maximum use from document bases,
its commercial implementation is not expected to be fast in coming, and the
focus in early versions of DSSSL will be on the style language component.
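The telephone directory example can be pictured with a short Python sketch.
This is not DSSSL and does not use the transformation language; it only
shows the kind of structural reordering a transformation specification
would describe. The sample entries and the function name by_first_name are
invented for the illustration.

```python
# A directory held as entries ordered by last name (invented sample data).
directory = [
    {"first": "Carol", "last": "Adams"},
    {"first": "Alice", "last": "Baker"},
    {"first": "Bob", "last": "Clark"},
]


def by_first_name(entries):
    """Return a new list of the same entries ordered by first name,
    leaving the source list untouched."""
    return sorted(entries, key=lambda entry: entry["first"])


for entry in by_first_name(directory):
    print(entry["first"], entry["last"])
```

The source data is left unchanged and only the derived view is reordered,
which matches the idea of applying a transformation to an SGML source file.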
Core DSSSL
A. Common Core
   Features: table, table-auto-width
   Core query language, core expression language
   Basic flow object classes
      Sequence
      Paragraph
      Paragraph break
      Line field
      Character
      Rule
      External graphic
      Box
   Table flow object classes
      Table
      Table part
      Table column
      Table row
      Table cell
      Table border
B. Dsssl-o (for online browsers and SGML editors)
   Common core plus the following:
   Features: online, simple-page
   Online display flow object classes
      Vertical scroll
      Multi-mode
      Link
      Marginalia
   Simple page flow object class
      Simple page sequence
C. Dsssl-p (for SGML typesetting systems)
   Common core plus the following:
   Features: page, multi-column, nested-column-set, combine-char,
   general-indirect
   Printed page flow object classes
      Page sequence
      Column set sequence
   Printed typography flow object classes
      Display group
      Anchor
      Leader
      Score
      Side-by-side
      Side-by-side item
A complete description of the DSSSL standard may be obtained from
http://occam.sjf.novell.com:8080/dsssl/dsssl96
Others
The number of commercial applications that claim to be SGML compliant is now
extensive. The examples shown in Appendix B are the known SGML tools at the
time of writing. This number is likely to increase rapidly as SGML gains a
wider audience.
Although most products do indeed enable users to create and manipulate SGML
documents, when it comes to actually presenting them there is a problem. To
date, there is no recognised standard for style sheets. As a consequence,
most vendors use their own proprietary style sheet language.
The majority of browsers/viewers for SGML documents will require the user
either to adopt the style sheet format of the vendor's product or to provide
a migration method for other formats. OmniMark Toolkit is an example of this.
Many browsers will allow only a single DTD to be viewed, so documents that
use any other DTD must first be converted. Packages such as Explorer and
Panorama from SoftQuad, however, allow any DTD to be loaded, and an
appropriate style sheet can then be created and attached.
As far as HTML 3.0 is concerned, the Arena browser has been developed at the
World Wide Web Consortium (W3C). It is available from
http://www.w3.org/hypertext/WWW/Arena/source
The browser supports HTML 3.0 and uses cascading style sheets (CSS).
The method of independent style sheets and DTDs has been adopted for the
JEDI project as it provides the most flexible approach and will most easily
provide a migration path to DSSSL (ISO 10179.2) when this standard is
universally adopted.
Panorama the product
Panorama is an SGML browser for the World Wide Web available from SoftQuad
Inc. of Toronto, Canada. Panorama itself is NOT a WWW browser. Rather, it is
a front end to any HTML 2.0 compliant WWW browser such as Netscape 1.1 and
Mosaic. At the time of writing, only Microsoft Windows based versions of
Panorama are available.
Panorama comes in two flavours :-
Panorama PRO
Panorama PRO has been designed as a publishing tool for the Internet that
allows users and publishers to browse SGML documents on the WWW. The
following features are supported :-
* Multiple style sheets for a single DTD
* Dynamically defined interactive table of contents
* Arbitrary styles by elements
* Maths and Table support
* In-line graphics
* External launch support for adding multimedia to documents
* Context-sensitive searching
* Searching within specific SGML elements
* Personal web annotations, bookmarks and links
* Graphic - Text links
The Free Viewer
The free version of Panorama allows users to experience the power of an SGML
browsing facility but does not provide the ability to edit the associated
style sheets. The free viewer is available from
http://www.ebt.com
Arena the product
Arena is a WWW browser that has been designed to be used with HTML 3.0 and
style sheets. Arena is freely available in binary and source versions and
may be obtained from
http://www.w3.org/hypertext/WWW/Arena/
Apart from the cascading style sheet concept, Arena supports HTML 3.0 tables
and mathematics.
Conclusions
SGML is ideally suited for EDI as it is text based and is platform and
operating system independent. For SGML to be "presented" it must have a
style sheet mechanism that is also text based. The style sheet approaches we
have studied all conform to this criterion.
Panorama style sheets can be generated from any text-based editor or from
within Panorama. They are flexible and readable whilst being comprehensive.
It is possible for Panorama style sheets to be created for any DTD. The JEDI
project will concentrate on the TEI Lite and Rainbow DTDs.
The Arena style sheet mechanism is similar to Panorama's approach but
concentrates on the HTML 3.0 DTD only.
The DSSSL style sheet mechanism is the preferred one for future development
as it is an ISO standard. Unfortunately its acceptance by software
manufacturers has been slow to materialise.
Transformation of DTDs and style sheets via specialised converter programs
is possible as all the mechanisms are open and are platform/operating system
independent. These converters will be created and implemented and will be
discussed in subsequent DMU-JEDI deliverables.
References and further information:
The following is a list of WWW sites that concern themselves with the work
on the DTDs and style sheet mechanisms discussed in this report.
* SGML archive in Norway: ftp://ftp.ifi.uio.no/pub/SGML/
* Robin Cover's page: http://www.sil.org/sgml/sgml.html
* SGML Open: http://www.sgmlopen.org
* SGML at Exeter: http://www.ex.ac.uk/SGML/
* The Text Encoding Initiative:
http://www.uic.edu/orgs/tei/#Description
* Electronic Book Technologies: http://www.ebt.com
* Arena: http://www.w3.org/hypertext/WWW/Arena/Status.html