What is SGML and why should I use it?


What is it?

SGML (Standard Generalized Markup Language) is an internationally agreed standard (ISO 8879) for information representation. SGML can be used for publishing in its broadest definition - from single-medium conventional publishing on paper to on-line multi-media database publishing. SGML can be used to produce files which can be read by people, and exchanged between machines and applications in a straightforward manner. This leaflet provides an introduction to the main features of SGML, using non-technical language.

What is so special about SGML?

SGML provides an internationally recognized, non-proprietary language for writing your own markup schemes. Markup is the text that is added to the data of your files in order to convey particular information about that data. In a word processor, the markup consists of the proprietary codes that the software inserts into your text files to indicate which words should be printed in a certain font, which paragraphs should be centred, where page breaks occur, and so on. In a database system, the markup consists of the proprietary codes which indicate where one field or record ends and another begins.

SGML relies on the principles of descriptive markup - where the markup is used to indicate the nature, function or content of the data in a file, rather than saying how that data should be processed. SGML can be used to encode what a piece of text means rather than how it should look. Using SGML, a heading will be identified as a "heading" rather than as a piece of text that has to be printed or displayed in "20 point Times Bold".
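As an illustration, here is a small sketch in Python. The element names ("report", "heading", "para") are hypothetical, and the code understands only plain start and end tags with no attributes, so it stands in for SGML-aware software rather than being a real SGML parser. Because each heading is labelled "heading" rather than "20 point Times Bold", a program can pick the headings out, for instance to build a table of contents, without knowing anything about fonts.

```python
import re

# A hypothetical fragment using descriptive markup: each tag says what the
# text *is*, not how it should be printed.
DOCUMENT = """<report>
<heading>Introduction</heading>
<para>SGML separates content from presentation.</para>
<heading>Benefits</heading>
<para>One source file can serve many outputs.</para>
</report>"""


def elements(text, name):
    """Return the contents of every <name>...</name> element.

    Simplified: no nesting of the same element, no attributes, no tag omission.
    """
    pattern = re.compile(r"<%s>(.*?)</%s>" % (re.escape(name), re.escape(name)), re.DOTALL)
    return pattern.findall(text)


def table_of_contents(text):
    # This only works because headings are marked as "heading", not by typeface.
    return ["%d. %s" % (i, h) for i, h in enumerate(elements(text, "heading"), start=1)]


if __name__ == "__main__":
    for line in table_of_contents(DOCUMENT):
        print(line)
```

Run as a script, this prints "1. Introduction" and "2. Benefits". The same labels could just as easily drive printing, indexing or searching.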

SGML is rigorous. The markup schemes that you write using SGML declare a set of rules which unambiguously state how the data must be marked up in order to be correctly structured. SGML-aware software can ensure that any markup in a file conforms to the appropriate set of rules, thereby guaranteeing that the data in that file will be structured in a known way. If the set of rules you are working with declares that text labelled as a "sub-section" can only occur within text labelled as a "section", SGML-aware software will ensure that this rule is obeyed during text creation and editing.
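The kind of check described above can be sketched as follows, assuming a hypothetical rule table in which "sub-section" may only occur inside "section". In real SGML such rules are declared in a document type definition (DTD) and checked by an SGML parser; this Python stand-in handles only plain start and end tags and is a simplification for illustration.

```python
import re

# Hypothetical rules: element name -> set of elements allowed as its parent.
# None stands for the top level of the document.
ALLOWED_PARENTS = {
    "doc": {None},
    "section": {"doc"},
    "sub-section": {"section"},
    "para": {"section", "sub-section"},
}

TAG = re.compile(r"<(/?)([a-z-]+)>")


def validate(text):
    """Return a list of rule violations; an empty list means the markup conforms."""
    errors = []
    stack = []  # currently open elements, innermost last
    for match in TAG.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
            else:
                errors.append("unexpected </%s>" % name)
            continue
        parent = stack[-1] if stack else None
        where = "<%s>" % parent if parent else "the top level"
        if name not in ALLOWED_PARENTS:
            errors.append("unknown element <%s>" % name)
        elif parent not in ALLOWED_PARENTS[name]:
            errors.append("<%s> not allowed inside %s" % (name, where))
        stack.append(name)
    # Anything still open at the end was never closed.
    for name in reversed(stack):
        errors.append("<%s> never closed" % name)
    return errors


good = "<doc><section><sub-section><para>Text</para></sub-section></section></doc>"
bad = "<doc><sub-section><para>Text</para></sub-section></doc>"
print(validate(good))
print(validate(bad))
```

The first call reports no problems; the second reports that a sub-section has appeared directly inside the document rather than inside a section.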

Why should I use it?

If you are given a file that contains data which has been structured according to a known set of rules and in an unambiguous manner, you are free to use that information however you see fit. You can format it for printing on paper or display on-line. You can map the data to a database to create text archives, or import data from a database to create active documents. You can create new documents by extracting or combining information taken from one or more source files. You can incorporate your files into a hypertext or multi-media system. You can do all this without altering the source file - which means you can process the same source information in many different ways simultaneously!

SGML makes it possible to re-use and share information. People working at different sites, using different editors on different machines, can produce SGML files that can be easily combined to produce a single document. Provided that you know the markup scheme which was used to create it, you can take any SGML file and process it however you see fit. Thus, several sites could download a copy of a document from an archive and each print it off in their local house style.

Will it change the way I work?

SGML will change the way you think about your work, simplifying many of the things you do already, and making possible many of the things you have always wanted to do.

SGML encourages you to think of any file you create as a container of rigorously structured information - rather than as a word processing text file, or an on-line help screen, or a database file, or a hypertext document. Your concern will be to ensure that a file contains well-written content, structured according to the appropriate markup scheme, rather than worry about how that file will subsequently be processed.

With SGML, data files are processed in a consistent way. To impose a "house-style" on all your printed documents, it is only necessary to ensure that all your SGML files go through the same translation process to map their contents into, say, a file of LaTeX commands, or into a word processor's style sheet.

You only need to write one translation process. If your house-style changes, you simply need to alter the translation process and pass all your old files through the amended version to give them the new look. You do not need to edit the old files themselves, because they never contained any formatting instructions in the first place!
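A translation process of this kind might be sketched as a table that maps each element to LaTeX. The element names, the table HOUSE_STYLE and the function translate are hypothetical; the sketch assumes flat markup with no attributes and does not escape LaTeX special characters, which a real translator would have to do. Changing the house style means changing only the table; the source text is never edited.

```python
import re

# Hypothetical source text: it contains no formatting instructions at all.
SOURCE = "<heading>Results</heading><para>All files were converted.</para>"

# The house style: element name -> LaTeX template. Only this table encodes formatting.
HOUSE_STYLE = {
    "heading": "\\section{%s}\n",
    "para": "%s\n\n",
}

# Matches <name>...</name> where the end tag repeats the start tag's name.
ELEMENT = re.compile(r"<([a-z]+)>(.*?)</\1>", re.DOTALL)


def translate(source, style):
    """Map each top-level element to output using the given style table.

    Assumes no nesting and no LaTeX special characters in the text.
    """
    return "".join(style[name] % content for name, content in ELEMENT.findall(source))


print(translate(SOURCE, HOUSE_STYLE))

# A new house style: same source, a different table, no edits to SOURCE.
NEW_STYLE = dict(HOUSE_STYLE, heading="\\section*{\\textsc{%s}}\n")
print(translate(SOURCE, NEW_STYLE))
```

Re-running every old file through translate with NEW_STYLE gives them all the new look in one pass.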

And remember, you can use exactly the same source files for your printed output as the source for your on-line (hypertext) documents or mapping to or from your text database.
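For instance, one source fragment might feed both a printed (LaTeX) translator and an on-line (HTML) translator. The sketch below is hypothetical: the element and function names are illustrative, the markup is kept flat with no attributes or entity references, and each translator escapes the "&" character in the way its own output format requires (the LaTeX side handles only "&"; a real translator would escape more).

```python
import html
import re

# One hypothetical source fragment, used unchanged for both outputs.
SOURCE = "<heading>Results</heading><para>Print & on-line share one source.</para>"

ELEMENT = re.compile(r"<([a-z]+)>(.*?)</\1>", re.DOTALL)


def to_latex(source):
    """Printed output: LaTeX. Only '&' is escaped in this sketch."""
    out = []
    for name, text in ELEMENT.findall(source):
        text = text.replace("&", "\\&")
        out.append("\\section{%s}" % text if name == "heading" else text)
    return "\n".join(out)


def to_html(source):
    """On-line output: HTML, with html.escape protecting &, < and >."""
    tags = {"heading": "h1", "para": "p"}
    return "\n".join(
        "<%s>%s</%s>" % (tags[name], html.escape(text), tags[name])
        for name, text in ELEMENT.findall(source)
    )


print(to_latex(SOURCE))
print(to_html(SOURCE))
```

Neither translator changes the source; adding a third medium would mean writing a third translator, not editing the documents.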

Who else is using it?

Scientific and reference publishers, such as Elsevier, Springer-Verlag, Kluwer, and Oxford University Press. Organizations with large-scale information handling needs, like the International Organization for Standardization (ISO), HMSO, the European Patent Office, the European Commission, and the US Department of Defense. All the major suppliers of UNIX software and hardware have chosen SGML to deliver their next generation of documentation for publication on paper and as on-line "man" pages. The latest version of the Oxford English Dictionary is an SGML document available both on paper and on CD-ROM.

Within academia, the Text Encoding Initiative, a major international project, recommends SGML for the coding and interchange of any electronic text intended for scholarly analysis. The American Chemical Society and American Mathematical Society will be using SGML for all their electronic publishing needs. CERN and the publishing wing of the Institute of Physics have also adopted SGML.

What hardware and software will I need?

SGML files can be created using any editor which can produce files that do not contain application-specific codes (i.e. plain ASCII or EBCDIC). Dedicated SGML software is available for virtually every platform or environment, with high-quality commercial software available for all the major machine types and operating systems (PC, Mac, UNIX etc.).

Where can I find out more?

The SGML Project is funded by the Information Systems Committee of the UFC to provide a comprehensive information and advice service on SGML, its uses, and related software. We also provide a programme of lectures, seminars, and workshops that is entirely free to any recognized academic or research institution. We operate an electronic archive of public domain software, and manage an on-line Mailbase discussion list.

The SGML Project
c/o Univ. of Exeter IT Services
Laver Building
North Park Road
Exeter EX4 4QE
Tel: 0392-263946
Fax: 0392-211630
Email: sgml@exeter.ac.uk