[Mirrored from: http://www.inf.tu-dresden.de/~jw6/doc/sdc/intro-en.html]

Typeset, a short introduction

This is a short introduction to sdc.

sdc is a formatter for SGML documents.

This document describes the use of the document types which are currently handled by sdc. It does not describe the adaptation of sdc to other document types or target formats.

1 Overview
2 What is SGML
3 Invocation
- 3.1 Environment
- 3.2 Files
4 Document types
5 Elements of the document
6 Creating Indexes
7 Bibliography database
8 Conditional Inclusion
9 Slides
10 Personal Data
11 Appendices
12 Large documents
13 Literate Programming
14 Parameter Entities
15 Changing the Layout

A General Entities
B Notations
C Local Features
D Installation
E Changes
F Problems
H Bibliography
I Index

1 Overview

Typeset is an extensible formatter for documents. It transforms documents using SGML markup into various target formats.

Typeset comes with a couple of document type definitions (DTD's).

The DTD's feature the reuse of text, minimization of markup and readability of the SGML source. They share their elements as much as possible.

The formatting differs according to the features available in the target format and the conventions common to the type of document. This includes the automated rearrangement of text and the insertion of standard parts such as contents sections, a sorted index and a bibliography. The latter, for instance, is composed from those items of a database which are referenced in the document. For some formats the output may be spread over several files. See the target type documentation for details.

In keeping with the goal of text reuse and the aim of supporting many target formats, these DTD's don't attempt to cover each and every possible case. Instead, they try to provide all elements necessary for daily use and leave the implementation of special features to extensions.

It is also possible to have parts of a document use other notations, e.g., pictures drawn with tgif, xfig, the @Fig package of Lout, or Encapsulated PostScript.

It is fairly easy to coerce sdc into parsing documents with other DTD's. But this requires either writing formatting rules for the desired target format(s), or fitting in another parsing stage which converts the document into a form as if it had been marked up according to a supported DTD.

The transformation (formatting) is described by files of Scheme code related to both the document type and the target format. Only combinations of common value are supported by default. (For instance, only PostScript output is defined for letters.)

Currently there are these DTD's:

document: Simple ``plain'' documents.
report: Technical reports, documentation etc.
book: Books (longer documentations).
bibdata: Bibliography database.
manpage: Pages for the Unix(TM) man command.
brief: A letter according to DIN.

Currently the following target formats are supported:

PostScript (for English and German text)
LaTeX
HTML (Hyper Text Markup Language)
Info (for use in the online help of Emacs)
man suitable for roff -man
ASCII
source code (literate programming)
slide to extract slides from a document
limited support for RTF

Future output formats will include: roff -ms (or -mm), RTF.

2 What is SGML

This section may be translated some time. It is present in the German version and is intended to explain the advantages of text processing over word processing, and of generalized markup over target-dependent markup.

If you retrieved the package you are probably convinced anyway, so this section is superfluous.

For a description of SGML, refer to [1].

3 Invocation

First of all, set your environment variable DOCPATH. It should point to the top directory of the tree where all your files containing SGML data are stored.

Setting this variable could be done like this (for (t)csh and compatible shells):

 setenv DOCPATH $HOME

Invocation syntax for sdc:

sdc [options|filename.sgml]*

Options:

-o filename

Set the name for the output file. If omitted or set to - the output goes to the standard output. (This can cause problems with some target formats if they split the document.)

-O type

Set the target format. type can be:

ps: create a PostScript document.
latex: create a LaTeX file.
html: create an HTML page.
info: create an Info file.
man: create a man page.
literate: create source files (literate programming).
rtf: (only partially supported) create an RTF file.
slide: create a PostScript file holding the slides from the document.

If the -O switch is omitted, a guess is made from the extension of the output file name. If neither gives a target type, this is an error.
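The selection rule just described can be modelled in a few lines of Python. This is only a sketch of the rule, not sdc's own code: the function name `pick_target` is made up for illustration, and the extension table is an assumption, since this manual does not list the extensions sdc recognizes (only `.ps` is implied by the typical call shown further down).

```python
import os

# Target types listed for the -O option in this manual.
KNOWN_TYPES = {"ps", "latex", "html", "info", "man", "literate", "rtf", "slide"}

# ASSUMPTION: this extension table is hypothetical; the manual does not say
# which extensions sdc actually recognizes.
EXTENSION_GUESS = {".ps": "ps", ".tex": "latex", ".html": "html",
                   ".info": "info", ".rtf": "rtf"}


def pick_target(output_name=None, explicit_type=None):
    """Return the target type, preferring -O over the output file extension."""
    if explicit_type is not None:
        if explicit_type not in KNOWN_TYPES:
            raise ValueError(f"unknown target type: {explicit_type}")
        return explicit_type
    # "-" (or no -o at all) means standard output, which has no extension.
    if output_name not in (None, "-"):
        ext = os.path.splitext(output_name)[1].lower()
        if ext in EXTENSION_GUESS:
            return EXTENSION_GUESS[ext]
    raise ValueError("no target type: give -O or a recognizable -o extension")
```

In this model `pick_target("text.ps")` yields `"ps"`, while `pick_target("-")` raises an error because neither source supplies a type.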

-D directory

Add directory in front of the path searched for entities (files) of the documents. Each option can add only one directory. Multiple options are processed left to right, i.e., the last directory on the command line is searched first.
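The resulting search order is easy to get backwards, so here is a small Python model of it. It is a sketch under assumptions: the function name and the example directory names are invented, and the step that appends sdc's own directories follows the description of DOCPATH in section 3.1 rather than any inspection of sdc itself.

```python
def entity_search_path(docpath, d_options, sdc_dirs):
    """Model the order in which directories are searched for entities.

    docpath   -- value of DOCPATH, colon separated
    d_options -- -D directories in command-line order
    sdc_dirs  -- directories sdc appends for its own files (hypothetical names)
    """
    path = [d for d in docpath.split(":") if d]
    # Each -D is put in front of the path as it is read, left to right,
    # so the last one on the command line ends up being searched first.
    for directory in d_options:
        path.insert(0, directory)
    # sdc extends the path at the end with its own files.
    return path + list(sdc_dirs)
```

For a call with `-D a -D b`, the model searches `b` first, then `a`, then the DOCPATH entries, and finally the directories belonging to sdc.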

-i entityname

Ensure that a definition like

<!ENTITY % entityname "INCLUDE" >

precedes the processing of the documents. This is useful for optionally including marked sections. Refer to the manual sgmls(1) for a detailed description. This option is passed to sgmls.

-m file

Extend the list of catalog files to search for some SGML entities. Refer to the manual sgmls(1) for a detailed description. This option is passed to sgmls.

-L dirname

Set the name of the directory to use as library of files to search for target format descriptions.

-R file

Set a startup file to load after the default ~/.typesetrc. Multiple -R options are allowed and processed in the given order. The file argument is treated as either a path name to the file or a name relative to the rc directory of a directory in the library (see -L).

Startup files can have their own arguments. If the argument given with a -R option contains a colon, only the half up to that colon gives the file name to be loaded. The rest of the argument (without the colon) is assigned to the variable *-R-option-argument* while the specified file is loaded. If there was no colon in the argument, #f is assigned.
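The colon rule can be sketched as follows. This is a Python model, not the Scheme code sdc uses; the function name is hypothetical, `None` stands in for Scheme's #f, and splitting at the first colon is an assumption, because the text above only speaks of "that colon".

```python
def split_r_argument(argument):
    """Split a -R argument into (file name, startup-file argument).

    ASSUMPTION: the split happens at the first colon; the manual only
    says "that colon". None stands in for Scheme's #f.
    """
    name, colon, rest = argument.partition(":")
    return (name, rest) if colon else (name, None)
```

So `-R HTML2` loads HTML2 with no argument, while a hypothetical `-R myrc:draft` would load myrc with `draft` bound to *-R-option-argument*.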

-V level

Be verbose and don't delete temporary files (for debugging). level must be a number. The default for level is 1. This gives only warnings and (for a historical reason) a message upon success. Higher values give more messages.

With the -R option there are additional (long) options available to change the overall behavior. They are used by supplying one or more of the following file names to the -R option.

nidx: Pretend having the NIDX token in the face attribute.
1c, 2c, 1s, 2s: Similar to nidx, these modify the effective value of the face attribute in the top-level document.
no-margin: No page margin in the (ascii) output. Only implemented for lout processing at the moment.
HTML2: Don't use HTML-3 features in formatting.

Attention! Be careful to supply the exact name to the -R option. The same policy as for dotfiles applies to those files: if they don't exist they are silently not loaded! There is no warning message.

A typical call would be:

sdc -o text.ps source.sgml

3.1 Environment

Typeset recognizes the following environment variables:

DOCPATH: This path is used to find the entities of the document. sdc extends it (at the end) to include its own files. Directories given by a -D option are prepended.
Usually a good value for DOCPATH is something like $HOME or $HOME/text:$HOME/doc.
SGML_CATALOG_FILES: The files named by this variable are consulted by the underlying parser to find some SGML entities. For a detailed description refer to the manual sgmls(1). sdc extends this variable to include one file of its own, the first file named CATALOG found in the library. As with sgmls, the value can be extended by the -m option, which is simply passed to sgmls.
Usually it's good to leave this variable alone.
TYPESETLIB: This variable is used by sdc to find the directories to search for formatting translation files and the DTD's and CATALOG files for the underlying SGML parser. It may point to one directory or to a list of directories separated by colons. This value can be overridden by the -L option.
Usually it's good to leave this variable alone, unless you want to override some but not all files of the library.

3.2 Files

personal.data: is used by the DTD's which come with sdc to find definitions for the SGML entities related to the author. The two main ones are myself and my-Inst, which are used to insert default values for the name and the institution of the author; the file may define more. It is therefore a good idea to set the environment variable DOCPATH so that sdc will find this file. An example of how to set up the content of this file comes with sdc.
~/.typesetrc: if present, is loaded after startup and command line evaluation. It may contain any Scheme code.

sdc uses the files and directory structure in its library to parse the document and determine the formatting. For a description of this, refer to the developer's documentation.

4 Document types

4.1 Document type `document`

We start with an example:

<!doctype document public "-//JFW//DTD Document//EN" >
<document>The Title
<sect>Intro

Here goes the introductory text.
<sect>We continue

This is the text body of the first section. It's going to be a little
bit longer to show that the formatting of the source file really
doesn't matter for the output.

We start a new paragraph simply by inserting a newline.

A document starts, like every SGML document, with the document type declaration. It is opened with the document tag.

There are the following attributes available to a document:

date: The date of the document.
author: The author.
inst: The institution.
lang: The lang attribute for the document is obsolete. It has the same effect as changing the public document language in the document declaration.
face: This attribute is intended to affect the representation of the output. E.g., a value of 1c should cause printing in one column and a value of 2c should result in two columns. A token nidx will suppress the generation of an index even if index tags are used. Multiple values can be assigned to face (in quotes). See the documentation for the target formats for the treatment of this attribute.

Then some paragraphs may follow. After these paragraphs, either no sections or more than one section can follow. Optionally, an appendix can finish the document.

4.2 Document type `report`

A technical report consists of an abstract followed by sections and possibly appendices. Appendices themselves are divided into sections.

Again an example:

<!doctype report public "-//JFW//DTD Report//EN">

<report 
date="Today"
>Report Title

<abstract>

This is the abstract.
</abstract>

Here comes some introductory text. For some targets (e.g., PostScript
because Lout doesn't allow text at this point) this text is taken to
be a section named "Introduction" (in the document language).

<sect>first section

The text.
<sect>second section

More text.
<appendix>
<sect>Appendix

Text of the appendix.

The part from and including the appendix-tag is optional.

Sections may be divided by sect1 tags, and these in turn by sect2 tags. A sect2 cannot be divided further.

4.3 Document type `book`

Books are written using a document type declaration like:

<!doctype book public "-//JFW//DTD Book//EN">

<book 
date="Today"
>Title of the Book

The <book> tag has the same attributes as the document or report tag. This is for consistency, but questionable. The inst attribute would better be called publisher. Future versions may rename this attribute.

In principle a book is divided into chapters (at least two) with the <chapt> tag, just as a simple document is divided into sections. Chapters themselves consist of sections. Prior to the first chapter there may be two special sections named <preface> and <intro>.

Future versions will support grouping of chapters into <part>'s.

4.4 Document type `manpage`

The document type manpage is intended to produce pages for the Unix man command. It restricts the set of available elements to what roff -man can handle. In particular, figures are not valid elements. Furthermore, this document type enforces a ``good style'' for the man page. The possible sections are predefined. (Well, there is a backdoor for using self-defined sections; see doc/badman.sgml.)

A manpage begins like this:

<!doctype manpage public "-//JFW//DTD Manpage//EN">

<manpage
date="today"
>
TYPESET
<short>
an extensible SGML formatter
<synopsis> <code/typeset/ <var/options/ <var/files/ ...
<descript>

Up to this point the elements are enforced. That is, you can't write a man page without a title (between the <manpage> and the <short> tag), a short description and a synopsis. Up to the synopsis these elements can't even span multiple lines, since that couldn't be formatted as a man page.

The description section can be preceded by a config section. But while config can be omitted, <descript> cannot.

The description section plays a special role in another way: it's the only one which may be divided into subsections using <sect1>.

The following sections are also valid for a man page. They must appear in this order:

options, return, errors, example, env, files, conform, notes, diag, restrict, history, see.

Here is what goes into the sections:

4.4.1 Synopsis

Tag: <synopsis>.

The command, or what to write to call the function. For example:

<code/typeset/ <var/options/ <var/files/ ...

or for C functions and system calls:

#include <something.h>
int foo(int bar);

int foo2(int bar);

4.4.2 Config

Tag: <config>.

This section explains how a device is configured: major/minor numbers, their meanings and the meaning of the device name.

4.4.3 Description

Tag: <descript>.

A long, drawn-out discussion of the program. It's a good idea to break this up into subsections using <sect1>.

There is no BUGS section; discuss bugs here instead.

4.4.4 Options

Tag: <options>.

Some people make this separate from the description.

It's intended to hold one <desc> element (5.3). Don't forget to use the elements <code> and <var> here. Example:

<desc>
<dt><code/-option/ <var/file/

    Text describing the option.
...

4.4.5 Return Value

Tag: <return>.

What the program or function returns if successful.

4.4.6 Errors

Tag: <errors>.

Return codes, either exit status or errno settings.

4.4.7 Examples

Tag: <example>.

Give some example uses of the program.

4.4.8 Environment

Tag: <env>.

Environment variables this program might care about.

4.4.9 Files

Tag: <files>.

All files used by the program. Typical usage is a <desc> again (5.3).

4.4.10 Conforming To

Tag: <conform>.

SVID [EXT], AT&T, POSIX, X/OPEN, BSD 4.3

4.4.11 Notes

Tag: <notes>.

Miscellaneous commentary

4.4.12 Diagnostics

Tag: <diag>.

All the possible error messages the program can print out, and what they mean.

4.4.13 Restrictions

Tag: <restrict>.

Bugs you don't plan to fix :-)

4.4.14 History

Tag: <history>.

Programs derived from other sources sometimes have this.

4.4.15 See Also

Tag: <see>.

Other man pages to check out, like:

<ref t=m id="man(1)"//, <ref t=m id="man(7)"//, <ref t=m
id="makewhatis(8)"//.

Refer to (5.5) for a detailed description of <ref>.

4.5 Document type `brief`

The Brief is a DTD according to the German DIN standard. Example:

<!doctype brief public "-//JFW//DTD Brief//DE">
<brief fenster=ja>
<von>&my-adr;
<an>
<adr
NAME="Mustermann"
VORNAME="Erwin"
ORT=Musterhausen
STRASSE="Musterstra&ss;e 7m"
PLZ="01000">
<datum>1. Januar 1995
<betr>Musterschreiben
<anrede>Lieber Erwin,
<text>
Heute reden wir &ue;ber Musterbriefe.

Wie gef&ae;llt Dir das?
<gruss>Ciao
<anlage><pkt>1 Musterschreiben

This gives a letter according to DIN. The sender address appears a second time in the window of the envelope. Fold marks are also printed; fenster=nein will suppress this.

The tags anrede, gruss and anlage may be omitted. In this case standard text is inserted for the opening and closing, and the enclosure line is omitted.

The form of the address looks a little complicated. This is because it's intended to come from a database. If you set one up, its use will look like the sender address (<von>...).

The entity my-adr is defined in the file holding the personal data described in section (10).

5 Elements of the document

The different document types share the same elements (except for ``brief'').

5.1 Paragraphs

Paragraphs are separated with a paragraph tag.

To reduce markup, an empty line counts as a paragraph tag. Sequences of those tags are reduced to just one.
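The effect of this rule can be illustrated with a small Python model. It is not how sdc parses (that is done by the SGML parser with short references); the function name is made up, and two details are assumptions of the sketch: lines holding only white space are treated as empty, and the lines of a paragraph are joined with single spaces to reflect that the source layout does not matter.

```python
def split_paragraphs(source_text):
    """Split text into paragraphs at runs of empty lines."""
    paragraphs, current = [], []
    for line in source_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # The first empty line after text closes the paragraph; further
            # empty lines in the same run add nothing.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Three empty lines in a row therefore separate two paragraphs exactly as one empty line does.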

5.2 Enumerations

There are two kinds of lists, ordered and unordered.

<enum>: opens an enumerated list.
<list>: opens an unnumbered list.
<item>: starts a new item.
<o>: same as <item> (short form).

Both <list> and <enum> have to be closed (by </list> or </enum>).

Example:

<list>
<item> Language to describe the logical structure of text.
<o> a tool and library to format SGML text into
<enum>
<o>PostScript
<item>HTML
</enum>
<item>One more point
</list>

formats to:

Language to describe the logical structure of text.
a tool and library to format SGML text into
1. PostScript
2. HTML
One more point

5.3 Glossaries

Glossaries are declared by the <desc> tag.

Again an example:

<desc

<dt/<desc>/ opens a description

<dt/<dt>/ encloses the described topic

</desc>

And the corresponding output.

<desc>: opens a description
<dt>: encloses the described topic

You don't want to put newlines (which start paragraphs) between the <desc> and the <dt>. If you like (as I do), omit the closing > from the <desc> (that is, write <desc).

5.4 Pictures

We don't have a chance to describe pictures in terms of SGML, but we can tell where the formatting application should put them.

To include pictures one can use:

Entity definitions and references. The reference is either a less controlled entity reference like &name;, or it's made using the foreign tag. The latter should be preferred because it could offer more control, but in fact there is not much of a difference.
This is the preferred method.
Inlined code, see (5.13).
For some cases, like including GIF pictures in HTML documents, it's necessary to compromise the portability and target independence of the document.
Those cases need processing instructions. For the mentioned case one needs the following:
1. An SDATA entity to get a literal > into the final output. This could be achieved by having this definition in the ``header'':
```
<!ENTITY lit-gt SDATA ">" > 
```
2. A processing instruction at the place where the picture (here picture.gif) goes:
```
 <? <IMG SRC=picture.gif> &lit-gt;
 
```
 (We need lit-gt because SGML has no way to escape a > character within a processing instruction.)

In all these cases there are two ways to include the picture: either directly between the lines of the text or as a floating part of the document. The latter form allows you to put a caption line on it and an identifier to refer to from any part of the document.

Having pictures between the lines usually doesn't look very professional. Because of the effort word processors require to handle floats, it has nevertheless become common to do so. You should think twice whether this form is appropriate in your case.

If you use entities and notations for your pictures, you need to declare both in the header of your document. See (B) for a more detailed description and refer to [1] for full details.

The local installation (C) may provide some predefined notations. A full installation supports at least eps, fig, lfig, roff, latex, tgif.

To include a picture entity (with entity name ``name'') defined using notations between the lines of text just write the entity reference like this: &name;.

To include the same picture as a floating object use the <figure> tag:

<figure id=refname>
<foreign file=name>
<caption/The caption line/

The id attribute tells the name to be used by cross references (5.5) for this figure.

5.5 Cross References

Cross references are introduced by the ref-tag. It has two attributes:

id

The ID it refers to. The interpretation of the id depends on the value of the t attribute.

t

The type of reference. There are 4 possible values:

X: (the default) The ID refers to some id in the same document.
Note that this means a document as SGML understands it. If you are inside something which is a full document by itself, but is included as a SUBDOC entity into some other document, you can refer to any id within this document, but not to any in the outer document!
B: Bibliography reference. The ID is the value of the Tag attribute in the database (7).
M: The ID refers to a manpage.
U: A URL. The ID holds the complete URL.

5.6 Emphasis

To emphasize long parts of text there is the <quote>-tag.

Example:

<quote/Important result: long citations, definitions and other
material which should be emphasized is enclosed by quote-tags./

And the result:

Important result: long citations, definitions and other material which should be emphasized is enclosed by quote-tags.

The <quote>-tag has a style attribute accepting the values default (which is the same as if nothing is given) and center. These give the recommended style. The default is to narrow the text a little, while center narrows and centers the quoted material.

5.7 Footnotes

Footnotes are enclosed by <footnote>-tags.

<footnote/You can't have figures in footnotes./

Will format as (1).

5.8 Notes

Sometimes you may like to have longer side notes. These are enclosed by <note>. This looks like this:

This text has been enclosed by `<note>` and `</note>`.

5.9 Verbatim copied text

For excerpts of source code there is the <verb>-tag. Most of the examples in this text are made with it. Code inside the verb-marked region is not processed for SGML references at all. Therefore no references to any entity (like &lt;) are possible inside.

There is also a variant of <verb> called <rverb>. This means ``replaceable'' verbatim. The content of an <rverb> region is parsed for entity references, thus allowing references to external entities or SGML end tags inside. The examples which have end tags inside are written using this element.

5.10 Emphasizing words

To emphasize short phrases or words you can choose from the preferred tag  and these:

: This will produce a different kind of emphasis depending on the level of use (slanted, bold, ...).
: Heavy emphasis.
<bf>: Bold face
<it>: Italic
<tt>: Teletype style

5.11 Linguistic Markup

For documentation purposes it is common to distinguish between literals (code), variables and meta characters. Markup exists for this. These tags don't nest; that is, each of them ends any other tag of this group.

For literals use <code>, for variables <var> and for meta characters <meta>.

Please don't look for their use in this manual; they were introduced late.

5.12 Explicit line and page breaks

In rare cases you might need to insert unconditional newlines. The general entity nl exists for this.

To enforce a page break at a certain position use the <newpage> element.

Don't use it too much. It might look strange in some output formats. You'll need it most with the slide target, so best enclose them all the time with a marked section like:

<![ %Slide [ <newpage> ]]>

5.13 Inline Code

Using the <inline> element you can include code using other notations (see (B)) as available locally at your site (see (C)).

The <inline> element takes one attribute n, which must be set to a valid notation. This way you can achieve special effects or write tables and equations.

Inside the <inline> element you can't write other SGML markup, but you can refer to predefined entities (using the &name; notation). This raises the question of how to include an & followed by a letter. For this purpose you need to write a decimal character reference, i.e., &#38;.

Example:

<inline n=lout>
45d @Rotate @ShadowBox 2f @Font {that's a funny cyan @Color "&" }
</inline>

will give you the rotated, shadow-boxed text (the rendered picture is not reproduced here).

If you use the `<inline>` feature, be careful about the filenames you use: sdc may (depending on the needs of the target format) create files matching the pattern:
basename-of-output`-`number`.`extension.

For admins: The example above might not work ``out of the box'' at your site since it uses a notation which needs to be set up. See (C) and the documentation about target formats for what is necessary.
If you're going to install sdc, see the file `doc/notations.sgml` of the installation and adapt it to your needs.

5.14 Tables

In response to various requests for tables, one or more of the following ``syntaxes'', or to put it better, the following ideas, will be implemented. I strongly encourage everybody to mail me comments about this.

Currently there is one SGML construct to describe tables within sdc. If someone comes up with a better solution (it shouldn't look too strange in the source -- sdc is supposed to remain a ``don't worry'' application), it will be incorporated.

A table is enclosed with the <table> tag. It can contain one of the implemented kinds of table (currently only <tbl>).

5.14.1 `tbl`

A tbl-table consists of patterns and rows. Each row must name the pattern it is formed after; otherwise it uses the most recently used or defined pattern. A row itself is a sequence of cells. Cells begin either with a <c>-tag or simply a |.

A pattern consists of tags describing the alignment. There are left, right, center, decimal and block. Between these, <sep> tags are allowed to request vertical lines in the table.

The formatting of the cells within a row is described by the associated tag in the pattern used.

An example is probably the better way to explain this.

The implementation of tables is not finished. At the moment there are no tables at all for LaTeX.
The syntax defined here a) does not have all the features of all the backends, so you have to drop to plain backend code (e.g., inline) if you really need those features, and b) has more features than all backends have in common, so depending on the backend some features are silently dropped.
For HTML, HTML-3 tables are implemented at the moment. As there is virtually no client to display these, one can switch to fixed-font preformatted tables instead. (If you have such a client, please check whether the code works: I can't.) Change `html-tbl-writer-function` in the file `include/layout.scm` to something other than `'html3-write-tbl`, or use `-R HTML2` on the command line.

<figure id=tblexam>
<table<tbl>
<pattern<left<center<sep double<decimal align=",">
<r><bf/Name/ | <bf/Group/ | <bf/V/
<sep>
<r> Fred | M | 3,4
<r> Sian | F | 5,78
<r> Tiger| F | 100
<pattern/<right<center<right>/
<sep double>
<r>A | &lt;=> | B

</tbl<table
<caption/Example table/
</figure>

Will give you what figure (1) shows.

		 ||
Name	Group	|| V
Fred	M	|| 3,4
Sian	F	|| 5,78
Tiger	F	|| 100
   A	<=>	B

Example table

Questions:

Is it better to have the freedom to mix the patterns with the rows (as it is now), or to group all patterns before all the rows?
Is the short reference character ``|'' a good choice for the column separator?
How many requests will there be for row-spanning cells? (They take more effort to implement, so it will take some more time.)

5.14.2 Next

How about an HTML-3-like syntax? This is even harder to implement on the existing targets (while it's obviously easier on viewers -- but these are internals). Open questions:

I personally dislike the very verbose syntax of HTML-3 tables. What is your opinion? Is it better to have <r> and <c> tags (and ``|'' as short reference for the latter) to separate rows and columns, or the long syntax of HTML-3? The latter will mess up your source code, but has the advantage of being the same as in HTML. (On the other hand, why should we be compatible with the stuff we compile out of the source?)

If the ways to create tables mentioned above don't suit your needs, you can use notations, see (B). You might also consider using the <inline> element (see (5.13)) to write the table using other notations as available locally at your site (see (C)).

5.15 Equations

There is limited support for equations at the moment. Features will be added as they are needed. To give an example of what's currently supported:

<quote>

<bf/Lemma:/ <math/&alpha; &isin; <set/M<sup/200/<sub/4// &cap;<set/N//
</quote>

Will yield:

Lemma: [alpha] [isin] M^200_4 [cap] N

Future versions will probably support the same features as HTML-3. (This is in fact simply an SGMLish translation of LaTeX's idea of equations, and the latter is commonly considered the best way to write equations.) The question remains: is this the best way or does anybody have a better idea?

As mentioned, the support for equations is limited at the moment and intended for simple formulas. The translation into formatting instructions is implemented in a straightforward way and is not always reliable. If you intend to use LaTeX as the backend, you may want to use <inline latex> (see (5.13)) and native LaTeX coding instead.

It's not a big task to extend the formatting rules to filter things like that through the PostScript backend and convert the result into, say, GIF for HTML. But nobody has come around to programming it yet.

5.16 Foreign Languages

sdc will insert standard text phrases where appropriate. These depend on the language of the document.

In the simple case you can choose the language for the whole document. This is done by the document declaration. That is:

<!doctype document public "-//JFW//DTD Document//DE" >

will produce a German document and:

<!doctype document public "-//JFW//DTD Document//EN" >

an English one.

Furthermore, the tags report, document, chapt, sect, sect1, sect2, lang have an attribute lang which will temporarily change the language in the enclosed part.

Footnote

1): You can't have figures in footnotes.

6 Creating Indexes

sdc creates an index for a document if at least one <index> tag was used in the text.

An index tag can appear at every place where text is allowed except for headlines, which are terminated by.

For the index tag the following attributes are defined:

id: The topic which will appear in the index section. This attribute is required.
sub: An optional subtopic.

Creating useful indexes is an art in itself. Therefore you should choose the attribute values carefully. The most common (recommended) way to use indexes is like this:

<sect id=Indexes>Creating Indexes
<index id=Element sub=index
<index id=Indexes>

Please note: Because of an unresolved problem it's better to use the short notation as above. That is: don't close the <index tags (omit the >). Otherwise the PostScript output will create an empty paragraph.

As mentioned, the <index> tag is not restricted to be used immediately after the section start.

7 Bibliography database

sdc can use one or more databases for bibliography. If items of them are referenced (by a <ref> see (5.5)), a section is automatically appended to the document and the referenced items are listed.

To use a database you need to include the database file, for instance like this:

<!doctype report public "-//JFW//DTD Report//DE" [
<!entity bib system "intro-bib.sgml" subdoc>
]>

And you have to reference the entity later on anywhere in the document like this: &bib;. But be sure to do so at some point where data is allowed, e.g., where the first paragraph begins. (Otherwise you'll violate SGML rules.)

The database has the following structure (with repeated BIBL's):

<!doctype bibdata public "-//JFW//DTD Bibliography//EN" [
]>
<BIBL
     Tag="SGMLGuide"
     Author="Martin Bryan"
     Title="SGML an authors guide"
     Publ="Addison Wesley Publishing Company"
     Year="1993"
     exc= "not avail"
     ISBN="...."
     COM="a comment on the book"
>

Don't write data in this file, only start tags like the one above.

This database will be extended (some day) to support at least all features a BibTeX database supports.

Please note: If you get warnings about items not in the database, then the numbers within the references may be wrong. Those numbers are only correct if all references could be resolved.

8 Conditional Inclusion

You can always include text depending on the definition of some parameter entities, as you can with sgmls. That is, you write a marked section like this:

<![ %Name [
conditional included Text
]]>

Here Name is a previously defined parameter entity. Refer to section (14) for predefined entities and conventions with sdc. To exclude the text by default you need to have an entity definition like the following in the document type definition:

<!ENTITY % Name "IGNORE">

Now you're able to include the marked section with the command line switch -i Name. This pretends that an entity definition

<!ENTITY % Name "INCLUDE">

is seen by the parser, which overrides the other one.
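The interplay between a default definition and the -i switch can be illustrated with a small Python sketch. It is not how sdc is implemented: the function names are hypothetical, the regular expression only handles non-nested marked sections whose keyword is a single parameter entity reference, and it assumes that a definition supplied by -i is seen before the one in the document type definition, so that the first definition wins.

```python
import re

def resolve_parameter_entities(definitions):
    """Return {name: value} for a sequence of (name, value) definitions.

    The first definition of a name wins; later ones are ignored.
    """
    resolved = {}
    for name, value in definitions:
        resolved.setdefault(name, value)
    return resolved

# Matches <![ %Name [ ... ]]> (non-nested only; an assumption of this sketch).
MARKED_SECTION = re.compile(r"<!\[\s*%(\w+);?\s*\[(.*?)\]\]>", re.DOTALL)

def apply_marked_sections(text, entities):
    """Keep the content of INCLUDE sections and drop everything else."""
    def replace(match):
        status = entities[match.group(1)]  # an undefined entity raises KeyError
        return match.group(2) if status == "INCLUDE" else ""
    return MARKED_SECTION.sub(replace, text)

# Definition from the document type definition: excluded by default.
dtd_definitions = [("Name", "IGNORE")]
# What "-i Name" is assumed to put in front of it.
switch_definitions = [("Name", "INCLUDE")]

text = "before <![ %Name [ conditional text ]]> after"
without_switch = apply_marked_sections(
    text, resolve_parameter_entities(dtd_definitions))
with_switch = apply_marked_sections(
    text, resolve_parameter_entities(switch_definitions + dtd_definitions))
```

Here without_switch no longer contains the conditional text, while with_switch does.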

Depending on the target format (the -O switch) and the public text language some parameter entities are predefined. See (14) for details.

9 Slides

Slides are extracted from ordinary documents, or more precisely only from documents of the types document, report and book.

Everything enclosed by a <slide> tag goes onto one slide. (Or more than one slide, we can't control what happens if a slide gets overfilled.) Everything outside of those tags is simply dropped.

Slide tags might appear wherever a  tag is allowed.

If you have more material in the document than you want to have on the slides (as is usually the case), you can exclude parts with marked sections, see (8).

The title used for the slide is that of the last division (like chapt, sect, sect1) seen. The slides are grouped by the sect's of the document. So if all you want is a document full of slides, just write one and put in <slide>'s instead of paragraphs.

A ``Slide'' entity is only defined when you generate your slides, but the usual case is to exclude material from them. For that, put in three lines like:

<!ENTITY % Slide "IGNORE">
<![ %Slide [ <!ENTITY % noSlide "IGNORE"> ]]>
<!ENTITY % noSlide "INCLUDE">

This way noSlide is defined to INCLUDE if Slide is not. Why? SGML says the first definition wins. If Slide is defined via the target, noSlide is defined to IGNORE, otherwise to INCLUDE.

10 Personal Data

At some points sdc will insert person-dependent text, such as the default for the author's name and institution.

These are defined in a document called personal.data. Usually it's to be found in the directory DOCPATH points to (see (3)).

There is a file called example-personal.data in the standard library directory, to be copied and adapted.

The following general entities must be defined in this file. (They serve as the defaults for the corresponding attributes.)

myself: The name of the author.
my-Inst: The institution / organisation.

11 Appendices

Appendices are all the sections which follow the <appendix>-tag.

12 Large documents

Large documents can be spread over individual files. These files together form the document. In principle there are two ways to declare them.

As a General Entity (see (A)).
As a SUBDOC General Entity

The first form inserts the whole document in place of the entity reference. This means that its markup becomes part of the document the entity is included in. This restricts what you can put into the included document, but it has the advantage that cross references work from outside into it and from inside out.

Example: within the <!doctype ... one can have an entity declaration like:

<!doctype ... [
...
<!ENTITY t.desc system "descript.text" >
...
]>

and later on, at the position in the document where the contents of the file are to appear, a usual entity reference like &t.desc;. (Note that you cannot cross reference that piece of the document as a whole, only the id's defined within it.)

sdc won't do anything interesting with the first form. It's already handled by the SGML parser.

The second form, subdoc entities, makes the part to be included a document of its own. That is, it can (could) be used without being included. Hence you can't put references from inside out. SGML also disallows references from outside into it.

Those subdoc entities are restructured to form a division at the place where the references occur, e.g., if you are within a section and reference a subdoc entity you get a subsection holding the text. sdc achieves this by preprocessing the subdoc entity in a smart way. It re-tags the document as if it were tagged according to the document-DTD.

As explained, you can't do any cross referencing between the included document and the outside document. But you can reference the subdocument as a whole by using its entity name.

Example: the entity declaration changes from the above form to something like:

<!doctype ... [
...
<!ENTITY bib SYSTEM "intro-bib.sgml" SUBDOC>
...
]>

Note that the declaration is extended by the SUBDOC keyword. And remember that documents included this way must start with a <!doctype ... declaration, while those suitable for the first form must not.

Those subdoc entities are included the same way, that is, in the example by an entity reference &bib;. In this case the entity name (bib) becomes available for cross referencing (you can do <ref id=bib// in this example).

13 Literate Programing

sdc supports so called ``literate programming''. That is, the source code and its documentation are mixed. To get the plain code, it first has to be stripped of the documentation.

With sdc it is possible to have the source that goes into one file spread over a set of files (i.e., one document), and also to have one document contain the source of a set of files.

To do literate programming the code is surrounded by the <literate tag. The literate tag has one attribute called file, the name of the file where the code goes. This file is opened at the first occurrence and closed when sdc finishes its work. If no file is given, the last one used is implied; if none at all has been specified before, standard output is used.
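The rule for choosing the output file can be sketched in Python as follows. This is only an illustration of the rule as described above, not sdc's code; the function name, the input format (a list of pairs) and the "<stdout>" marker are hypothetical.

```python
STDOUT = "<stdout>"  # hypothetical marker standing for standard output

def route_chunks(chunks):
    """Distribute literate chunks over their target files.

    chunks is a list of (file_attribute, code) pairs in document order,
    where file_attribute is None if the <literate tag had no file attribute.
    Returns {target: concatenated code}.
    """
    outputs = {}
    current = STDOUT  # used until a file has been named
    for file_attribute, code in chunks:
        if file_attribute is not None:
            current = file_attribute  # becomes the "last used" file
        outputs.setdefault(current, []).append(code)
    return {target: "".join(parts) for target, parts in outputs.items()}

chunks = [
    (None, "echo start\n"),            # no file named yet: standard output
    ("main.c", "int main(void) {\n"),
    (None, "  return 0;\n"),           # implied: main.c
    ("util.c", "/* helpers */\n"),
    ("main.c", "}\n"),                 # main.c is appended to, not reopened
]
routed = route_chunks(chunks)
```

In this example routed["main.c"] holds the three pieces written to main.c in document order, even though a chunk for util.c appears in between.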

Within the literate tagged area there may be anything that can appear in normal text. But no formatting is applied to the output if the target literate is chosen. The typical (intended) contents are verb and rverb elements, and maybe plain text paragraphs. Because of the formatting applied for other target formats, plain text paragraphs are only a good idea if the code (or comment) is still readable after reformatting.

With the features of replacement and conditional inclusion this form of literate programming might be useful for simple preprocessing.

14 Parameter Entities

SGML knows two kinds of entities: general entities and parameter entities. While general entities can be referenced anywhere, parameter entities are restricted to the meta level where the document structure is defined. This is mostly just the document type definition, but marked sections can also use parameter entities for conditional inclusion (see (8)).

To avoid conflicts between user defined and predefined parameter entities there is one convention: all parameter entities defined by sdc begin with an upper case letter. User defined parameter entities should therefore begin with a lower case letter. This is by no means enforced, but it is a recommended convention.

Depending on the target format the following parameter entities are predefined. Only the one associated with the target format is defined to INCLUDE; all others yield IGNORE.


LaTeX: for LaTeX processed documents.
Lout: for PostScript output through Lout
HTML: for HTML pages
ASCII: for ASCII output
Info: for output in Info format
Literat: for literate programing
RTF: for RTF output
Slide: together with Lout for extracting slides.

Depending on the public text language, parameter entities named after the language codes defined by ISO 639 will be defined to INCLUDE, the others again to IGNORE.

Currently only the definition to INCLUDE is handled. It's up to you to define the others to IGNORE in your doctype directive. Also, only English and German are supported at the moment.

Here are the codes reserved for this purpose:

EN: English, DE: German, FR: French, GR: Greek, IT: Italian, NL: Dutch, ES: Spanish, PT: Portuguese, AR: Arabic, HE: Hebrew, RU: Russian, CH: Chinese, JA: Japanese, HI: Hindi, UR: Urdu, SA: Sanskrit.
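As an illustration, the set of predefined parameter entities for one run could be modelled like this in Python. The entity names are the ones listed in this section; everything else (the function, its arguments, and in particular the reading that Slide is defined in addition to Lout) is an assumption of the sketch, not a description of sdc's implementation.

```python
# Target entity names as listed in this section.
TARGET_ENTITIES = ("LaTeX", "Lout", "HTML", "ASCII", "Info",
                   "Literat", "RTF", "Slide")

def predefined_entities(target, language, slides=False):
    """Return the predefined parameter entities as {name: value}."""
    if target not in TARGET_ENTITIES or target == "Slide":
        raise ValueError("unknown target entity: %r" % (target,))
    active = {target}
    if slides:
        # Assumption: Slide is defined together with Lout.
        if target != "Lout":
            raise ValueError("slides are assumed to require the Lout target")
        active.add("Slide")
    entities = {name: ("INCLUDE" if name in active else "IGNORE")
                for name in TARGET_ENTITIES}
    # Only the definition to INCLUDE is handled for languages;
    # the other language codes are left undefined here.
    entities[language.upper()] = "INCLUDE"
    return entities

html_run = predefined_entities("HTML", "en")
slide_run = predefined_entities("Lout", "DE", slides=True)
```

With these definitions a marked section guarded by %HTML; would be included in html_run and ignored in slide_run.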

15 Changing the Layout

Particular formatting instructions are not intended to appear within the document, so the layout can't be changed from there.

To change the formatting use the -R option to load a Scheme file. If you do so, be sure you know what you are doing! You might want to have a look at the file include/layout.scm, as the definitions there are the most likely to be changed according to taste. Here is an example of such a file, showing how to change the initial font for Lout.

(doc-preprocess-hook
 'add
 (lambda ()
   (if (equal? doc-output "lout")
	   (eval (set! lout-initial-font "Times Base 12p")))))

Having this within a file, say TB12.scm, call sdc like:

sdc -R TB12.scm -Ops -o output.ps text.sgml

A General Entities

Currently there are predefined general entities for setting mathematical symbols. Their names are those defined by Addison Wesley.

The set of general entities for Greek characters is also defined as by Addison Wesley.

To see these sets of entities and their representation, format a document like the following into the target format you need.

<!doctype document public "-//JFW//DTD Document//EN" [
<!ENTITY f.AWm SYSTEM "AWmaths.text" >
<!ENTITY f.AWg SYSTEM "AWgreek.text" >
]>
<document face="2c 2s nidx">General Entities

<sect>AWmaths

&f.AWm

<sect>AWgreek

&f.AWg

Prior versions had a problem with the `<` entity. There were two flavors, `&lt;` and `&ltc;`. The problem is solved, see the changes section.

B Notations

SGML provides so called notations to define entities which use ``foreign'' descriptions of their content. That is, the entity might be a file containing a figure drawn with, say, tgif. The interpretation of system notations is implementation dependent.

You may define notations, and some entities using them. sdc provides an interpretation for notations which are declared to be SYSTEM.

The system identifier is used to start external commands. Prior to the command execution the following macros are expanded:

%s: the system identifier of the entity
%f: the complete filename of the entity
%%: a single %

If the notation is to be applied to inlined code, a temporary file is used.
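The macro expansion can be pictured with the following Python sketch. It is an assumption about the behaviour, not sdc's code: the function name is hypothetical, the expansion is done in a single pass so that %% never produces a new macro, and any other use of % is left untouched, since the text above does not say what sdc does in that case.

```python
import re

def expand_macros(command, system_id, filename):
    """Expand %s, %f and %% in the command line of a notation."""
    table = {"s": system_id, "f": filename, "%": "%"}
    # One pass over the string: each match is replaced exactly once.
    return re.sub(r"%([sf%])", lambda match: table[match.group(1)], command)

expanded = expand_macros("cat %f", "file1", "/home/me/doc/file1")
```

The file name /home/me/doc/file1 is made up for the example; expanded is then the command line cat /home/me/doc/file1.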

Example:

<!DOCTYPE document public "-//JFW//DTD Document//DE"[
<!NOTATION cat SYSTEM "cat %f" >
<!ENTITY f1 SYSTEM "file1" NDATA cat>
<!ENTITY f2 SYSTEM "something" NDATA cat>
]>

<document
face="2s 2c"
>Notation Demo

Some text goes here. 
&f1;

And now call &f2;.

For an extensive example see `doc/nottest.sgml` in the sdc distribution. This shows the ``real world'' notations provided by default by sdc.

The above example declares a notation named ``cat'' which starts the external program cat with one argument, the full filename of the entity the notation is applied to.

Then two entities (f1 and f2) are declared. Both use the notation cat. Their system identifiers are ``file1'' and ``something'' (i.e., their content is stored in these files), and their content is declared to be in the notation cat. (That is done by the NDATA keyword, as in ``notation data'', followed by the name of the notation, i.e., cat in the example.)

When the entities are referenced (&f1; and &f2;) the command line associated with the notation (i.e., cat %f) is executed (after the expansion of macros as explained above). The output of this command is fed into the backend/output of sdc.

Make sure that the output generated by the external program is suitable for the target format. If needed, use conditional definitions for the notations depending on the target format. (See section (8) for the parameter entities defined depending on the target.) Needless to say, the gain in comfort and features is paid for by a loss in document portability.

A possible application could be to have a picture stored in some format (like that of xfig). If there is a program fig2dev which translates this format into encapsulated PostScript you may wrap it by a shell script like this:

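#!/bin/sh
# $1: the figure file, e.g. picture.fig
SRC=`basename "$1"`
TARGET=`basename "$1" .fig`.eps
# translate the xfig drawing into PostScript
fig2dev -L ps "$SRC" > "$TARGET"
# print the Lout command which includes the generated file
echo "@IncludeGraphic{" \"$TARGET\" "}"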

If you use this notation with the Lout backend to produce PostScript, this will insert the picture on every reference to the corresponding entity.

C Local Features

The following notations are defined by default:

ignore: This just prints a note that it has been used. It's intended to get rid of the functionality implied by the others.
eps: This is Encapsulated PostScript.
fig: This is to be used with figures drawn by xfig. It will automatically retrieve the referenced file and transform it into the proper format for the target.
lfig: Figures drawn using the @Fig package of Lout. See the Lout user manual[2] for a description.
roff: Preprocesses tbl and other roff code. Be sure not to exceed one page with the image you get from this notation.
latex: Pieces of LaTeX code (especially intended for formulas).
tgif: This is for figures drawn by tgif.

To use them you need to extend the head of the document like:

<!doctype document public "-//JFW//DTD Document//DE"[
<!ENTITY fig1 SYSTEM "figure1.fig" NDATA fig >
<!ENTITY tbl1 SYSTEM "table1.eps" NDATA eps >
]>

Then you can use it as described in section (5.4).

D Installation

See the file INSTALL.

E Changes

(26th June 96) arguments to rc files
(19th June 96) Version 0.7: Introduced reparsing of the document. Implementation based on streams and a recursive descent parser.
Because the old code is left in, some files are hardly clean or easy to read.
SUBDOC entities supported. You can include complete documents within others and get them restructured as divisions of the outer document.
DTD's changed to use a mixed content model. No explicit  tags necessary anymore.
<newpage supported within list items.
Tgif supported as notation.
Most of the (computationally intensive) notation handling is based on Makefiles.
<quote> element has got a style attribute.
-D option added. It adds a directory at the front of the path given by the environment variable DOCPATH.
The &ltc; is not needed anymore. (But still allowed.)
(25th Oct. 95) SGML_PATH is no longer used (but overwritten!). The new name for the path pointing to the document tree is DOCPATH. It is a simple colon separated path, pointing to the directories to be searched for document entities.
(25th Oct. 95) Proposed tables work a little with Lout (they work as long as there are no cells spanning rows or columns). Unfortunately the implementation revealed limits of Lout: there is obviously no way to implement the full semantics without changes to Lout, and the latter are not within reach.
rc files added
library is now a path

F Problems

If you get any problems don't hesitate to mail me:

<[email protected]>

Known Problems:

Ghostscript version 3.3 has a problem writing ppm files. If sdc refuses to generate gif files, try ghostscript version 2.6.1.
LaTeX output discards leading whitespace in <verb> and <rverb> elements. I don't know how to coerce LaTeX to keep it. There is a macro \psedoverb in the converter file for LaTeX; if you have an idea how to correct this macro, please drop me a note.
Quote characters are internally translated into markup. They can't span paragraph boundaries.

H Bibliography

[1] Martin Bryan: SGML an authors guide; 1993 Addison Wesley Publishing Company
[2] Jeffrey H. Kingston: A User's Guide to the Lout Document Formatting System (Version 3); 1994 Basser Department of Computer Science

I Index

Appendix (o)
Bibliography (o)
Cross Reference (o), (o)
- footnote (o)
Customization (o)
DTD (o), (o)
- book (o)
- brief (o)
- document (o)
- manpage (o)
- report (o)
Element (o)
- appendix (o)
- code, var, meta (o)
- desc (o)
- enum (o)
- figure (o)
- footnote (o)
- index (o)
- inline (o)
- item (o)
- lang (o)
- list (o)
- note (o)
- o (o)
- p (o)
- paragraph (o)
Emphasize
- documenting code (o)
- long Text (o)
- words and phrases (o)
Entities
- general (o)
- parameter (o)
- predefined (o), (o)
- referencing (o)
Environment (o)
- DOCPATH (o)
Equation (o)
Files (o)
- personal.data (o)
Footnotes (o)
Greek characters (o)
Indexes (o)
Invocation (o)
- options (o)
Language (o)
Notation (o)
- application (o), (o)
- application of (o)
- handling (o)
- inlined (o)
- local predefined (o)
Pictures (o)
Re-Tagging (o)
SUBDOC (o)
Slides (o)
Tables (o)
conditional inclusion (o)
element
- rverb (o)
- verb (o)
foreign data (o), (o), (o)
layout (o)
local (o)
marked section (o)
newline (o)
newpage (o)
target formats (o), (o)

Jörg Wittenberger
