SX: SP Application for SGML to XML Normalization

[Documentation below extracted from the package. See the Source (.ZIP), or the database entry: SX - An SP application for SGML to normalized XML -rcc]

This is a very preliminary release of SX, an application built with the SP library for converting SGML to XML.

A Win32 executable is included. It requires the sp120u.dll file included in the SP 1.2.1 Win32 Unicode binary distribution.

To build from source, you need the SP 1.2.1 sources, available through http://www.jclark.com/sp/howtoget.htm. Make a subdirectory sx in the directory containing the SP sources, and unzip the SX sources into that subdirectory. With Visual C++ 4.2, use the supplied sx.mak. On Unix, do "make XPROGDIRS=sx" in the SP directory; you must leave SP_MULTI_BYTE defined in the SP Makefile.

Some documentation in HTML is in sx.htm. This should be put in the same directory as the other SP HTML documentation files. The file todo.htm contains some ideas for future improvements.

Please report any bugs to me.

James Clark
jjc@jclark.com

SX

An SGML System Conforming to International Standard ISO 8879 --
Standard Generalized Markup Language

SYNOPSIS

sx [ -Cehilprvx ] [ -bencoding ] [ -ccatalog_file ] [ -Ddirectory ] [ -ffile ] [ -wwarning_type ] [ -xxml_output_option ] sysid...

DESCRIPTION

SX converts SGML to XML. SX parses and validates the SGML document contained in sysid... and writes an equivalent XML document to the standard output. SX will warn about SGML constructs that have no XML equivalent.
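
The behaviour described above (XML on the standard output, diagnostics on stderr) can be driven from a script. The following is only a minimal sketch in Python: the helper name convert and its sx_cmd parameter are hypothetical, and it assumes an sx executable can be found on the search path and that the default UTF-8 output encoding is in effect.

```python
import subprocess
import sys


def convert(sysids, sx_cmd=("sx",)):
    """Run SX on the given system identifiers (hypothetical helper).

    Returns (xml_text, diagnostics): the XML written to standard output,
    decoded as UTF-8 (the documented default encoding), and whatever was
    written to stderr (errors and warnings).
    """
    # sx_cmd is a tuple so that a wrapper or an absolute path can be used.
    result = subprocess.run([*sx_cmd, *sysids], capture_output=True, check=False)
    xml_text = result.stdout.decode("utf-8")
    diagnostics = result.stderr.decode("utf-8", errors="replace")
    return xml_text, diagnostics


if __name__ == "__main__":
    # Usage: python convert.py doc.sgm > doc.xml
    xml, messages = convert(sys.argv[1:])
    sys.stdout.write(xml)
    sys.stderr.write(messages)
```

The exit status is deliberately not checked here, since this document does not describe what status SX returns; a caller would inspect the diagnostics instead.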

The following options are available:

-bencoding
Use encoding for output. By default SX uses UTF-8.
-cfile
Use the catalog entry file file.
-C
This has the same effect as in nsgmls.
-Ddirectory
Search directory for files specified in system identifiers. This has the same effect as in nsgmls.
-e
Describe open entities in error messages.
-ffile
Redirect errors to file. This is useful mainly with shells that do not support redirection of stderr.
-iname
This has the same effect as in nsgmls.
-v
Print the version number.
-wtype
Control warnings and errors according to type. This has the same effect as in nsgmls.
-xxml_output_option
Control the XML output according to the value of xml_output_option as follows:
no-nl-in-tag
Don't use newlines inside start-tags. Usually SX uses newlines inside start-tags so as to reduce the probability of excessively long lines.
id
Output attribute declarations for ID attributes.
notation
Output declarations for notations.
ndata
Output declarations for external data entities. XML requires these to be NDATA. SX will warn about CDATA and SDATA external data entities and output them as NDATA entities.
cdata
Use XML CDATA sections for CDATA marked sections and for elements with a declared content of CDATA.
comment
Output comment declarations. Comment declarations in the DTD will not be output.
lower
Prefer lower case. Names that were subjected to upper-case substitution by SGML will be folded to lower case. This does not include reserved names; XML requires these to be in upper-case.
pi-escape
Escape &<> in the contents of processing instructions using the amp, lt and gt entities. This allows processing instructions to contain the string ?>, but requires that applications handle the escapes.
empty
Use the <e/> syntax for element types e declared as EMPTY.
attlist
Output an ATTLIST declaration for every element specifying the type of all attributes. The default will always be #IMPLIED.
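
As an illustration of what the pi-escape option implies for an application that reads the output, here is a rough Python sketch of the escaping and of the matching unescape step. The function names are hypothetical, and the sketch assumes the escapes appear literally as &amp;, &lt; and &gt; in the processing instruction data; it is not taken from the SX sources.

```python
def escape_pi(data):
    """Escape &, < and > in processing instruction data (illustrative only)."""
    # '&' is replaced first so that the ampersands introduced by the
    # '<' and '>' replacements are not escaped a second time.
    return data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape_pi(data):
    """Undo escape_pi; this is the step a receiving application must perform."""
    # '&amp;' is handled last so that '&amp;lt;' becomes '&lt;', not '<'.
    return data.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


if __name__ == "__main__":
    original = "if a < b then emit ?> end"
    escaped = escape_pi(original)
    print(escaped)
    print(unescape_pi(escaped) == original)
```

Because every > is escaped, the escaped data can never contain the ?> sequence that terminates an XML processing instruction.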

Multiple -x options are allowed.
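
For example, a script that needs several output options can assemble the command line programmatically. This is a sketch only: build_sx_argv and XML_OUTPUT_OPTIONS are hypothetical names, the program name sx is assumed to be on the search path, and only the option spellings listed above are used.

```python
# The xml_output_option values documented above.
XML_OUTPUT_OPTIONS = {
    "no-nl-in-tag", "id", "notation", "ndata", "cdata",
    "comment", "lower", "pi-escape", "empty", "attlist",
}


def build_sx_argv(sysids, xml_options=(), encoding=None, catalog=None):
    """Build an argument list for sx (hypothetical helper)."""
    argv = ["sx"]
    if encoding is not None:
        argv.append("-b" + encoding)      # -bencoding
    if catalog is not None:
        argv.append("-c" + catalog)       # -cfile
    for option in xml_options:
        if option not in XML_OUTPUT_OPTIONS:
            raise ValueError("unknown xml_output_option: " + option)
        argv.append("-x" + option)        # one -x per output option
    argv.extend(sysids)
    return argv


if __name__ == "__main__":
    print(" ".join(build_sx_argv(["doc.sgm"], xml_options=["lower", "empty"])))
```

The option value is attached directly to the option letter, following the form shown in the synopsis.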


TODO: SX - Possible improvements

Option to use empty element syntax for contingently empty elements.

Check for ENTITY attributes whose value is internal CDATA/SDATA entity.

Option to generate IDREF attribute declarations.

Option to generate ENTITY attribute declarations.

Option to generate NOTATION attribute declarations.

Warn about characters unrepresentable in output encoding specified with -b occurring in contexts where numeric character references are not allowed.

Check for numeric character references to non-SGML characters.

Check for comments containing --.

Option to put link attributes on start-tags.

Check for invalid name characters.

Option to copy external entities to current directory.

Try to turn absolute filenames into relative filenames.

Allow SDATA names to be mapped to Unicode.

Output declarations in sorted order.

Option to use attribute defaulting.

Option to output reference to external DTD.

Option to use case that was used in the DTD for element type names, attribute names, notation names, enumerated values.

Option to preserve white-space in element content.

Handle SUBDOC entities by converting to NDATA with notation SGML. At least warn about them with -xndata.

Be able to preserve external entities. Provide a -oentity_name=file option that preserves references to entity_name and writes the normalized entity to file.

James Clark
jjc@jclark.com