SX: SP Application for SGML to XML Normalization

This is a very preliminary release of SX, an application built with the SP library for converting SGML to XML.

A Win32 executable is included. It requires the sp120u.dll file included in the SP 1.2.1 Win32 Unicode binary distribution.

To build from source, you need the SP 1.2.1 sources. Make a subdirectory sx in the directory containing the SP sources, and unzip this distribution into that directory. With Visual C++ 4.2, use the supplied sx.mak. On Unix, run "make XPROGDIRS=sx" in the SP directory; you must leave SP_MULTI_BYTE defined in the SP Makefile.

Some documentation in HTML is in sx.htm. This should be put in the same directory as the other SP HTML documentation files. The file todo.htm contains some ideas for future improvements.

Please report any bugs to me.

James Clark


An SGML System Conforming to International Standard ISO 8879 --
Standard Generalized Markup Language


sx [ -Cehilprvx ] [ -bencoding ] [ -ccatalog_file ] [ -Ddirectory ] [ -ffile ] [ -wwarning_type ] [ -xxml_output_option ] sysid...


SX converts SGML to XML. SX parses and validates the SGML document contained in sysid... and writes an equivalent XML document to the standard output. SX will warn about SGML constructs which have no XML equivalent.

The following options are available:

-bencoding
    Use encoding for output. By default SX uses UTF-8.
-ccatalog_file
    Use the catalog entry file catalog_file.
-C
    This has the same effect as in nsgmls.
-Ddirectory
    Search directory for files specified in system identifiers. This has the same effect as in nsgmls.
-e
    Describe open entities in error messages.
-ffile
    Redirect errors to file. This is useful mainly with shells that do not support redirection of stderr.
-iname
    This has the same effect as in nsgmls.
-v
    Print the version number.
-wwarning_type
    Control warnings and errors according to warning_type. This has the same effect as in nsgmls.
-xxml_output_option
    Control the XML output according to the value of xml_output_option as follows:
    no-nl-in-tag
        Don't use newlines inside start-tags. Usually SX uses newlines inside start-tags so as to reduce the probability of excessively long lines.
    id
        Output attribute declarations for ID attributes.
    notation
        Output declarations for notations.
    ndata
        Output declarations for external data entities. XML requires these to be NDATA. SX will warn about CDATA and SDATA external data entities and output them as NDATA entities.
    cdata
        Use XML CDATA sections for CDATA marked sections and for elements with a declared content of CDATA.
    comment
        Output comment declarations. Comment declarations in the DTD will not be output.
    lower
        Prefer lower case. Names that were subjected to upper-case substitution by SGML will be folded to lower case. This does not include reserved names; XML requires these to be in upper case.
    pi-escape
        Escape &<> in the contents of processing instructions using the amp, lt and gt entities. This allows processing instructions to contain the string ?>, but requires that applications handle the escapes.
    empty
        Use the <e/> syntax for element types e declared as EMPTY.
    attlist
        Output an ATTLIST declaration for every element specifying the type of all attributes. The default will always be #IMPLIED.
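
The escaping of processing instruction contents described above, and the matching step an application has to perform when reading the output, can be sketched as follows. This is only an illustration in Python, not SX's own code: the function names escape_pi and unescape_pi are hypothetical, and the sketch assumes the escapes are the plain references &amp;, &lt; and &gt;.

```python
def escape_pi(data: str) -> str:
    """Escape &, < and > with references to the amp, lt and gt entities."""
    # '&' must be handled first, otherwise the ampersands introduced
    # by the other two replacements would be escaped a second time.
    return (data.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


def unescape_pi(data: str) -> str:
    """Undo escape_pi; this is the step left to the consuming application."""
    # '&amp;' is handled last so that '&amp;lt;' comes back as '&lt;'.
    return (data.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"))
```

Because every > is escaped, the escaped text can never contain the two-character sequence that would end an XML processing instruction early.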

Multiple -x options are allowed.
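
As an illustration of combining -b with several -x options, the following Python sketch assembles an SX command line and runs it, capturing the XML from standard output and any warnings from standard error. The helper names build_sx_command and run_sx are hypothetical, the executable is assumed to be on the search path under the name sx, and the encoding and -x values in the usage line (utf-8, lower, empty) are assumptions made for the example.

```python
import subprocess


def build_sx_command(sysids, encoding=None, xml_options=(), sx="sx"):
    """Return the argument list for one SX run."""
    cmd = [sx]
    if encoding is not None:
        # The option argument is attached directly, as in the synopsis.
        cmd.append("-b" + encoding)
    for option in xml_options:
        # One -x per output option; multiple -x options are allowed.
        cmd.append("-x" + option)
    cmd.extend(sysids)
    return cmd


def run_sx(sysids, **kwargs):
    """Run SX and return (xml_bytes, diagnostics_bytes)."""
    result = subprocess.run(build_sx_command(sysids, **kwargs),
                            capture_output=True, check=False)
    # The XML document arrives on stdout; warnings and errors on stderr.
    return result.stdout, result.stderr
```

For example, build_sx_command(["doc.sgm"], encoding="utf-8", xml_options=["lower", "empty"]) yields the arguments sx -butf-8 -xlower -xempty doc.sgm.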

TODO: SX - Possible improvements

Option to use empty element syntax for contingently empty elements.

Check for ENTITY attributes whose value is internal CDATA/SDATA entity.

Option to generate IDREF attribute declarations.

Option to generate ENTITY attribute declarations.

Option to generate NOTATION attribute declarations.

Warn about characters unrepresentable in output encoding specified with -b occurring in contexts where numeric character references are not allowed.

Check for numeric character references to non-SGML characters.

Check for comments containing --.

Option to put link attributes on start-tags.

Check for invalid name characters.

Option to copy external entities to current directory.

Try to turn absolute filenames into relative filenames.

Allow SDATA names to be mapped to Unicode.

Output declarations in sorted order.

Option to use attribute defaulting.

Option to output reference to external DTD.

Option to use case that was used in the DTD for element type names, attribute names, notation names, enumerated values.

Option to preserve white-space in element content.

Handle SUBDOC entities by converting to NDATA with notation SGML. At least warn about them with -xndata.

Be able to preserve external entities. Provide a -oentity_name=file option that preserves references to entity_name and writes the normalized entity to file.
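
One of the items above, warning about characters that cannot be represented in the output encoding, could be approached along the following lines. This is a rough Python sketch of the idea rather than a proposal for the SX implementation: the function name unencodable_chars is hypothetical, and Python codec names are used here, which need not match the encoding names accepted by -b.

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters of text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Report each offending character once, in order of appearance.
            if ch not in bad:
                bad.append(ch)
    return bad
```

Such a check matters for names (element types, attribute names and the like), where a numeric character reference cannot be used as a fallback.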
