Last modified: March 25, 2000
SGML/XML and (La)TeX
The following note was originally written in mid-1992 as a boilerplate document to help answer the question "What's the relationship between SGML and TeX (LaTeX)?" It needs updating, as there are now many more articles relevant to SGML and (La)TeX as complementary technologies; see recent issues of TUGboat, for example. Until the following is updated (Nelson B.?), it may still be useful here as is. -- Robin Cover
On the Relationship between SGML and TeX/LaTeX
The similarity of SGML to LaTeX has frequently led to discussion on the networks and in published literature about the exact (future) relationship between SGML, TeX, and LaTeX. More than once we have heard this comment about SGML: "Well, SGML is OK, but why can't we just do it all in LaTeX or LaTeX3?" The answer: "You can. You can also drive screws with a wood chisel, and you can chisel wood with a screwdriver." [credit to Steve DeRose, if memory serves]. TeX is for putting ink on paper, and SGML is not. SGML is for defining markup languages and for validating the structure of electronic information against a formal grammar; TeX is not. LaTeX (apparently more or less a rip-off of Scribe, to the extent I have investigated it) is a slightly different story: it helps put ink on paper using macros that can be designed as descriptive markup notations. One could re-implement something like SGML in LaTeX, but I can't think of any sane reason to do so.
Fortunately, a peaceful and "probably symbiotic" co-existence of SGML and (La)TeX is now generally accepted by most TeX/LaTeX users. LaTeX and TeX can readily be used for formatting and typesetting documents structured with SGML-defined languages, but they are not themselves suitable as formal languages for structuring information with the formal rigor and versatility of SGML.
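To make the division of labor concrete, here is a minimal sketch in Python of the "structure in, LaTeX out" idea: the markup describes structure, and a separate rule table decides what LaTeX to emit at the start and end of each element. The element names (article, title, para, emph) and the LaTeX chosen for them are illustrative assumptions, not taken from any DTD or package mentioned on this page, and XML stands in for SGML because the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> (LaTeX at start, LaTeX at end).
# Both the element names and the LaTeX fragments are assumptions for this sketch.
RULES = {
    "article": ("\\documentclass{article}\n\\begin{document}\n", "\\end{document}\n"),
    "title": ("\\section*{", "}\n"),
    "para": ("", "\n\n"),
    "emph": ("\\emph{", "}"),
}

# Characters that are special to LaTeX and must be escaped in character data.
ESCAPES = {
    "\\": "\\textbackslash{}",
    "&": "\\&", "%": "\\%", "$": "\\$", "#": "\\#", "_": "\\_",
    "{": "\\{", "}": "\\}",
    "~": "\\textasciitilde{}", "^": "\\textasciicircum{}",
}


def escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def to_latex(elem):
    """Recursively emit start code, content, and end code for one element."""
    start, end = RULES.get(elem.tag, ("", ""))  # unknown elements pass through
    parts = [start, escape(elem.text or "")]
    for child in elem:
        parts.append(to_latex(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(end)
    return "".join(parts)


def convert(xml_source):
    """Parse an XML string and return LaTeX source."""
    return to_latex(ET.fromstring(xml_source))
```

Swapping in a different rule table yields a different layout for the same document type, which is the separation the paragraph above describes. Validation against a formal grammar, the part SGML is actually for, is not attempted here.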
- [February 19, 1998] Under the LaTeX3 Project, several facilities are being designed and developed to directly support the processing of SGML/XML-encoded documents through LaTeX. The relevant features of LaTeX are summarized in an article by Frank Mittelbach and Chris A. Rowley: "The LaTeX3 Project," TUGboat: The Communications of the TEX Users Group [Proceedings of the 1997 Annual Meeting] Volume 18, Number 3 (September 1997) 195-198. See the full text of the "Description" from the article for details. A summary, according to this article, tells us that requirements being fulfilled in this effort include:
- Provision of a syntax that allows highly automated translation from popular SGML DTDs into LaTeX document classes; to be provided as standard with the new version of LaTeX
- Support for the SGML concepts of 'entity,' 'attribute,' and 'short reference' in the syntax of the new LaTeX user interface, implemented in a way that makes it possible to map these constructs directly to the corresponding SGML features
- Support for hyperlinks of the kind used in HTML and XML, and support for other features of online documents
- Straightforward style-designer interface to support the independent specification of typographic requirements and their mapping to SGML constructs in document instances, so that different layouts may be specified for the same DTD
- Visual, menu-driven interface for typographic style design
- Support for the DSSSL specification in the interface, and for HTML/XML stylesheets
- Jadetex Package, from Sebastian Rahtz and David Megginson. The Jadetex package, built on top of LaTeX, is an implementation of the TeX skeleton produced by running "jade -t tex". Jade is a DSSSL engine from James Clark, and it has a TeX backend which generates TeX from the SGML source and the DSSSL stylesheet.
- format: Thomas Gordon's QWERTZ SGML -> LaTeX formatting package
- gf: Gary Houston's general formatter program
- Jörg Wittenberger's Typeset Package
- Jörg Wittenberger's SDC Package
- SGML2TeX - SGML-to-TeX converter
- tei2latex - TEILITE to LaTeX2e
- Information from the TeX FAQ
- [February 28, 2000] PassiveTeX. 'PassiveTeX' - Using TeX to format XSL Formatting Objects. "The system works in two stages: (1) apply an XSL stylesheet to the XML document and generate a new XML file containing XSL <fo:...> markup, (2) process that file with a special TeX (pdfTeX) format to generate a formatted PDF file." 'PassiveTeX' is a TeX package used to directly format XSL FO material. The files in the distribution "form a demonstration of LaTeX reading XSL Formatting Objects and processing them to produce nice pages. . . using the XML version of the TEI Lite guidelines, we apply the XSL stylesheet and run it through James Clark's XT XSL processor [producing the flow objects], then through 'pdftex' to produce the PDF file."
- [February 28, 2000] "xmltex: A non validating (and not 100% conforming) namespace aware XML parser implemented in TeX." From David Carlisle, Numerical Algorithms Group: NAG. "xmltex implements a non validating parser for documents matching the W3C XML Namespaces Recommendation. The system may just be used to parse the file (expanding entity references and normalising namespace declarations) in which case it records a trace of the parse on the terminal. Normally however the information from the parse is used to trigger TeX typesetting code. Declarations (in TeX syntax) are provided as part of xmltex to associate TeX code with the start and end of each XML element, attributes, processing instructions, and with unicode character data." [manual, local archive copy]
- [February 28, 2000] TeXML. "TeXML provides a path from XML into the TeX formatting language. The path to print begins with your XML document. You write an XSL transform which accepts your document type and outputs a new XML document which conforms to the TeXML document type. The java program, TeXMLatte, transforms any document conforming to the TeXML document type into TeX. The 06/23/99 version contains examples for the April 21 XSL working draft; it is updated for Lotus XSL 0.17.2 and XML4J 2.0.11. TeXMLatte now outputs options and parms in document order."
- TeXML from IBM alphaWorks. "There are three parts to the TeXML solution: (1) an XML document type definition; (2) TeXML.java, a program written in java which takes as input a DOM conforming to TeXML.dtd and outputs TeX; (3) TeXMLatte.java, a program written in java which takes as input an XML document conforming to TeXML.dtd and outputs TeX. The program parses the document using the XML4J parser. It sends the resulting DOM to a method of the TeXML class."
- [March 31, 1999] Generic TeX DTD By Oliver Zeigermann: "I have designed a DTD to represent TeX in XML. Inspired by TeXML I found at IBM's alphaworks. My DTD shares some ideas but tries to make information a bit more explicit. This seemed to be necessary as I also want to use it as an intermediate step in converting TeX to XML." download.
- WG-94-09 (TUG Technical Working Group). TeX and SGML: "The major objective is to investigate the requirements and difficulties in developing an interface technology for TeX and SGML." Contact: Ken Dreyhaupt, email: firstname.lastname@example.org
- [April 01, 1999] "Active TEX and the DOT Input Syntax." By Jonathan Fine. To be presented at TUG '99, on Wednesday, August 18, 1999 in the session 'TeX in Publishing'. "The usual category codes give TEX its familiar backslash and braces input syntax. With Active TEX, all characters are active. This gives the macro programmer complete freedom in defining the input syntax. It also provides a powerful programming environment. The dot input syntax, like TROFF, uses a period at the start of the line as an escape character. However, its underlying element, attribute and content structure is based on SGML. It is both easy to use and easy to program for. Conversion to other formats, such as SGML, HTML and XML, or to proprietary formats such as Word and RTF, will be straightforward. This is because the DOT syntax is rigorous. This new syntax will be described and demonstrated. All manner of problems connected with TEX disappear when Active TEX packages are used. For example, all input errors can be detected and corrected before they cause a TEX error message. This will make TEX accessible to many more users." Note: This document [also in HTML] is a preliminary version of a paper to be presented to the 20th Annual Meeting of the TEX Users Group (Vancouver, Canada, 15-19 August 1999). Jonathan Fine wrote similarly on CTX (1999-04-01): "Active TeX. I've written a TeX macro package that makes all characters active. With Active TeX, every character is a macro! Believe it or not, many problems with TeX can as a result be solved. For more information visit http://www.active-tex.demon.co.uk/."
- Interesting TeX-related URLs
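The TeXML entries above describe an XML vocabulary for TeX that a processor serializes into TeX source, with options and parameters emitted in document order. The following Python sketch illustrates that general idea only. The element names used here (a root element, cmd with a name attribute, opt, parm) are a simplified, hypothetical subset chosen for illustration; they are not a statement of what TeXML.dtd actually defines, and the code is unrelated to the TeXMLatte implementation.

```python
import xml.etree.ElementTree as ET


def inner(elem):
    """Serialize the text and child elements inside an element."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(to_tex(child))
        parts.append(child.tail or "")  # text that follows the child
    return "".join(parts)


def to_tex(elem):
    """Serialize one element of the hypothetical TeX-in-XML vocabulary."""
    if elem.tag != "cmd":
        return inner(elem)  # root or unknown wrappers contribute content only
    out = "\\" + elem.attrib["name"]
    args = ""
    for child in elem:  # options and parameters, in document order
        if child.tag == "opt":
            args += "[" + inner(child) + "]"
        elif child.tag == "parm":
            args += "{" + inner(child) + "}"
    # An empty group terminates a control word that has no arguments,
    # so that following text is not read as part of the command name.
    return out + (args or "{}")


def convert(xml_source):
    """Parse an XML string and return TeX source.

    Escaping of TeX special characters in character data is omitted
    to keep the sketch short; a real converter would need it.
    """
    return to_tex(ET.fromstring(xml_source))
```

In the workflow described above, an XSL transform would produce the intermediate XML from the author's own document type; this sketch covers only the last step, from intermediate XML to TeX.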
Here follows a sampling of articles on the SGML-TeX/LaTeX relationship, in approximate chronological order:
- Brüggemann-Klein, A.; Dolland, P.; Heinz, P. "How to Please Authors and Publishers: A Versatile Document Preparation System at Karlsruhe." Pp. 9-31 in TEX for Scientific Documentation. Second European Conference [Strasbourg, France, June 19-21 1986] Proceedings. Ed. Jacques Désarménien. Lecture Notes in Computer Science, 236. Berlin/Heidelberg/New York: Springer Verlag, 1986.
- Price, Lynne A. "SGML and TEX." TUGBoat: The TEX Users Group Newsletter 8/2 (July 1987) 221-225.
- Price, Lynne A. "A Note Comparing SGML to Text Processing Macro Languages." SGML Users' Group Bulletin 2/2 (1987) 127.
- Clark, Malcolm. "A Note Comparing TeX to SGML." SGML Users' Group Bulletin 3/2 (1988) 67-68. [Response to the article by Lynne Price.]
- Price, Lynne A. "Using SGML and TEX for User Documentation." TEXniques No. 7: Proceedings, TEX User's Group 1988 Annual Meeting [21-24 August 1988, Montreal], 203-210.
- Slocombe, David. "SGML: A Different Kind of Markup (Was: Why learn Tex?)." Submission to USENET News forum "comp.text," March 3, 1990. An informative short history of SGML and discussion of its features vis-à-vis TeX.
- Hickey, Thomas B. "Using SGML and TeX for an Interactive Chemical Encyclopaedia." Pp. 187-195 in National Online Meeting Proceedings of the Tenth National Online Meeting [9-11 May 1989 New York, NY]. Medford, NJ: Learned Information, 1989.
- Laan, C. G. (Kees). "SGML (,TeX and . . .)," TUGboat 12/1 (March 1991 = Proceedings of TeX90) 90-104.
- Poppelier, Nico A. F. M. "SGML and TeX in Scientific Publishing," TUGboat 12/1 (March 1991 = Proceedings of TeX90) 105-109.
- Sperberg-McQueen, C. Michael. "Specifying Document Structure: Differences in LaTeX and TEI Markup," TUGboat 12/3 (= Proceedings of the 1991 Annual Meeting) 415-421 (also available as a TEI document, TEI EDW22, June 9, 1991).
- Dobrowolski, Andrew E. "Typesetting SGML Documents Using TeX," TUGboat 12/3 (= Proceedings of the 1991 Annual Meeting) 409-414.
- McGaffey, Robert W. "SGML versus/and TeX," TUGboat 12/3 (= Proceedings of the 1991 Annual Meeting) 406-408.
- Dobrowolski, Andrew E., "Typesetting SGML documents Using TeX," Cahiers GUTenberg 10-11 (septembre 1991) 185-196.
- Wonneberger, Reinhard. "Approaching SGML from TeX." TUGboat 13/2 (July 1992) 226-227.
- Wonneberger, Reinhard; Mittelbach, Frank. "SGML -- Questions and Answers." TUGboat 13/2 (July 1992) 221-223.
- Flynn, Peter. "TeX and SGML: A Recipe for Disaster?" TUGboat 14/3 (1993) [Proceedings of the 1993 Annual Meeting] 227-230. See the main bibliographic entry.
- Reinhard Wonneberger. "TeX in an Industrial Environment." Electronic Publishing: Origination, Dissemination and Design (EPODD) 7/1 (March 1994) 3-19, with 80 references. See bibliographic entry.
- Martin Key. "Theory and Practice: Working with SGML, PDF and LATEX at Elsevier Science." Baskerville 5/2 (March 1995). See the main bibliographic entry for other articles on SGML in this issue of Baskerville.
- Sebastian Rahtz. "Another Look at LATEX to SGML Conversion." TUGboat: The Communications of the TEX Users Group [issue = Proceedings of the 1995 Annual Meeting] 16/3 (September 1995) 315-324. ISSN: 0896-3207. See the main bibliographic entry.
- Thomas F. Gordon. "The QWERTZ Synthesis of SGML and LATEX." Computer Standards and Interfaces 17/1 (January 1995) 25-33. See the bibliography entry and the database entry in the software section.
- [Other gaps for 1996?]
- Frank Mittelbach and Chris A. Rowley. "The LaTeX3 Project." TUGboat: The Communications of the TEX Users Group [Proceedings of the 1997 Annual Meeting] Volume 18, Number 3 (September 1997) 195-198. Several facilities are being designed and developed to directly support the processing of SGML/XML-encoded documents through LaTeX. See the full text of the "Description" from the article for details, and the full bibliographic entry. See also above.
- Frank Mittelbach and Chris Rowley. "Language Information in Structured Documents: Markup and Rendering -- Concepts and Problems." TUGboat: The Communications of the TEX Users Group [issue = Proceedings of the 1997 Annual Meeting] 18/3 (September 1997) 199-205. ISSN: 0896-3207. See the main bibliographic entry. Compare the LaTeX(3) solution to that in XML 1.0 (February 1998): "In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], Tags for the Identification of Languages. . . The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content."
- Christopher B. Hamlin. "From SGML to HTML with Help from TeX." TUGboat: The Communications of the TEX Users Group [issue = Proceedings of the 1997 Annual Meeting] 18/3 (September 1997) 170-174. See the main bibliographic entry.
- [March 24, 2000] "TeX and XML-related news." By Michel Goossens, IT/ASD. "Xpath and XSLT together are powerful tools to build XML to HTML and TeX converters (in fact XSLT allows you to transform XML sources into a whole set of target formats). As explained in the Letter from the Editor part at the beginning of this CNL, we have used this technology to produce the present CNL by writing two stylesheets, one to translate the XHTML to HTML (in fact the tricky part is getting the tree structure of documents on the Web in place) and one for going from XHTML to LaTeX, which is much more complicated, since the XHTML model maps very poorly onto LaTeX. This is especially true if one has to translate pages optimized for viewing on the Web (i.e., using color, visual effects, forms, etc.), which have no equivalent in LaTeX, that emphasizes structure rather than visual appearance. For those interested in how this technique can be used, the following two URLs provide more information: (1) Tutorial presented at the UKTUG Conference (Oxford University, 12-13 September 1999): "XML, XSL, two of a family of extensible languages" (http://wwwinfo.cern.ch/asdoc/WWW/publications/oxford99/oxford99main.html). (2) Presentation of PassiveTeX at the XML Developers' Conference (Montreal, 19-20 August 1999): "PassiveTeX: XML and TeX, doing it together..." (http://wwwinfo.cern.ch/asdoc/WWW/publications/xmldev99/passivetex.html). A series of lectures on XML (not only in the text-processing area, but also for databases, visualization, etc.) is planned for the third term (April-June 2000) of the Academic Training..."
- [January 27, 2000] "TEXML: Typesetting XML with TEX." By Doug Lovell (IBM Research, New York). In TUGboat: The Communications of the TeX Users Group Volume 20, Number 3 (September 1999), pages 176-183. Paper presented at TUG '99 The 20th Annual Meeting of the TeX Users Group. August 15-19, 1999. University of British Columbia, Vancouver, BC Canada. ["TEX Online: Untangling the Web and TEX."] "XML, eXtensible Markup Language, is a simplified subset of SGML, which is fast becoming a standard for content management on the internet. TEXML is an XML vocabulary for TEX. A processor written in JAVA translates TEXML-conforming XML into TEX. The processor provides a document formatting solution for XML that leverages the rich knowledge and capability built over many years in TEX. The presentation describes the TEXML document format and the processor, TEXMLatte, that produces TEX source from TEXML markup." See TeXML at the IBM Web site.
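The xml:lang scoping rule quoted from XML 1.0 in the Mittelbach and Rowley entry above (a declared language applies to the element and everything inside it unless an inner element overrides it) can be sketched in a few lines of Python. This is an illustration of the rule as quoted, using the standard library's ElementTree; the function name and the sample document in the usage note are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predeclared "xml" prefix in expanded form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(elem, inherited=None):
    """Yield (tag, language) for elem and its descendants in document order.

    An element's own xml:lang wins; otherwise the value in force on the
    nearest enclosing element is inherited. None means no language was
    declared anywhere above the element.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_languages(child, lang)


def languages_of(xml_source):
    """Convenience wrapper: parse a string and return the pairs as a list."""
    return list(effective_languages(ET.fromstring(xml_source)))
```

For a document such as `<doc xml:lang="en"><p>Hello</p><p xml:lang="de">Hallo <q>Welt</q></p></doc>`, the nested q element inherits "de" from its parent while the first p inherits "en" from the root. A formatter could use the resolved value to choose hyphenation patterns or quotation styles, which is the kind of rendering question the article discusses.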
It may also be noted that ArborText Inc (among many other companies) supports a suite of SGML tools compatible with or built around LaTeX/TeX, which are used for formatting equations and other complex data. ArborText's address is: 1000 Victors Way, Ann Arbor, MI 48108-2700 USA; TEL: 1-313-996-3566, FAX: 1-313-996-3573; WWW: http://www.arbortext.com/; Email: email@example.com.
The Computing Center at the University of Groningen offers document preparation services for academic disciplines, and in this connection supports the integrated use of LaTeX and SGML. In August 1990, for example, in cooperation with the Dutch SGML Users' Group, the University's RekenCentrum hosted an "SGML & TeX Conference," offering 3-day courses on (La)TeX, SGML and the integration of these systems. Course instructors were Kees van der Laan, Jan Bleeker and Jan Maasdam. Several publications have been issued by members of the University of Groningen on the use of SGML and TeX for mathematical tables and related topics. Contact the Netherlands TeX Group [ ] or Kees van der Laan (RekenCentrum RijksUniversiteit Groningen; Attention: Kees C.G. van der Laan; Landleven 1; NL-9700 AV; Groningen, THE NETHERLANDS; Internet Email firstname.lastname@example.org, BITNET email CGL@HGRRUG5.BITNET; Tel: 050-633374).
Address for the TeX Users Group:
TeX Users Group
1850 Union Street Suite 1637
San Francisco, California 94123 USA
Tel: [+1] (805) 963 1338
FAX: [+1] 415-982-8559
Jacques Andre
IRISA/INRIA-Rennes
Campus de Beaulieu
F-35042 Rennes cedex, France
email: email@example.com
Or: Jacques.Andre@irisa.fr (Jacques Andre)
tel: (33) 99 84 73 50
FAX: (33) 99 38 38 32
[address from 1992]