[Mirrored from: http://mish161.cern.ch/sc4wg6/math/dli.htm]

Notes on SGML Math Workshop

Held as part of the University of Illinois Digital Library Initiative (UIUC DLI)

University of Illinois at Urbana-Champaign

May 1, 1996
Written up by Evan Owens, Electronic Publishing Manager, Journals Division, University of Chicago Press

The workshop was very well attended; discussion was lively. The plan of the meeting was presentations in the morning and early afternoon followed by an extended period of group discussion.

As I understand it, the reason for calling the workshop was that the UIUC Digital Library Project was having difficulty getting the SGML Math supplied by the publisher partners to be rendered effectively on screen by SoftQuad's Panorama. So the immediate goal was to solve that problem. The University of Chicago Press is not a participant in the UIUC DLI project, so it is possible that I may have misstated some of the background.

PART 1 - PRESENTATIONS

(1) Stephen Wolfram, inventor of Mathematica

The relevance of Mathematica to SGML Math becomes apparent in the later discussion. Wolfram hosted lunch and dinner for the attendees.

Wolfram Research has spent 5 years and millions of dollars to develop the typesetting capabilities of Mathematica. They believe that it can now typeset math to the level of the best commercial system; its page layout capabilities are comparable to WordPerfect or Word and not quite as good as PageMaker. Unlike TeX, Mathematica can gracefully break formulas into lines without operator intervention because it understands the structure of the math; if it didn't understand the structure, it couldn't calculate. It also knows how tightly the operators are bound.
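As a rough illustration of why knowing operator binding helps with line breaking, here is a minimal sketch. It is not Mathematica's algorithm; the `BINDING` table, the `break_formula` function, and the flat token list (no parentheses) are all assumptions made for the example. The idea is simply to break before the most loosely bound operator first, so that tightly bound terms stay together.

```python
# Assumed binding strengths for illustration; lower numbers bind more loosely.
BINDING = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def break_formula(tokens, width):
    """Split a flat list of tokens into lines, breaking before loosely bound operators.

    Lines may still exceed `width` if no operator is available to break at.
    """
    text = " ".join(tokens)
    if len(text) <= width:
        return [text]
    # Operators that could start a new line (never the very first token).
    ops = [i for i, tok in enumerate(tokens) if tok in BINDING and i > 0]
    if not ops:
        return [text]
    loosest = min(BINDING[tokens[i]] for i in ops)
    candidates = [i for i in ops if BINDING[tokens[i]] == loosest]
    # Prefer the latest loose operator whose left-hand part still fits.
    fitting = [i for i in candidates if len(" ".join(tokens[:i])) <= width]
    split = fitting[-1] if fitting else candidates[0]
    return break_formula(tokens[:split], width) + break_formula(tokens[split:], width)


for line in break_formula("a * x + b * y + c * z".split(), 13):
    print(line)
```

A formatter that saw only a string of characters would have no reason to prefer breaking at "+" over breaking inside "b * y"; the binding table is what supplies that preference here.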

Input is palette, keyboard, and strings; has full internal Unicode support. Mathematica has an open architecture for customizing. All formatting styles are pairs: screen and paper style. Uses monospaced fonts on screen for legibility.

To interpret math, it needs to know unambiguously the role of every element in the formula. E.g., is "e" the exponential constant or a variable named "e"? Mathematica keeps additional information in its internal representation of the math.
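The "e" example can be made concrete with a small sketch. The `roles` dictionary and the `evaluate_power` function below are hypothetical names invented for this illustration and say nothing about how Mathematica stores the information; the point is only that the same visible notation gives different results depending on the role recorded for the symbol.

```python
import math


def evaluate_power(base_symbol, exponent, roles, variables):
    """Evaluate base_symbol ** exponent, using the recorded role of the symbol."""
    role = roles[base_symbol]
    if role == "constant":
        base = math.e  # "e" read as the exponential constant
    elif role == "variable":
        base = variables[base_symbol]  # "e" read as an ordinary variable
    else:
        raise ValueError(f"unknown role for {base_symbol!r}: {role!r}")
    return base ** exponent


# The same printed formula, e^2, under two different role assignments.
as_constant = evaluate_power("e", 2, {"e": "constant"}, {})
as_variable = evaluate_power("e", 2, {"e": "variable"}, {"e": 3.0})
print(as_constant, as_variable)
```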

Traditional textbook math cannot always be made unambiguous. Mathematica converts from traditional form notation to its internal format using a large collection of heuristic rules; this doesn't always work. When editing in traditional notation, one can easily make something that can't be converted back; their editing environment will prompt about this. It can also handle abstract notation, though it needs to be given processing rules.

Inside Mathematica, everything is a symbolic expression. Traditional form math is a set of transformation rules.

Mathematica's spacing and display is much better than TeX's. It can export math to gif, eps, pict, TeX, HTML, speech, or to ASCII that can be reloaded into Mathematica. Export to/from TeX is done using transformation rules. They have a version that takes math input, sends it to a math server that renders it and returns GIFs; also an inline app to render math into ActiveX and Netscape inline addins.

Wolfram Research would like to work with the SGML Math community. Mathematica notebooks are a markup language; one could map the structure of the entire notebook (not just the maths) into an SGML DTD.

(2) Dave Raggett, WWW Consortium

From other comments later, it appears that the WWW Consortium math committee is split and that Dave Raggett's ideas are just one proposal under consideration.

Raggett:

Web vendors not interested in Math

Special review board of Wolfram, Adobe, AMS trying to develop open spec.

His proposal has nothing to do with SGML at all. (My question to him later was, was it a fair statement to say that this proposal is to punt, to take math entirely out of HTML; his answer was yes).

His plan is to use SGML notations and allow documents to specify the server (URL) where a set of rules (knowledge base) would be available to render the math. The actual HTML would be very simple: perhaps only three tags: inline formula, display formula, and a tag to identify where the renderer was to be found. Then, inside the tag, any kind of notation could be used. His demonstration used Prolog.
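To make the shape of such a document concrete, here is a minimal sketch of what a browser-side reader might do with it. The tag names `imath`, `dmath`, and `mathrenderer`, the `href` attribute, and the example URL are all invented for this illustration; the proposal as presented did not fix any names. The sketch only extracts the formula sources and the renderer location; it does no rendering.

```python
from html.parser import HTMLParser


class MathExtractor(HTMLParser):
    """Collect formula sources and the renderer URL from a page (hypothetical tags)."""

    def __init__(self):
        super().__init__()
        self.renderer_url = None
        self.formulas = []  # list of (kind, source) pairs
        self._kind = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "mathrenderer":
            self.renderer_url = dict(attrs).get("href")
        elif tag in ("imath", "dmath"):
            self._kind = "inline" if tag == "imath" else "display"
            self._buffer = []

    def handle_data(self, data):
        # Formula content is kept as opaque text in whatever notation was used.
        if self._kind is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("imath", "dmath") and self._kind is not None:
            self.formulas.append((self._kind, "".join(self._buffer).strip()))
            self._kind = None


page = (
    '<mathrenderer href="http://example.org/rules">'
    "<p>Let <imath>x^2</imath> be given.</p>"
    "<dmath>sum(i, 1, n) = n*(n+1)/2</dmath>"
)
extractor = MathExtractor()
extractor.feed(page)
extractor.close()
print(extractor.renderer_url)
print(extractor.formulas)
```

Everything that matters (what the notation means and how it is drawn) lives on the server named by the renderer tag, which is why the HTML side can stay this small.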

This proposal was not well received by the audience; comments later included, how many years will it take until this is working; and isn't this just avoiding the issue?

(3) Roy Pike, Semantic Math

He presented his paper; text is available elsewhere so I won't repeat it here. He came down hard in support of semantic math as the solution to all problems.

Comments from the audience seemed to be that this was a good long term goal but didn't solve the problem at hand: rendering math for screen display.

It was apparent that semantic math and Mathematica had a lot in common; Pike had been to visit Wolfram and had apparently seen their common interest: a robust semantic math would be a format that Mathematica could easily export to.

Another important comment (from me and others) was that it is unrealistic to expect that we are going to be able to afford to disambiguate math in a production environment; if we can't get this from the authors then it isn't going to work.

Pike seems to favor working on the assumption of a perfect world; unfortunately most of us in publishing live in very imperfect worlds.

(4) Publisher Perspectives: Evan Owens, UCP/AAS

I spoke very briefly on what I see as the important issues:

SGML math won't work unless we have tools to get the math from the authors and to edit it robustly. We need authoring and editing environments or appropriate translation tools.

I also described our project and how we convert math from LaTeX to SGML to typesetting systems and back.

(5) Publisher Perspectives: AIP

Tim Inglesby and Scott Johnson of the AIP spoke at some length about their work with SGML and ISO 12083 math.

(6) Paul Grosso, ArborText

Paul proposed that the semantic layer be applied on top of the current ISO 12083 DTD through attributes; I asked whether this would be comparable to the ICADD stuff; he said that he was not specifically thinking of architectural forms but that was a possibility.

There were some reservations expressed about this from the DLI team in that their search engine searches on tags not attributes; I pointed out that one could easily generate special output for searching and that there were advantages to do so.
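The "special output for searching" idea can be sketched as a simple transformation that promotes a semantic attribute to the element name, so that a tag-only search engine can see it. This is an illustration under stated assumptions: the element names (`formula`, `sup`, `g`), the attribute name `sem`, and its values are hypothetical and are not taken from ISO 12083, and well-formed XML stands in for SGML so that the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET


def promote_attribute(element, attr="sem"):
    """Return a copy of the tree in which each element carrying `attr`
    is renamed to the attribute's value and the attribute is dropped."""
    tag = element.attrib.get(attr, element.tag)
    kept = {k: v for k, v in element.attrib.items() if k != attr}
    copy = ET.Element(tag, kept)
    copy.text = element.text
    copy.tail = element.tail
    for child in element:
        copy.append(promote_attribute(child, attr))
    return copy


# Visual markup with a semantic layer carried in attributes (hypothetical names).
source = (
    '<formula><sup sem="power">'
    '<g sem="variable">x</g><g sem="integer">2</g>'
    "</sup></formula>"
)
search_copy = promote_attribute(ET.fromstring(source))
print(ET.tostring(search_copy, encoding="unicode"))
```

The original file stays as the archival form; the promoted copy exists only to feed the search index.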

Paul talked about how SGML Open had dealt with tables; he proposed that SGML Open be used as a forum for working out a minimum subset of math that all vendors would support.

(7) Paul Topping, Design Science

Their product is MathType, an equation editor that is used in MS-Word and other products, including Corel Ventura. Their new version is Unicode-based and will support drop-in translators. They are currently implementing the translator architecture and have not started working on an ISO 12083 to MathType and return translator.

(8) Murray Malone, Panorama

Didn't say much except if you identify the problems, we'll fix them. Said a lot more later in the discussion.

PART 2 - DISCUSSIONS

The scheduled discussion period turned into a real free-for-all; this summary will reflect the confused state of the discussion. I've grouped some of the discussion logically rather than chronologically to help sort it out.

Paul Grosso (ArborText) was not present for this discussion, alas.

It was clear that we were going to have some major differences of opinion. To break the ice, we started with a somewhat less controversial topic.

(1) Searching Mathematics

It was argued that complex searching of mathematics was comparable to the kind of searching of chemistry that is done in Chem Abstracts. That kind of searching depends on knowledge that would only be available in semantic math (or Mathematica). This was seen as an argument in favor of Roy Pike's proposal.

Side issue: it was argued that it will be Unicode and not SGML entities that will make complex searching possible. (Mathematica is already Unicode-based.)

(2) Legibility of Screen Math

Long discussion of the inherent problems of displaying math or other complex text on screens. Mathematica defended their mono-space math screen fonts. AIP argued that print quality was necessary and that Panorama was totally unacceptable.

One of the DLI people had written a dissertation on the screen display of maths.

It was pointed out that horizontal scrolling is very bad; ditto zooming in and out; that text that wraps to the width of the screen is highly desirable. This entire discussion seemed obvious to me; I've said from day one that if we were designing a journal for screen reading, it wouldn't have two narrow columns.

This is a major strength of Mathematica, that it can robustly re-break equations to various screen widths!

(3) Short Term versus Long Term

There was discussion of some immediate solutions such as using applets: Java, OLE, ActiveX or working out the bugs in Panorama, versus the long term solution of semantic math. AIP made the point that they (and others) have to have a solution now; they have math intensive journals that need to go on line.

Java Applets apparently don't pass font information back to the viewer (e.g., baseline) so they don't work well for inline math. Work on style sheets for HTML might solve this eventually.

(4) Mathematica's Offer

It was proposed that it would be possible to use the plug-in reader version of Mathematica to render math. Mathematica's format is published but proprietary. Stephen Wolfram offered to allow an ISO standard that would closely correspond to their work on the semantics of math. Wolfram Research would agree not to sue.

Mathematica claims that it doesn't take a lot of intervention to move from traditional math to semantic; their heuristic rules help considerably with this.

Mathematica has taken a journal produced in TeX, Complex Systems, and converted it entirely to Mathematica notebooks with active math.

At one point in the discussion, Wolfram offered to just solve the problem for us, to do the development on a semantic math DTD that would map in and out of Mathematica's internal format. The people from SoftQuad didn't like this at all, as this would effectively cut them out of the market.

COMMENTARY: of course, that is going to happen anyway as active math is going to be much more desirable than static math.

(5) Who's doing what with Math

The organizations present were polled as to who is doing what with math:

IEE is using embedded TeX
AMS is entirely TeX
AAS (us) is SGML math, but AAP DTD
Beacon Graphics (various projects) uses AAP math DTD
AIP is using ISO 12083 math post-typesetting

(6) SGML OPEN Model

Murray Malone (SoftQuad) proposed that the model used for tables by the SGML OPEN be followed: that the vendors get together and agree on a subset of functionality that everyone will support. The implication was that this would be visual math with semantic as the optional layer.

(7) DECISIONS AND PLANS

At the end of the discussion it was proposed that the short term goal be better implementation of the current ISO 12083 math and that the long term goal be semantic math.

(7A) Short Term

It was proposed that the NSF fund another workshop, this one on ISO 12083 Math implementation, that we get together and work up documentation on proper coding practices. Murray Malone offered the SGML OPEN summer meeting in Montreal as a venue for such a meeting. There is a risk that such a meeting will degenerate into an argument with the AIP about how to code ISO 12083 math, since they are the only people doing it at the moment. But it might be useful.

(7B) Long Term

There was discussion about whether the meeting should vote to support Pike's proposal. The eventual consensus was to not vote one way or another as the proposal was too new.

(8) Comments and Observations

Odd meeting in various ways:

strong influence of Wolfram Research (lunch, dinner, and lots of bodies)
lack of preparation/communication between SoftQuad, the DLI project, and AIP
large turnout shows real interest in topic, but very few people seem willing to actually do SGML math

My personal opinion is that the future of SGML math rests entirely on the tool makers. Right now, the only serious SGML math (in the USA) is being done using the AAP DTD because that is what ArborText has implemented. Without tools to create, edit, and present SGML math no one is going to bother with it. AIP even said that they had to write a tool to convert ISO 12083 math to TeX so that they could render it and see if they are coding it correctly. My guess is that Wolfram will decide that an SGML input/output filter isn't any harder than their existing TeX filter and they will move ahead. That will shape the future discussion; as has been amply demonstrated on the internet, "working code wins" and not necessarily international standards.
