[Mirrored from: http://mish161.cern.ch/sc4wg6/math/dli.htm]
The workshop was very well attended; discussion was lively. The plan of the meeting was presentations in the morning and early afternoon followed by an extended period of group discussion.
As I understand it, the reason for calling the workshop was that the UIUC Digital Library Project was having difficulty getting the SGML Math supplied by the publisher partners to be rendered effectively on screen by SoftQuad's Panorama. So the immediate goal was to solve that problem. The University of Chicago Press is not a participant in the UIUC DLI project, so it is possible that I may have misstated some of the background.
Wolfram Research has spent 5 years and millions of dollars to develop typesetting capabilities of Mathematica. They believe that it can now typeset math to the level of the best commercial system; its page layout capabilities are comparable to WordPerfect or Word and not quite as good as Pagemaker. Unlike TeX, Mathematica can gracefully break formulas into lines without operator intervention because it understands the structure of the math; if it didn't understand the structure, it couldn't calculate. Also it knows how tightly the operators are bound.
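The idea of structure-aware line breaking can be illustrated with a minimal Python sketch. This is not Mathematica's algorithm; the binding table, the `break_formula` function, and the greedy strategy are all assumptions made for illustration. The sketch only allows a break in front of a loosely bound operator (here `=`, `+`, `-`) that sits outside any parentheses, so tightly bound pieces such as products and parenthesized groups stay together.

```python
# Hypothetical binding strengths: a lower number means a more loosely bound operator.
BINDING = {"=": 1, "+": 2, "-": 2, "*": 3}


def break_formula(tokens, width, max_break_strength=2):
    """Greedily wrap a tokenized formula, breaking only before loosely
    bound operators that are not nested inside parentheses."""
    # Step 1: find the positions where a break is structurally acceptable.
    depth = 0
    cut_points = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and i > 0 and BINDING.get(tok, 99) <= max_break_strength:
            cut_points.append(i)

    # Step 2: split the token list into unbreakable pieces.
    pieces = []
    start = 0
    for i in cut_points:
        pieces.append(" ".join(tokens[start:i]))
        start = i
    pieces.append(" ".join(tokens[start:]))

    # Step 3: pack the pieces onto lines without exceeding the width
    # (a single piece longer than the width is left on its own line).
    lines = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + " " + piece
        if current and len(candidate) > width:
            lines.append(current)
            current = piece
        else:
            current = candidate
    lines.append(current)
    return lines


formula = "y = a * ( b + c ) + d * e + f".split()
wrapped = break_formula(formula, width=20)
print("\n".join(wrapped))
```

Changing `width` re-breaks the same formula for a different window size, and the `+` inside the parentheses is never chosen as a break point.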
Input is palette, keyboard, and strings; has full internal Unicode support. Mathematica has an open architecture for customizing. All formatting styles are pairs: screen and paper style. Uses monospaced fonts on screen for legibility.
To interpret math, it needs to know unambiguously the role of every element in the formula. E.g., is "e" the exponential constant or a variable named "e"? Mathematica keeps additional information in its internal representation of the math.
Traditional textbook math cannot always be made unambiguous. Mathematica converts from traditional form notation to its internal format using a large collection of heuristic rules; this doesn't always work. When editing in traditional notation, one can easily make something that can't be converted back; their editing environment will prompt about this. It can also handle abstract notation, though it needs to be given processing rules.
Inside Mathematica, everything is a symbolic expression. Traditional form math is a set of transformation rules.
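The ambiguity problem described above can be sketched in a few lines of Python. The nested tuples here are only a stand-in for a symbolic expression tree, and `to_traditional` is a hypothetical rendering rule, not Wolfram's code. The heads `Power`, `Plus`, and `Times` and the symbol `E` for the exponential constant follow Mathematica's naming; treating a lowercase `e` leaf as a user variable is an assumption of the sketch.

```python
def to_traditional(expr):
    """Render an internal expression (head, arg1, arg2, ...) as
    traditional-looking notation. Leaves are plain strings."""
    if isinstance(expr, str):
        # The constant E and a variable named e print identically.
        return "e" if expr == "E" else expr
    head, *args = expr
    parts = [to_traditional(arg) for arg in args]
    if head == "Power":
        return f"{parts[0]}^{parts[1]}"
    if head == "Plus":
        return " + ".join(parts)
    if head == "Times":
        return " ".join(parts)
    raise ValueError(f"no rendering rule for head {head!r}")


constant_form = ("Power", "E", "x")   # exponential constant raised to x
variable_form = ("Power", "e", "x")   # a variable named e raised to x

print(to_traditional(constant_form))
print(to_traditional(variable_form))
```

Both expressions print the same traditional form, which is why going back from traditional notation to the internal format needs either extra stored information or heuristic rules.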
Mathematica's spacing and display is much better than TeX's. It can export math to gif, eps, pict, TeX, HTML, speech, or to ASCII that can be reloaded into Mathematica. Export to/from TeX is done using transformation rules. They have a version that takes math input, sends it to a math server that renders it and returns GIFs; also an inline app to render math into ActiveX and Netscape inline addins.
Wolfram Research would like to work with the SGML Math community. Mathematica notebooks are a markup language; one could map the structure of the entire notebook (not just the maths) into an SGML DTD.
Raggett:
Web vendors not interested in Math
Special review board of Wolfram, Adobe, AMS trying to develop open spec.
His proposal has nothing to do with SGML at all. (My question to him later was, was it a fair statement to say that this proposal is to punt, to take math entirely out of HTML; his answer was yes).
His plan is to use SGML notations and allow documents to specify the server (URL) where a set of rules (knowledge base) would be available to render the math. The actual HTML would be very simple: perhaps only three tags: inline formula, display formula, and a tag to identify where the renderer was to be found. Then, inside the tag, any kind of notation could be used. His demonstration used Prolog.
This proposal was not well received by the audience; comments later included, how many years will it take until this is working; and isn't this just avoiding the issue?
Comments from the audience seemed to be that this was a good long term goal but didn't solve the problem at hand: rendering math for screen display.
It was apparent that semantic math and Mathematica had a lot in common; Pike had been to visit Wolfram and has apparently seen their common interest: a robust semantic math would be a format that Mathematica could easily export to.
Another important comment (from me and others) was that it is unrealistic to expect that we are going to be able to afford to disambiguate math in a production environment; if we can't get this from the authors then it isn't going to work.
Pike seems to favor working on the assumption of a perfect world; unfortunately most of us in publishing live in very imperfect worlds.
SGML math won't work unless we have tools to get the math from the authors and to edit it robustly. We need authoring and editing environments or appropriate translation tools.
I also described our project and how we convert math from LaTeX to SGML to typesetting systems and back.
There were some reservations expressed about this from the DLI team in that their search engine searches on tags not attributes; I pointed out that one could easily generate special output for searching and that there were advantages to doing so.
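As a rough illustration of what "special output for searching" could look like, the following Python sketch rewrites every attribute as a child element so that a tag-only search engine can see it. The element and attribute names (`formula`, `fraction`, `type`, `style`) and the function `attributes_to_tags` are hypothetical, and XML parsed with the standard library stands in for SGML here.

```python
import xml.etree.ElementTree as ET


def attributes_to_tags(elem):
    """Return a copy of elem in which each attribute has become a
    child element of the same name holding the attribute value."""
    out = ET.Element(elem.tag)
    out.text = elem.text
    out.tail = elem.tail
    # Attributes are emitted first, as children named after the attribute.
    for name, value in elem.attrib.items():
        child = ET.SubElement(out, name)
        child.text = value
    # Then the original children are converted recursively.
    for sub in elem:
        out.append(attributes_to_tags(sub))
    return out


source = (
    '<formula type="display">'
    '<fraction style="built"><num>a</num><den>b</den></fraction>'
    '</formula>'
)
search_copy = attributes_to_tags(ET.fromstring(source))
result = ET.tostring(search_copy, encoding="unicode")
print(result)
```

The search copy is generated output only; the attribute-based source remains the master, so nothing is lost by flattening it for the index.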
Paul talked about how SGML Open had dealt with tables; he proposed that SGML Open be used as a forum for working out a minimum subset of math that all vendors would support.
Paul Grosso (ArborText) was not present for this discussion, alas.
It was clear that we were going to have some major differences of opinion. To break the ice, we started with a somewhat less controversial topic.
Side issue: it was argued that it will be Unicode and not SGML entities that will make complex searching possible. (Mathematica is already Unicode based.)
One of the DLI people had written a dissertation on the screen display of maths.
It was pointed out that horizontal scrolling is very bad; ditto zooming in and out; that text that wraps to the width of the screen is highly desirable. This entire discussion seemed obvious to me; I've said from day one that if we were designing a journal for screen reading, it wouldn't have two narrow columns.
This is a major strength of Mathematica, that it can robustly re-break equations to various screen widths!
Java Applets apparently don't pass font information back to the viewer (e.g., baseline) so they don't work well for inline math. Work on style sheets for HTML might solve this eventually.
Mathematica claims that it doesn't take a lot of intervention to move from traditional math to semantic; their heuristic rules help considerably with this.
Mathematica has taken a journal produced in TeX, Complex Systems, and converted it entirely to Mathematica notebooks with active math.
At one point in the discussion, Wolfram offered to just solve the problem for us, to do the development on a semantic math DTD that would map in and out of Mathematica's internal format. The people from SoftQuad didn't like this at all, as this would effectively cut them out of the market.
COMMENTARY: of course, that is going to happen anyway as active math is going to be much more desirable than static math.
My personal opinion is that the future of SGML math rests entirely on the tool makers. Right now, the only serious SGML math (in the USA) is being done using the AAP DTD because that is what ArborText has implemented. Without tools to create, edit, and present SGML math, no one is going to bother with it. AIP even said that they had to write a tool to convert ISO 12083 math to TeX so that they could render it and see if they are coding it correctly. My guess is that Wolfram will decide that an SGML input/output filter isn't any harder than their existing TeX filter and they will move ahead. That will shape the future discussion; as has been amply demonstrated on the internet, "working code wins" and not necessarily international standards.