[Mirrored from: http://mish161.cern.ch/sc4wg6/math/laloe.htm]

On using SGML for mathematical formulas

As a result of the email I sent to a number of people announcing this web server, I had an email discussion (in French) with Prof. Franck Laloe, Theoretical Physicist at the Ecole Normale Superieure, Paris. Franck is also the chairman of the publications committee of the European Physical Society. In this discussion a number of points came up which I believe are of general interest, and this page contains a translated digest of our discussion.

Many points in this discussion are left open, and I welcome comments from anyone interested on the discussion list. The text of the first message is printed in italics. I have numbered the replies to paragraphs in the original message and included an anchor at each numbered reply, with the reply number as the value of the NAME attribute. So, if you want to refer to Eric's reply 1 in a message you post to the discussion list, you can use the following tag in your message: In <A HREF="../maths/laloe.htm#1">Eric's reply 1</A> he claims....
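As a rough sketch of the target side of such a link (the actual source of this page is not reproduced in the mirror, so the exact markup is an assumption), the anchor for reply 1 would take roughly this form:

```html
<!-- Target anchor: the NAME attribute holds the reply number, so "#1" in a link resolves here -->
<A NAME="1">Eric's reply (1):</A>
```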

Discussion of issues related to using SGML to encode mathematical formulas

Dear Eric,
Thank you for your message. I'll look at your Web site. Paul Ginsparg (a theoretical physicist at the Los Alamos Laboratory who started the "Los Alamos" physics e-print archives), whom I recently saw in Paris, holds a view similar to mine: SGML could one day help physicists, although it is not yet entirely clear how. He is nevertheless of the opinion that ISO 12083 is not yet a good DTD, particularly for maths, which would explain why no one uses it (which reduces its interest as a standard!).
Eric's reply (1):
As far as the ISO 12083 DTD is concerned, I would be surprised if Paul has ever looked at it. And it's always the same story: if he thinks this DTD is no good, why haven't we heard his ideas for improving it?
Franck's reply (1.1):
Humm. For once I do not entirely agree with you; I rather side with Paul. We both looked at your web site and discussed it. He thinks that the physicists' problem is not to improve the description of formulas in ISO 12083, but for ISO 12083 to be accepted by scientists, who will not abandon the TeX structure for formulas. From this point of view (scientific articles), ISO 12083 cannot really be used unless there is an automatic translation tool from TeX to ISO 12083 maths. Paul and I agree (and I think you do too) that the market will decide, not the decisions of working and standards groups, no matter how intelligent they are.
Eric's reply (1.1.1):
This is only a problem if you consider SGML an imposed format. I believe that, where scientific authors are concerned, SGML should not even be mentioned. Where SGML can play a role, it is as an enabling technology. Paul, for example, has a 'database' of TeX articles that could (perhaps) be made richer by using SGML. But someone has to pay the price of the conversion, otherwise the information captured in the articles will remain hidden. To indicate structure in the articles, LaTeX is probably sufficient.
Franck's reply (1.2):
The best chance for ISO 12083, if it really wants to make an impact on the world of scientific articles, would be to accept formulas written in TeX automatically. This means that the graphical description in the DTD should be isomorphic with TeX, so that the two can be easily translated into each other. That does not mean that ISO 12083 should not go further in parallel, for example by defining two types of formulas: graphical ones isomorphic with TeX, and more abstract symbolic objects in the spirit of formal calculation, which will perhaps be used one day.
Apparently Physical Review Letters includes formulas in TeX form in its SGML documents.
Eric's reply (2):
Actually, as far as I remember, the APS do not include formulas in TeX,
Franck's reply (2.1):
Yes, I believe they are encapsulated in TeX form within the ISO 12083 SGML; otherwise it would be unusable. But you should check that I understood correctly. In any case, this is what many Americans told me.

Eric's reply (3):
but in the form of GIF images because there is no viewer that can display formulas other than in image form.
Franck's reply (3.1):
Be careful: as you know, the mixed SGML/TeX form is not used for viewing. Viewing is done either with Guidon, after automatic conversion from that format, or with Acrobat. In any case, I do not believe that formulas have been treated as images, except in the third version, for the Web, which is the least powerful.

Eric's reply (4):
This therefore has nothing to do with the quality of the DTD; there is simply no software that can take advantage of the SGML format for mathematical formulas. The reason is economic: the large software companies are not interested in the STM market because it is too small for them.
Franck's reply (4.1):
I also believe that there are authors who would not want a system that is unable to accept formulas in TeX.
Eric's reply (4.1.1):
I believe there is still some confusion, because the aim of ISO 12083 is not to replace TeX. In fact, ISO 12083 allows TeX to be included alongside the SGML!! And as I have said before, I am not at all convinced of the usefulness of converting TeX into SGML, because the only system able to print formulas with sufficiently high quality is TeX. So if you translate into SGML, you then need another translation from SGML back into TeX! And that is not worthwhile.
Franck's reply (4.2):
The Americans also mentioned a shortcoming of the DTD: apparently it cannot describe indices of indices, which would make TeX superior, but I do not know the details.
Eric's reply (4.2.1):
The math DTD in ISO 12083 allows nesting of any object, not only indices.
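For reference, on the TeX side an "index of an index" is simply a nested subscript group, so arbitrary depth is expressed by nesting braces. A three-level example (the corresponding ISO 12083 markup is not shown here, since its exact element names are not given in this discussion):

```latex
% x with subscript i, which has subscript j, which has subscript k
$x_{i_{j_{k}}}$
```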

Eric's reply (5):
I believe there are 4 reasons why it could be worthwhile converting formulas into SGML:
  1. To be independent of the text formatter. It is unlikely that TeX will still exist 100 years from now, but at the moment there is nothing better, and any future system as powerful as TeX will probably contain a filter to import TeX.
    Franck's reply (5.1):
    Why is TeX not independent of the text formatter? I don't see the advantage of SGML over TeX in this case.
    Eric's reply (5.1.1):
    TeX is itself a text formatter, and its existence depends on its implementation on a computer. The question to ask is: will TeX still exist in 200 years' time? And the same for SGML. The argument for SGML is that it is a neutral format that does not depend on a particular implementation. Even if all the computers in the world were destroyed, SGML would still exist. The same cannot be said of TeX. In other words, every time you change computers or operating systems, you have to port TeX to make it work on the new system.
    Franck's reply (5.1.1.1):
    Excuse me if I ask such elementary questions, but I only partially understand those lines. TeX as well as SGML describes a document as a sequence of ASCII characters, so in 200 years we will need to be able to read them in both cases. SGML describes the logical structure of a document, while LaTeX gives a mixed description, partly graphical but also partly logical. To print the document and use it, typographical rules are required in both cases. And for SGML you need to understand the DTD. Is there really such a conceptual difference?
    Eric's reply (5.1.1.1.1):
    Consider the following example: 15 years ago I wrote my thesis in Waterloo Script (TeX was not yet available). If I wanted to reprint it from the original tape, I would need to find a 16" tape unit capable of reading 6250 bpi tapes on an EBCDIC system, regenerate the correct version of Script (there were dozens) with the right fonts (some of which were home-grown), find the correct print driver and print it on an IBM 3800 laser printer. The problem can be summarized as follows: not only has the software become obsolete, but it was inextricably intertwined with the hardware available at the time, and that is of course also obsolete. With SGML I do not have to worry about character sets or software becoming obsolete. I'll always be able to read and process my data.
  2. Eric (6). To be able to search on formulas. But I know of very few physicists who think this is useful.
    Franck's reply (6.1):
    Searching on formulas or their content is a good and very interesting idea. The possibility sometimes creates the need. For example, I sometimes need to write an article and change its notation: I want to be able to systematically replace one letter, or a group of letters, with another in all formulas. But I can already do that in Scientific Word, which is based on LaTeX.
    Eric's reply (6.1.1):
    OK, but do you want to search in order to change the visual appearance of a formula, or are you also interested in searching on its mathematical content? In any case, it seems that SGML is not required for the type of searching you want to do.
    Franck's reply (6.1.1.1):
    For example, looking for all articles containing a given equation, independent of its typography? I am not sure I have any examples of this ready in my head.
  3. Eric (7). To be able to calculate with the formula. This could be useful, but I think real physicists prefer to verify other people's formulas by hand in most cases. In any case, Roy Pike is trying to do this, and if he succeeds, so much the better.
    Franck's reply (7.1):
    I agree. In any case this possibility also exists in Scientific Word, which is widely used in certain disciplines (apparently not in high energy physics because Paul Ginsparg for example had never heard of it). So this possibility is already present, at least partially.
  4. Eric (8). To be able to distribute formulas over the Web. But with Java and Netscape plug-ins, we could soon have TeX viewers for the Web.

Eric's reply (9):
So much for maths. In general, I don't think SGML is a format that should be typed in by authors. Either there will be very user-friendly SGML tools, like the multitude of HTML editors now appearing on the market, or the conversion to SGML will be done somewhere centrally. A number of ISO 12083 tools are in the pipeline, including one from Folio.
Franck's reply (9.1):
If SGML isn't a format that authors should encode, as you think, even through interfaces such as Scientific Word or others, then you agree with Paul: SGML can be a very good tool in many disciplines, but not for academic physics, which wants to keep the entire editorial chain under its own control. And so physicists and the [...] should look elsewhere. But I think you are pessimistic, and I believe one can develop tools that will let us structure texts better, starting from the moment of their creation. SGML could very well enter our working habits progressively via LaTeX, but adiabatically, according to our needs. At the very least, a DTD is required that allows automatic conversion from LaTeX to SGML and back.
Franck's reply (9.2):
If I participated in your discussion group on the structure of mathematical formulas, I would start by closely studying what Scientific Workplace does: how it performs formal calculations from TeX with Maple, so as to understand its advantages and its limits. It is in no one's interest to change for the sake of changing; that only gives rise to academic discussions. Starting the description of maths from scratch, given the enormous legacy of scientific articles in TeX, seems pointless to me: people will never follow.
Franck's reply (9.3):
My belief is that SGML will help physicists only if they can *use it themselves*. Now that they have been freed from intermediaries for publishing their articles, they will love it if they can go further in this direction, but hate it if new barriers are created. On this point I am more optimistic than you: I think the process is possible through an adiabatic transition from TeX/LaTeX. In other words, I am advocating either
  1. A convergent motion: a LaTeX style could be modified for the explicit purpose of reducing the gap to a DTD, and conversely a DTD could be built explicitly to allow easy automatic translation from/to LaTeX files.
  2. Alternatively, author-oriented software in the style of Grif or others should be developed or improved, with a LaTeX translator, so that authors produce SGML "without knowing it" (in the same way as Scientific Word allows authors to create TeX without knowing it).
In other words, I personally believe in Paul Ginsparg's philosophy that more can be transferred to the authors, provided they have good tools and provided they benefit from it in terms of independence.
If you have any information on the evolution of ISO 12083 in the near future, this is obviously of great interest to me. With kind regards, Franck