Maria Heijne, SURFnet bv
(Maria.Heijne@SURFnet.nl)
Most of the participants came from government and industry; the academic world was hardly represented. From the Netherlands, the participants were publishers and representatives of industry (Fokker, Philips, Ericsson).
Since my experience with products is still limited, I rely here on those with more expertise. In general, they said there was little news to report on the product front.
The highlights:
Some of these products will be examined more closely within the Premium framework.
Some hotly debated items:
Since this report also draws on the Proceedings, it is written in English. If you would like to receive parts of the Proceedings that are not covered in this report, you can request them from the author of this report (Maria.Heijne@SURFnet.nl).
Conference proceedings are available on paper and electronically. The report below covers most of the sessions labeled with *.
Growing attention to SGML:
new products are appearing fast, and they force the older products to get better. Interestingly, Microsoft and WordPerfect, responding to market demand, have announced SGML software, and for the first time in their lives they are selling software that makes it easy for you to switch to someone else's software.
Suppliers have to compete on capability since they cannot count on proprietary formats. This capability has a different meaning to each of them:
WWW as the World's Largest SGML Application
If you already use SGML, you can publish on the Web immediately. By the end of 1995 several full-blown SGML browsers will be available; the first is SoftQuad Panorama (free). Many people use SGML without realizing it. HTML is very simple, yet on the other hand also complex; but it conveys an idea of simplicity, and that is why people like it. It also adds drama, which gives people digital gratification. Typical SGML applications don't.
HTML stands for:
SGML top 6 characteristics are:
SGML standard revision:
DSSSL
HyTime
Goldfarb uses the phrase "compound document architecture wars" for OpenDoc versus OLE.
Most important phrases:
The following papers were presented
Establishing clear requirements for SGML projects: this paper did not cover any requirements beyond those people usually already establish when setting up an IT project.
SGML allows you to describe document content and to share and reuse that content throughout your organization. Processes related to your documents should be made visible. The paper distinguished between document management and workflow management (defining work processes, capturing information about processes). A document management system focuses on the management of individual objects (documents, data) within the system. A workflow management system focuses on monitoring all the objects within the system and on process management. BPR (business process re-engineering) is the activity of streamlining major work processes.
The amount of time and the amount and type of resources involved
in document analysis and DTD development have made SGML
difficult to cost-justify for documents that have neither a long
life nor many potential uses. When an organization considers
SGML, it has to answer fundamental questions about its
business processes.
When an organization develops an SGML application, it has
essentially developed its document management architecture and
defined its work processes.
The audience was not convinced that BPR is necessary for organizing a document management system.
Advantages of SGML
Shell uses: Arbortext, Mark-it, Write-it, DynaText
SGML suits documentation that must provide reliable information over a long period of time and where independence from proprietary formats is the best choice. Embarking on SGML is not a cheap solution: it requires a significant investment up front, both in IT components and in the conversion of legacy data. The justification is increased quality of the information, but this is seldom easy to quantify. The business justification lies in the different way the information is managed.
There is also documentation whose lifetime is shorter and whose information is of a more transient nature. Paula's answer is not to force SGML into this area but to try to bring some structure into it. Maybe HTML will lead us to exploit SGML in a bigger way than we currently think; it can play a major role in changing attitudes. With Internet publication becoming the norm rather than the exception, authors will have to start using HTML, and once that is accepted, a move to full-fledged use of SGML targeted at managing the information may not be as far off as we sometimes think.
The following papers were presented
The OECD uses SGML but converts all publications, which are preferably authored in Word; it developed add-ons to Word for this. Only this approach (a small, controlled departmental application) provides a manageable environment for the introduction of SGML. This approach produces measurable benefits in productivity, quality and reuse of information.
In this case a few people started to concentrate on SGML as a way of producing documentation. They found that SGML was profitable because:
They also came upon new possibilities for use of the documentation:
Lessons learned:
Use of SGML in European Commission
In the environment of the EC, exchange and reuse of material
are clearly necessary. SGML could be a good solution provided
that the authors deliver validated SGML documents.
Given the large installed base of word processing packages, the
best way seems to be an SGML add-on to existing word
processors.
A few remarks in order to make the introduction of structured editing more acceptable:
Adding SGML functionality to a word processor has to be experienced as an incremental benefit, an added structuring capability, rather than as a narrowing of its functionality.
Implementing an SGML functional add-on to an existing word
processor presents a number of choices. A WP application
produces formatted pages. Functionally, styles are no more than
a set of formatting codes which are applied (paragraphs,
numbering schemes, footnotes, graphic elements, tables, lists
etc).
SGML functionality added to a word processor must find a way of
introducing much more complex logical structures into a
document.
Guiding
In order to produce valid SGML document instances, the author
has to be advised, or forced, to choose among valid
constructs: styles, formatting codes or tag codes. This choice
is dictated by the DTD logic.
The DTD logic can be tightly coupled with the WP environment.
Through the correspondence between WP constructs and logical
elements, the DTD logic guides the structuring: in each context
the valid choices are proposed or enforced.
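The tight coupling described above can be illustrated with a small sketch. It assumes a hypothetical chapter element whose content model is written as a list of (element, occurrence) slots, roughly corresponding to (title, para+, section*); the model and the function valid_next are invented for illustration and do not come from any of the products mentioned. Given the elements the author has already entered, it returns the choices an editor could propose in that context.

```python
# Hypothetical content model for a "chapter" element, in the spirit of
# <!ELEMENT chapter - - (title, para+, section*)>. Each slot is
# (element name, occurrence): "1" once, "?" optional, "+" one or more, "*" any.
CHAPTER_MODEL = [("title", "1"), ("para", "+"), ("section", "*")]


def valid_next(model, children):
    """Return the element names the author may insert after `children`.

    Raises ValueError if `children` already violates the model.
    """
    pos, count = 0, 0  # current slot in the model and how often it is filled
    for name in children:
        while pos < len(model):
            slot, occ = model[pos]
            if name == slot and (occ in "+*" or count == 0):
                count += 1
                break
            if occ in "1+" and count == 0:
                raise ValueError(f"{name!r} not allowed here; expected {slot!r}")
            pos, count = pos + 1, 0  # slot satisfied or optional: move on
        else:
            raise ValueError(f"{name!r} not allowed after the last permitted element")
    choices = []
    while pos < len(model):
        slot, occ = model[pos]
        if occ in "+*" or count == 0:
            choices.append(slot)  # this slot can still take an element
        if occ in "1+" and count == 0:
            break  # a required slot is still empty; later slots are unreachable
        pos, count = pos + 1, 0
    return choices


print(valid_next(CHAPTER_MODEL, []))               # ['title']
print(valid_next(CHAPTER_MODEL, ["title", "para"]))  # ['para', 'section']
```

The greedy matching is only safe because SGML requires content models to be unambiguous; choice (|) and & groups are left out to keep the sketch short.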
Some tools provide loose coupling: the document's structure is validated only on request. Such tools need a well-designed authoring guide. They also require that style and presentation requirements are strictly followed.
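With loose coupling, validation happens only on request. A minimal sketch, assuming the document is held as nested (name, children) tuples and that content models use only sequence (,), choice (|) and the occurrence indicators ?, * and + (no & groups, no mixed content): each model is translated into a regular expression over child element names, and the whole tree is checked in one pass. The models and the helper names are hypothetical.

```python
import re

# Hypothetical content models in SGML-like notation.
MODELS = {
    "chapter": "(title, para+, section*)",
    "section": "(title, para*)",
}


def model_to_regex(model):
    """Translate a content model into a regex over 'name ' tokens."""
    parts = []
    for tok in re.findall(r"[A-Za-z][\w.-]*|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # sequence is plain adjacency in the regex
        elif tok in ")|?*+":
            parts.append(tok)
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def validate(element, errors=None, path=""):
    """Check every element with a declared model; return a list of errors."""
    errors = [] if errors is None else errors
    name, children = element
    here = f"{path}/{name}"
    child_names = [c[0] for c in children if isinstance(c, tuple)]
    model = MODELS.get(name)
    if model and not model_to_regex(model).fullmatch("".join(n + " " for n in child_names)):
        errors.append(f"{here}: children {child_names} do not match {model}")
    for child in children:
        if isinstance(child, tuple):
            validate(child, errors, here)
    return errors


bad = ("chapter", [("para", ["No title."]), ("section", [("para", ["x"])])])
for problem in validate(bad):
    print(problem)
```

Reporting all errors at once, rather than blocking the author at each step, is what distinguishes this mode from the guided editing above, and it is why such tools depend on a good authoring guide.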
The DTD should serve as a data model that guides the development of the customization. Vendors should take these considerations into account. Some products now offer more interesting possibilities than others for coupling the customization to the DTD: they do not require a complete customization to be ready before it can be tested, and they allow modifications without redesigning the complete set-up.
SGML tools are inadequate and only emulate inadequate stuff.
Tools should provide SGML benefits in all phases of the
processes.
People then become aware that they are authoring data and they
do not have to know much about SGML syntax. Instead they only
have to know about their documentation system and the
information that has to be derived from the system. That SGML
enables this is not important to them.
Most important is finding the most efficient way to capture a sufficient level of information. The frontier between editors (Author/Editor, Adept, Grif, InContext) and conversion tools (DynaText) seems to blur. Among the newest products, MS SGML Author is a real-time conversion tool and Tag Wizard is an SGML editor based on Word. But choosing between native SGML data entry and post-conversion (conversion from structured WP files to SGML, and automated SGML document generation from a structured database) is still difficult: what selection criteria decide which way to follow, for what sort of DTD, for which kind of user, and what about tables and math equations?
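Post-conversion from a structured word-processor file can be sketched as a mapping from style names to elements. The style names, element names and the convert function below are assumptions for illustration; a real converter would also have to build nesting, tables and math, which is exactly where the selection questions above become hard. Unmapped styles are reported rather than guessed, since a paragraph formatted outside the style sheet carries no reliable structural information.

```python
from html import escape

# Hypothetical mapping from word-processor style names to SGML element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "Warning": "warning",
}


def convert(paragraphs):
    """Post-convert (style, text) pairs into flat SGML-tagged lines.

    Returns the tagged text and the list of styles that had no mapping.
    Escaping assumes the target DTD declares the usual &amp;, &lt;, &gt; entities.
    """
    out, unmapped = [], []
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT.get(style)
        if element is None:
            unmapped.append(style)  # leave it to a human instead of guessing
            continue
        out.append(f"<{element}>{escape(text, quote=False)}</{element}>")
    return "\n".join(out), unmapped


tagged, unknown = convert([("Heading 1", "Safety"),
                           ("Body Text", "Use A & B"),
                           ("Fancy", "x")])
print(tagged)
print(unknown)  # ['Fancy']
```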
Current status of SGML:
Documents are collections of data fragments: the monolithic
document concept disappears completely. Fragments are assembled
today in one document and tomorrow in another.
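The fragment view of documents can be sketched as a store of identified fragments from which documents are assembled by reference. The Fragment fields, the identifiers and the assemble function are hypothetical; the point is only that the same fragment object appears in two documents without being copied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    frag_id: str   # hypothetical identifier scheme
    element: str   # logical element type, e.g. "warning"
    text: str


def assemble(frag_ids, store):
    """Build a document as an ordered list of fragments taken from `store`."""
    missing = [fid for fid in frag_ids if fid not in store]
    if missing:
        raise KeyError(f"unknown fragments: {missing}")
    return [store[fid] for fid in frag_ids]


store = {f.frag_id: f for f in [
    Fragment("w-001", "warning", "Disconnect power before servicing."),
    Fragment("p-010", "procedure", "Remove the front panel."),
    Fragment("p-020", "procedure", "Replace the fuse."),
]}

# The same warning fragment is reused in two different documents.
manual_today = assemble(["w-001", "p-010"], store)
manual_tomorrow = assemble(["w-001", "p-020"], store)
print(manual_today[0] is manual_tomorrow[0])  # True
```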
Integration of different tools to manipulate and produce
fragments.
Data is generated in the document and sent from tool to tool.
The behavior of the fragments is very specific to the data being
manipulated and to the tools that manipulate them.
The user interface should be data-oriented and not fixed throughout a complete document. Data exchange protocols should be able to interchange fragments between tools transparently. This is called a Document Oriented User Interface (DOI): basic user operations on the document launch tools that operate specifically on the selected portion of the document.
OLE (Microsoft) and OpenDoc (Apple, IBM, Novell and others) follow this user interface philosophy. Both aim to provide compound document and DOI support.
When building a document, you create or retrieve frames and put them into a document window. When you select an element, the appropriate tool to manipulate it becomes available. You see the document as a whole in the same window, but a software mechanism distinguishes mouse clicks and menus for each element. The system chooses the treatment to apply, launches the right job and provides help. These architectures present solid protocols for integrating different tools by exchanging data fragments seamlessly.
SGML reinforces DOI by providing a tool-independent storage format for data fragments. These semantically marked fragments (user objects, as opposed to the software vendor objects in OLE and OpenDoc) could present a more flexible user interface than non-SGML data and migrate more easily from tool to tool, because their content model is more clearly defined.
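The dispatch mechanism described above (select an element, get the appropriate tool) can be sketched as a registry keyed on element type. This is not the OLE or OpenDoc API; the registry, the decorator and the tool functions are hypothetical stand-ins that only show behaviour being chosen per fragment rather than per document.

```python
from typing import Callable, Dict

# Hypothetical registry: element type -> tool that handles that kind of fragment.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool_for(element: str):
    """Decorator that registers a tool for one element type."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[element] = fn
        return fn
    return register


@tool_for("equation")
def equation_editor(content: str) -> str:
    return f"[equation editor] {content}"


@tool_for("table")
def table_editor(content: str) -> str:
    return f"[table editor] {content}"


def text_editor(content: str) -> str:
    """Default tool for elements without a specialised handler."""
    return f"[text editor] {content}"


def on_select(element: str, content: str) -> str:
    """Launch the tool registered for the selected element type."""
    return TOOLS.get(element, text_editor)(content)


print(on_select("table", "3 x 4 grid"))     # [table editor] 3 x 4 grid
print(on_select("para", "Some body text"))  # [text editor] Some body text
```

Keying the registry on SGML element types rather than on vendor object types is the difference the paper draws between user objects and software vendor objects.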
The following papers were presented
It is most important to realize that SGML projects are about people: the new processes have to be carried out by people. Who is implementing the project, who will design the new processes, who will use them and to whom will the output be delivered? What do people need, and where do their needs conflict with one another?
At the start (1990), the plans for electronic publishing were:
Implementation of a small-scale computer-aided production system for, among others, building blocks and journals, with the following activities:
Complicating factors:
Conclusion: all publications, not just articles, should have
been analyzed first. As a result, several versions of the DTD
were made.
On the other hand, if you do this, your DTD may become too
loose. An alternative is to do a broad document analysis
and specify the DTD for one kind of document while
keeping an eye on the broader spectrum.
Other papers in session not attended.
The following papers were presented
Some suggest doing away with HTML: SGML should be the basis for the Web, so that everyone can fit their needs into their own DTD instead of forcing everything down into the simple structure of HTML. Others hold that HTML should remain the primary data format that forms the backbone of the Web:
an HTML-centric Web, but one where HTML and SGML coexist and are
used intelligently together. A scalable architecture where both
simple and complex information can be handled effectively.
HTML will provide a common set of semantics (classes of objects)
that can be understood by any web browser, search engine or
whatever application.
Documents can be encoded in SGML but each SGML element in the
DTD would be mapped to one of the standard HTML classes.
HTML has the potential to be a really smart use of SGML, taking
full advantage of SGML's flexibility without giving up a common
backbone structure that any browser can readily interpret.
Conversion from SGML documents to HTML is not a good idea. HTML is designed as a simple and effective language for representing linkable texts on a variety of computer screens. HTML should be considered an interface to SGML documents: the information in the SGML is extracted and displayed appropriately without necessarily converting the complete document to HTML. Mapping should take place ad hoc, with a set of forms and scripts for each DTD, for example for ISO 12083. The realization that HTML is for hypertext representation and not for capturing semantics should pave the way for many interesting forms interfaces.
Generation of HTML from SGML can be automated, making publishing on the Web instantaneous and cost-effective. Generating HTML this way is a down-translation, in which rich generalized markup is filtered down to specific markup. With SGML-to-HTML transformation, the following aspects need attention:
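A minimal down-translation sketch, assuming the SGML instance has already been parsed into nested (element, children) tuples with plain strings for character data. The element names and the ELEMENT_TO_HTML table are invented; elements without an HTML equivalent lose their markup but keep their text, which shows why down-translation is lossy and why the mapping deserves attention.

```python
from html import escape

# Hypothetical mapping from rich element names to HTML tags.
ELEMENT_TO_HTML = {
    "article": "div",
    "title": "h1",
    "para": "p",
    "emph": "em",
}


def down_translate(node):
    """Filter a parsed SGML tree down to HTML markup.

    `node` is either a string (character data) or an (element, children) tuple.
    """
    if isinstance(node, str):
        return escape(node, quote=False)
    element, children = node
    inner = "".join(down_translate(child) for child in children)
    tag = ELEMENT_TO_HTML.get(element)
    if tag is None:
        return inner  # no HTML counterpart: keep the text, drop the markup
    return f"<{tag}>{inner}</{tag}>"


doc = ("article", [
    ("title", ["SGML & the Web"]),
    ("para", ["HTML is ", ("emph", ["simple"]), "."]),
    ("keyword", ["DSSSL"]),
])
print(down_translate(doc))
# <div><h1>SGML &amp; the Web</h1><p>HTML is <em>simple</em>.</p>DSSSL</div>
```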
Several opinions:
According to someone from Novell, who is a member of the IETF HTML Working Group representing SGML Open, HTML 3.0 is just a name: the WG is still formulating HTML 2.0.
Many attendees consider PDF unimportant, because its page layout is too limited for anything other than delivery of material.
ISO 12083/1994 is a DTD for Books, Serials, Articles, Math.
In particular, the solution for math is regarded as very important: the writers of ISO 12083 combined several approaches (after much discussion with the originators of other approaches), such as the solution of the Association of American Publishers, ISO 9573 (as followed by EBT and MS) and Euromath (as followed by Grif).
It treats math as separate structures: you cannot do any calculations on it. The recommendation is to do the typesetting in TeX.
EPSIG wants to gather users of ISO 12083/1994, exchange experiences and propose changes to the standard.
EPSIG wants to do a project aimed at providing a Web interface to a database of 12083 documents. The idea is to have the user's query set up through a form and to compose the answer in an intelligent way by putting together document parts from the 12083 document database, rather than sending the whole document as the Web does now, which forces the user to print the document to be able to read it.
This is exactly the kind of idea (and DTD) that libraries should work out as well.
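The EPSIG idea of answering a query with document parts rather than whole documents can be sketched as follows. The document structure, the field names and the simple keyword match are assumptions standing in for a real 12083 database and search engine.

```python
# Hypothetical document database: each document is split into sections.
DOCUMENTS = [
    {"title": "SGML in publishing", "sections": [
        {"heading": "Introduction", "text": "Why publishers adopt SGML."},
        {"heading": "Math", "text": "ISO 12083 treats math as separate structures."},
    ]},
    {"title": "Web delivery", "sections": [
        {"heading": "HTML", "text": "HTML as an interface to SGML documents."},
    ]},
]


def answer_query(documents, keyword):
    """Return (document title, section heading, text) for matching sections only."""
    needle = keyword.lower()
    return [
        (doc["title"], sec["heading"], sec["text"])
        for doc in documents
        for sec in doc["sections"]
        if needle in sec["text"].lower()
    ]


for title, heading, text in answer_query(DOCUMENTS, "math"):
    print(f"{title} / {heading}: {text}")
```

Only the matching section is returned, so the reader gets the relevant part without receiving, or printing, the whole document.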